PostgreSQL
Check-in [d250ce3c3c]
Overview
Comment: Update README, we don't do post-recovery cleanup actions anymore. transam/README explained how B-tree incomplete splits were tracked and fixed after recovery, as an example of handling complex actions that need multiple WAL records, but that's not how it works anymore. Explain the new paradigm.
SHA1: d250ce3c3c3586bcedf60c690cac6682f253ccd5
User & Date: heikki.linnakangas@iki.fi 2014-05-17 10:55:03
Context
2014-05-17
10:55
Update README, we don't do post-recovery cleanup actions anymore. transam/README explained how B-tree incomplete splits were tracked and fixed after recovery, as an example of handling complex action... Leaf check-in: d250ce3c3c user: heikki.linnakangas@iki.fi tags: trunk, WIN32_DEV, REL9_0_ALPHA4_BRANCH
2014-05-16
20:51
Make sure chr(int) can't create invalid UTF8 sequences. Several years ago we changed chr(int) so that if the database encoding is UTF8, it would interpret its argument as a Unicode code point and exp... check-in: ee787097fc user: tgl@sss.pgh.pa.us tags: trunk, WIN32_DEV, REL9_0_ALPHA4_BRANCH
Changes

Changes to src/backend/access/transam/README.

Old version (lines 571-594):

problems. All other processes must only call PageSet/GetLSN when holding
either an exclusive buffer lock or a shared lock plus buffer header lock,
or be writing the data block directly rather than through shared buffers
while holding AccessExclusiveLock on the relation.

Due to all these constraints, complex changes (such as a multilevel index
insertion) normally need to be described by a series of atomic-action WAL
records.  What do you do if the intermediate states are not self-consistent?
The answer is that the WAL replay logic has to be able to fix things up.
In btree indexes, for example, a page split requires insertion of a new key in
the parent btree level, but for locking reasons this has to be reflected by
two separate WAL records.  The replay code has to remember "unfinished" split
operations, and match them up to subsequent insertions in the parent level.
If no matching insert has been found by the time the WAL replay ends, the
replay code has to do the insertion on its own to restore the index to
consistency.  Such insertions occur after WAL is operational, so they can
and should write WAL records for the additional generated actions.

Writing Hints
-------------

In some cases, we write additional information to data blocks without
writing a preceding WAL record. This should only happen iff the data can
be reconstructed later following a crash and the action is simply a way

New version (lines 571-599):

problems. All other processes must only call PageSet/GetLSN when holding
either an exclusive buffer lock or a shared lock plus buffer header lock,
or be writing the data block directly rather than through shared buffers
while holding AccessExclusiveLock on the relation.

Due to all these constraints, complex changes (such as a multilevel index
insertion) normally need to be described by a series of atomic-action WAL
records. The intermediate states must be self-consistent, so that if the
replay is interrupted between any two actions, the system is fully
functional. In btree indexes, for example, a page split requires a new page
to be allocated, and an insertion of a new key in the parent btree level,
but for locking reasons this has to be reflected by two separate WAL
records. Replaying the first record, to allocate the new page and move
tuples to it, sets a flag on the page to indicate that the key has not been
inserted to the parent yet. Replaying the second record clears the flag.
This intermediate state is never seen by other backends during normal
operation, because the lock on the child page is held across the two
actions, but will be seen if the operation is interrupted before writing
the second WAL record. The search algorithm works with the intermediate
state as normal, but if an insertion encounters a page with the
incomplete-split flag set, it will finish the interrupted split by
inserting the key to the parent, before proceeding.
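
The incomplete-split protocol described above can be illustrated with a
small toy model. This is only a sketch under stated assumptions, not the
actual nbtree code: the types ToyParent and ToyChild, every toy_* function,
and the choice to keep the flag on the left-hand (original) page are
hypothetical names and simplifications made up for this example. Pages are
plain sorted arrays, and no locking or WAL logging is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_KEYS 8

/* Hypothetical parent page: a sorted array of separator keys. */
typedef struct ToyParent {
    int    keys[TOY_MAX_KEYS];
    size_t nkeys;
} ToyParent;

/* Hypothetical child page with the incomplete-split flag. */
typedef struct ToyChild {
    int    keys[TOY_MAX_KEYS];   /* kept sorted */
    size_t nkeys;
    bool   incomplete_split;     /* set by action 1, cleared by action 2 */
    struct ToyChild *right;      /* right sibling, or NULL */
} ToyChild;

/* Insert key into a sorted array; the caller guarantees there is room. */
static void sorted_insert(int *keys, size_t *nkeys, int key)
{
    size_t i = *nkeys;

    assert(*nkeys < TOY_MAX_KEYS);
    while (i > 0 && keys[i - 1] > key) {
        keys[i] = keys[i - 1];
        i--;
    }
    keys[i] = key;
    (*nkeys)++;
}

/* Action 1: move the upper half of 'left' to 'right' and set the flag. */
static void toy_split(ToyChild *left, ToyChild *right)
{
    size_t half = left->nkeys / 2;

    assert(left->nkeys >= 2);
    right->nkeys = 0;
    for (size_t i = half; i < left->nkeys; i++)
        right->keys[right->nkeys++] = left->keys[i];
    right->incomplete_split = false;
    right->right = left->right;
    left->nkeys = half;
    left->right = right;
    left->incomplete_split = true;   /* parent does not know 'right' yet */
}

/* Action 2: insert the separator key into the parent and clear the flag. */
static void toy_finish_split(ToyParent *parent, ToyChild *left)
{
    assert(left->incomplete_split && left->right != NULL);
    sorted_insert(parent->keys, &parent->nkeys, left->right->keys[0]);
    left->incomplete_split = false;
}

/* Search works in the intermediate state by following the right link. */
static bool toy_search(const ToyChild *page, int key)
{
    while (page->right != NULL && key >= page->right->keys[0])
        page = page->right;
    for (size_t i = 0; i < page->nkeys; i++)
        if (page->keys[i] == key)
            return true;
    return false;
}

/* Insertion finishes any interrupted split it meets before proceeding. */
static void toy_insert(ToyParent *parent, ToyChild *page, int key)
{
    for (;;) {
        if (page->incomplete_split)
            toy_finish_split(parent, page);
        if (page->right == NULL || key < page->right->keys[0])
            break;
        page = page->right;
    }
    sorted_insert(page->keys, &page->nkeys, key);
}

/* Interrupt a split after action 1, then let a later insert repair it. */
static bool toy_demo(void)
{
    ToyParent parent = { .nkeys = 0 };
    ToyChild  left = { .keys = {10, 20, 30, 40}, .nkeys = 4 };
    ToyChild  right;

    toy_split(&left, &right);                  /* "crash" before action 2 */
    bool found_mid = toy_search(&left, 40);    /* search is unaffected */
    toy_insert(&parent, &left, 35);            /* finishes the split first */
    return found_mid && !left.incomplete_split &&
           parent.nkeys == 1 && parent.keys[0] == 30 &&
           toy_search(&left, 35);
}
```

Calling toy_demo() from a main() walks through the interrupted case: the
search succeeds while the flag is still set, and the following insertion
adds the missing key to the parent before inserting its own key. Keeping
the repair inside the insertion path is what removes the need for any
cleanup step at the end of recovery.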

Writing Hints
-------------

In some cases, we write additional information to data blocks without
writing a preceding WAL record. This should only happen iff the data can
be reconstructed later following a crash and the action is simply a way