Overview
Comment: | Add two text files containing pager design notes to the doc/ subfolder. |
---|---|
SHA1: | ed817fc893e7162ae0ff4022591f7e9e |
User & Date: | drh 2010-05-06 11:55:57.000 |
Context
2010-05-06
11:56 | Remove the noop-mutex implementations of mutex_held() and mutex_notheld() since they are both unreachable. (check-in: 6767b62a9a user: drh tags: trunk) | |
11:55 | Add two text files containing pager design notes to the doc/ subfolder. (check-in: ed817fc893 user: drh tags: trunk) | |
11:32 | Add test cases to test the libraries handling of corrupt wal-index headers. (check-in: 9465b267d4 user: dan tags: trunk) | |
Changes
Added doc/pager-invariants.txt.
 *** Throughout this document, a page is deemed to have been synced
     automatically as soon as it is written when PRAGMA synchronous=OFF.
     Otherwise, the page is not synced until the xSync method of the VFS
     is called successfully on the file containing the page.

 *** Definition:  A page of the database file is said to be "overwriteable"
     if one or more of the following are true about the page:

     (a)  The original content of the page as it was at the beginning of
          the transaction has been written into the rollback journal and
          synced.

     (b)  The page was a freelist leaf page at the start of the transaction.

     (c)  The page number is greater than the largest page that existed in
          the database file at the start of the transaction.

 (1) A page of the database file is never overwritten unless one of the
     following is true:

     (a)  The page and all other pages on the same sector are overwriteable.

     (b)  The atomic page write optimization is enabled, and the entire
          transaction other than the update of the transaction sequence
          number consists of a single page change.

 (2) The content of a page written into the rollback journal exactly matches
     both the content in the database when the rollback journal was written
     and the content in the database at the beginning of the current
     transaction.

 (3) Writes to the database file are an integer multiple of the page size
     in length and are aligned to a page boundary.

 (4) Reads from the database file are either aligned on a page boundary and
     an integer multiple of the page size in length or are taken from the
     first 100 bytes of the database file.

 (5) All writes to the database file are synced prior to the rollback journal
     being deleted, truncated, or zeroed.

 (6) If a master journal file is used, then all writes to the database file
     are synced prior to the master journal being deleted.

 *** Definition:  Two databases (or the same database at two points in time)
     are said to be "logically equivalent" if they give the same answer to
     all queries.  Note in particular that the content of freelist leaf
     pages can be changed arbitrarily without affecting the logical
     equivalence of the database.

 (7) At any time, if any subset, including the empty set and the total set,
     of the unsynced changes to a rollback journal are removed and the
     journal is rolled back, the resulting database file will be logically
     equivalent to the database file at the beginning of the transaction.

 (8) When a transaction is rolled back, the xTruncate method of the VFS is
     called to restore the database file to the same size it was at the
     beginning of the transaction.  (In some VFSes, the xTruncate method is
     a no-op, but that does not change the fact that SQLite will invoke it.)

 (9) Whenever the database file is modified, at least one bit in the range
     of bytes from 24 through 39 inclusive will be changed prior to
     releasing the EXCLUSIVE lock.

(10) The pattern of bits in bytes 24 through 39 shall not repeat in less
     than one billion transactions.

(11) A database file is well-formed at the beginning and at the conclusion
     of every transaction.

(12) An EXCLUSIVE lock must be held on the database file before making any
     changes to the database file.

(13) A SHARED lock must be held on the database file before reading any
     content out of the database file.
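The "overwriteable" definition and invariants (1a), (3), and (4) above can be read as simple predicates. The following is a minimal Python sketch for illustration only; it is not taken from the SQLite source. All function and parameter names are hypothetical, pages are numbered from 1, and the sketch assumes that a sector holds a whole number of pages.

```python
# Illustrative sketch only: names and data structures here are hypothetical
# and are not part of SQLite.

def is_overwriteable(pgno, journaled_synced, freelist_leaves, orig_page_count):
    """Return True if page pgno meets condition (a), (b), or (c).

    journaled_synced: pages whose original content is in the rollback
                      journal and has been synced (condition a).
    freelist_leaves:  pages that were freelist leaves at the start of the
                      transaction (condition b).
    orig_page_count:  largest page number at the start of the transaction
                      (condition c applies to pages beyond it).
    """
    return (pgno in journaled_synced
            or pgno in freelist_leaves
            or pgno > orig_page_count)


def sector_pages(pgno, pages_per_sector):
    """Return the 1-based page numbers that share a sector with pgno."""
    first = ((pgno - 1) // pages_per_sector) * pages_per_sector + 1
    return range(first, first + pages_per_sector)


def can_overwrite(pgno, pages_per_sector, journaled_synced,
                  freelist_leaves, orig_page_count):
    """Invariant (1a): every page on the same sector must be overwriteable."""
    return all(
        is_overwriteable(p, journaled_synced, freelist_leaves, orig_page_count)
        for p in sector_pages(pgno, pages_per_sector)
    )


def is_valid_write(offset, length, page_size):
    """Invariant (3): writes are page-aligned whole multiples of the page size."""
    return length > 0 and offset % page_size == 0 and length % page_size == 0


def is_valid_read(offset, length, page_size):
    """Invariant (4): page-aligned whole pages, or within the first 100 bytes."""
    aligned = length > 0 and offset % page_size == 0 and length % page_size == 0
    in_header = length > 0 and offset >= 0 and offset + length <= 100
    return aligned or in_header
```

For example, with four pages per sector and a ten-page database, `can_overwrite(3, 4, {1, 2, 3}, set(), 10)` is false because page 4 shares the sector and has not yet been journaled and synced. Invariant (1b), the atomic page write optimization, is deliberately left out of this sketch.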
Added doc/vfs-shm.txt.
The 5 states of an historical rollback lock as implemented by the
xLock, xUnlock, and xCheckReservedLock methods of the sqlite3_io_methods
object are:

   UNLOCKED
   SHARED
   RESERVED
   PENDING
   EXCLUSIVE

The wal-index file has a similar locking hierarchy implemented using
the xShmLock method of the sqlite3_vfs object, but with 7
states.  Each connection to a wal-index file must be in one of
the following 7 states:

   UNLOCKED
   READ
   READ_FULL
   WRITE
   PENDING
   CHECKPOINT
   RECOVER

These roughly correspond to the 5 states of a rollback lock except
that SHARED is split out into 2 states: READ and READ_FULL and
there is an extra RECOVER state used for wal-index reconstruction.

The meanings of the various wal-index locking states are as follows:

   UNLOCKED    - The wal-index is not in use.

   READ        - Some prefix of the wal-index is being read. Additional
                 wal-index information can be appended at any time.  The
                 newly appended content will be ignored by the holder of
                 the READ lock.

   READ_FULL   - The entire wal-index is being read.  No new information
                 can be added to the wal-index.  The holder of a READ_FULL
                 lock promises never to read pages from the database file
                 that are available anywhere in the wal-index.

   WRITE       - It is OK to append to the wal-index file and to adjust
                 the header to indicate the new "last valid frame".

   PENDING     - Waiting on all READ locks to clear so that a
                 CHECKPOINT lock can be acquired.

   CHECKPOINT  - It is OK to write any WAL data into the database file
                 and zero the last valid frame field of the wal-index
                 header.  The wal-index file itself may not be changed
                 other than to zero the last valid frame field in the
                 header.

   RECOVER     - Held during wal-index recovery.  Used to prevent a race
                 if multiple clients try to recover a wal-index at the
                 same time.

A particular lock manager implementation may coalesce one or more of
the wal-index locking states, though with a reduction in concurrency.
For example, an implementation might implement only exclusive locking,
in which case all states would be equivalent to CHECKPOINT, meaning that
only one reader or one writer or one checkpointer could be active at a
time.  Or, an implementation might combine READ and READ_FULL into
a single state equivalent to READ, meaning that a writer could
coexist with a reader, but no readers or writers could coexist with a
checkpointer.

The lock manager must obey the following rules:

(1)  A READ cannot coexist with CHECKPOINT.
(2)  A READ_FULL cannot coexist with WRITE.
(3)  None of WRITE, PENDING, CHECKPOINT, or RECOVER can coexist.

The SQLite core will obey the next set of rules.  These rules are
assertions on the behavior of the SQLite core which might be verified
during testing using an instrumented lock manager.

(5)  No part of the wal-index will be read without holding either some
     kind of SHM lock or an EXCLUSIVE lock on the original database.
(6)  A holder of a READ_FULL will never read any page of the database
     file that is contained anywhere in the wal-index.
(7)  No part of the wal-index other than the header will be written nor
     will the size of the wal-index grow without holding a WRITE.
(8)  The wal-index header will not be written without holding one of
     WRITE, CHECKPOINT, or RECOVER.
(9)  A CHECKPOINT or RECOVER must be held in order to reset the last valid
     frame counter in the header of the wal-index back to zero.
(10) A WRITE can only increase the last valid frame pointer in the header.

The SQLite core will only ever send requests for UNLOCK, READ, WRITE,
CHECKPOINT, or RECOVER to the lock manager.   The SQLite core will never
request a READ_FULL or PENDING lock though the lock manager may deliver
those locking states in response to READ and CHECKPOINT requests,
respectively, if and only if the requested READ or CHECKPOINT cannot
be delivered.

The following are the allowed lock transitions:

       Original-State     Request         New-State
       --------------     ----------      ----------
(11a)  UNLOCK             READ            READ
(11b)  UNLOCK             READ            READ_FULL
(11c)  UNLOCK             CHECKPOINT      PENDING
(11d)  UNLOCK             CHECKPOINT      CHECKPOINT
(11e)  READ               UNLOCK          UNLOCK
(11f)  READ               WRITE           WRITE
(11g)  READ               RECOVER         RECOVER
(11h)  READ_FULL          UNLOCK          UNLOCK
(11i)  READ_FULL          WRITE           WRITE
(11j)  READ_FULL          RECOVER         RECOVER
(11k)  WRITE              READ            READ
(11l)  PENDING            UNLOCK          UNLOCK
(11m)  PENDING            CHECKPOINT      CHECKPOINT
(11n)  CHECKPOINT         UNLOCK          UNLOCK
(11o)  CHECKPOINT         RECOVER         RECOVER
(11p)  RECOVER            READ            READ
(11q)  RECOVER            CHECKPOINT      CHECKPOINT

These 17 transitions are all that need to be supported.  The lock
manager implementation can assert that fact.  The other 25 possible
transitions among the 7 locking states will never occur.

The rules above are sufficient for correctness.  For maximum concurrency,
the following additional considerations apply: