Overview
Comment: Store page numbers in database snapshots as 64-bit integers.
SHA1: 53de55a6f4a3933ceafe329b25288170
User & Date: dan 2012-10-26 16:42:33.930
Context
2012-10-26

17:09 Fix a problem with snapshot initialization. (check-in: 8ce567e8be, user: dan, tags: compression-hooks)

16:42 Store page numbers in database snapshots as 64-bit integers. (check-in: 53de55a6f4, user: dan, tags: compression-hooks)

2012-10-25

11:08 Fix bug reading page data from a compressed database that occurs when the last page of a segment ends on the last byte of a block. (check-in: 549868a020, user: dan, tags: compression-hooks)
Changes
Changes to src/lsmInt.h.
︙

    #define LSM_LOCK_READER(i) ((i) + LSM_LOCK_CHECKPOINTER + 1)

    /*
    ** Hard limit on the number of free-list entries that may be stored in
    ** a checkpoint (the remainder are stored as a system record in the LSM).
    ** See also LSM_CONFIG_MAX_FREELIST.
    */
    #define LSM_MAX_FREELIST_ENTRIES 24

    #define LSM_ATTEMPTS_BEFORE_PROTOCOL 10000

    /*
    ** Each entry stored in the LSM (or in-memory tree structure) has an
    ** associated mask of the following flags.

︙

      Database *pDatabase;            /* Database this snapshot belongs to */
      Level *pLevel;                  /* Pointer to level 0 of snapshot (or NULL) */
      i64 iId;                        /* Snapshot id */
      i64 iLogOff;                    /* Log file offset */

      /* Used by worker snapshots only */
      int nBlock;                     /* Number of blocks in database file */
      Pgno aiAppend[LSM_APPLIST_SZ];  /* Append point list */
      Freelist freelist;              /* Free block list */
      int nFreelistOvfl;              /* Number of extra free-list entries in LSM */
      u32 nWrite;                     /* Total number of pages written to disk */
    };
    #define LSM_INITIAL_SNAPSHOT_ID 11

    /*

︙
Changes to src/lsm_ckpt.c.
︙

    **
    **   4 integers. See ckptExportAppendlist().
    **
    **   For each level in the database, a level record. Formatted as follows:
    **
    **     0. Age of the level.
    **     1. The number of right-hand segments (nRight, possibly 0),
    **     2. Segment record for left-hand segment (8 integers defined below),
    **     3. Segment record for each right-hand segment (8 integers defined below),
    **     4. If nRight>0, The number of segments involved in the merge
    **     5. if nRight>0, Current nSkip value (see Merge structure defn.),
    **     6. For each segment in the merge:
    **        5a. Page number of next cell to read during merge (this field
    **            is 64-bits - 2 integers)
    **        5b. Cell number of next cell to read during merge
    **     7. Page containing current split-key (64-bits - 2 integers).
    **     8. Cell within page containing current split-key.
    **     9. Current pointer value (64-bits - 2 integers).
    **
    **   The freelist.
    **
    **     1. Number of free-list entries stored in checkpoint header.
    **     2. For each entry:
    **        2a. Block number of free block.
    **        2b. MSW of associated checkpoint id.
    **        2c. LSW of associated checkpoint id.
    **
    **   If the overflow flag is set, then extra free-list entries may be stored
    **   in the FREELIST record. The FREELIST record contains 3 32-bit integers
    **   per entry, in the same format as above (without the "number of entries"
    **   field).
    **
    **   The checksum:
    **
    **     1. Checksum value 1.
    **     2. Checksum value 2.
    **
    ** In the above, a segment record consists of the following four 64-bit
    ** fields (converted to 2 * u32 by storing the MSW followed by LSW):
    **
    **     1. First page of array,
    **     2. Last page of array,
    **     3. Root page of array (or 0),
    **     4. Size of array in pages.
    */

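The segment record described in the comment above (four 64-bit fields, each stored as a most-significant word followed by a least-significant word) can be illustrated with a small standalone sketch. The `DemoSegment` type and the `demo_*` helpers are hypothetical stand-ins written for this illustration; they are not the `Segment` structure or the `ckpt*` functions from the check-in, and they operate on a plain `uint32_t` array rather than a `CkptBuffer`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the four 64-bit segment fields listed above. */
typedef struct {
  int64_t iFirst;   /* First page of array */
  int64_t iLastPg;  /* Last page of array */
  int64_t iRoot;    /* Root page of array (or 0) */
  int64_t nSize;    /* Size of array in pages */
} DemoSegment;

enum { DEMO_SEGMENT_INTS = 8 };  /* 4 fields * 2 u32 values each */

/* Store v as two 32-bit words: most-significant word first. */
static void demo_put64(uint32_t *a, int64_t v){
  a[0] = (uint32_t)((uint64_t)v >> 32);
  a[1] = (uint32_t)((uint64_t)v & 0xFFFFFFFFu);
}

/* Rebuild a 64-bit value from an MSW/LSW pair. */
static int64_t demo_get64(const uint32_t *a){
  return (int64_t)(((uint64_t)a[0] << 32) | (uint64_t)a[1]);
}

/* Serialize a segment into 8 consecutive 32-bit integers. */
static void demo_segment_encode(const DemoSegment *s, uint32_t *out){
  demo_put64(&out[0], s->iFirst);
  demo_put64(&out[2], s->iLastPg);
  demo_put64(&out[4], s->iRoot);
  demo_put64(&out[6], s->nSize);
}

/* Inverse of demo_segment_encode(). */
static void demo_segment_decode(const uint32_t *in, DemoSegment *s){
  s->iFirst  = demo_get64(&in[0]);
  s->iLastPg = demo_get64(&in[2]);
  s->iRoot   = demo_get64(&in[4]);
  s->nSize   = demo_get64(&in[6]);
}

/* Returns 1 if a segment with page numbers above 2^32 survives a round trip. */
static int demo_segment_roundtrip(void){
  DemoSegment in = { 0x100000002LL, 0x100000009LL, 0, 8 };
  DemoSegment out;
  uint32_t a[DEMO_SEGMENT_INTS];

  demo_segment_encode(&in, a);
  assert( a[0]==1 && a[1]==2 );   /* iFirst: MSW then LSW */
  assert( a[6]==0 && a[7]==8 );   /* nSize fits entirely in the LSW */
  demo_segment_decode(a, &out);
  return out.iFirst==in.iFirst && out.iLastPg==in.iLastPg
      && out.iRoot==in.iRoot && out.nSize==in.nSize;
}
```

Calling `demo_segment_roundtrip()` exercises both directions with a page number that does not fit in 32 bits, which is the case the widened record exists to handle.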
︙

    ** follows:
    **
    **   * For each level in the database not undergoing a merge, add 1.
    **
    **   * For each level in the database that is undergoing a merge, add
    **     the number of segments on the rhs of the level.
    **
    ** A level record not undergoing a merge is 10 integers. A level record
    ** with nRhs rhs segments and (nRhs+1) input segments (i.e. including the
    ** separators from the next level) is (11*nRhs+20) integers. The maximum
    ** per right-hand-side level is therefore 21 integers. So the maximum
    ** size of all level records in a checkpoint is 21*40=820 integers.
    **
    ** TODO: Before pointer values were changed from 32 to 64 bits, the above
    ** used to come to 420 bytes - leaving significant space for a free-list
    ** prefix. No more. To fix this, reduce the size of the level records in
    ** a db snapshot, and improve management of the free-list tail in
    ** lsm_sorted.c.
    */
    #define LSM_MAX_RHS_SEGMENTS 40

    /*
    ** LARGE NUMBERS OF FREELIST ENTRIES:
    **
    ** There is also a limit (LSM_MAX_FREELIST_ENTRIES - defined in lsmInt.h)

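The size arithmetic in the comment above can be checked by adding up the fields of a level record as listed in the checkpoint format description. The sketch below does only that; the function names are invented for illustration. It reproduces the 10-integer and (11*nRhs+20) figures, and the per-segment figure of 21 corresponds to the formula evaluated at nRhs=2 (42/2), which is assumed here to be the case the comment has in mind. Note that 21*40 evaluates to 840, not the 820 quoted in the comment.

```c
#include <assert.h>

enum {
  SKETCH_SEGMENT_INTS = 8,   /* four 64-bit fields, 2 integers each */
  SKETCH_MAX_RHS      = 40   /* value of LSM_MAX_RHS_SEGMENTS in the diff */
};

/* Level not undergoing a merge: age + nRight + left-hand segment record. */
static int sketch_level_ints_no_merge(void){
  return 1 + 1 + SKETCH_SEGMENT_INTS;
}

/* Level undergoing a merge with nRhs right-hand segments and
** (nRhs+1) merge inputs, summed field by field. */
static int sketch_level_ints_merge(int nRhs){
  int nInput = nRhs + 1;
  int n = 0;
  n += 1 + 1;                        /* age, nRight */
  n += SKETCH_SEGMENT_INTS;          /* left-hand segment */
  n += SKETCH_SEGMENT_INTS * nRhs;   /* right-hand segments */
  n += 1 + 1;                        /* nInput, nSkip */
  n += 3 * nInput;                   /* per input: 64-bit page + cell */
  n += 2 + 1;                        /* split-key page (64-bit) + cell */
  n += 2;                            /* current pointer (64-bit) */
  return n;
}

/* Total budget if every right-hand segment costs perRhs integers. */
static int sketch_total_ints(int perRhs){
  return perRhs * SKETCH_MAX_RHS;
}
```

For example, `sketch_level_ints_merge(2)` gives 42 and `sketch_total_ints(21)` gives 840.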
︙

    static const int one = 1;
    #define LSM_LITTLE_ENDIAN (*(u8 *)(&one))

    /* Sizes, in integers, of various parts of the checkpoint. */
    #define CKPT_HDR_SIZE         9
    #define CKPT_LOGPTR_SIZE      4
    #define CKPT_APPENDLIST_SIZE  (LSM_APPLIST_SZ * 2)

    /* A #define to describe each integer in the checkpoint header. */
    #define CKPT_HDR_ID_MSW   0
    #define CKPT_HDR_ID_LSW   1
    #define CKPT_HDR_NCKPT    2
    #define CKPT_HDR_NBLOCK   3
    #define CKPT_HDR_BLKSZ    4

︙

      if( *pRc==LSM_OK ){
        u32 aCksum[2] = {0, 0};
        ckptChecksum(p->aCkpt, nCkpt+2, &aCksum[0], &aCksum[1]);
        ckptSetValue(p, nCkpt, aCksum[0], pRc);
        ckptSetValue(p, nCkpt+1, aCksum[1], pRc);
      }
    }

    static void ckptAppend64(CkptBuffer *p, int *piOut, i64 iVal, int *pRc){
      int iOut = *piOut;
      ckptSetValue(p, iOut++, (iVal >> 32) & 0xFFFFFFFF, pRc);
      ckptSetValue(p, iOut++, (iVal & 0xFFFFFFFF), pRc);
      *piOut = iOut;
    }

    static i64 ckptRead64(u32 *a){
      return (((i64)a[0]) << 32) + (i64)a[1];
    }

    static i64 ckptGobble64(u32 *a, int *piIn){
      int iIn = *piIn;
      *piIn += 2;
      return ckptRead64(&a[iIn]);
    }

    /*
    ** Append a 6-value segment record corresponding to pSeg to the checkpoint
    ** buffer passed as the third argument.
    */
    static void ckptExportSegment(
      Segment *pSeg,
      CkptBuffer *p,
      int *piOut,
      int *pRc
    ){
      ckptAppend64(p, piOut, pSeg->iFirst, pRc);
      ckptAppend64(p, piOut, pSeg->iLastPg, pRc);
      ckptAppend64(p, piOut, pSeg->iRoot, pRc);
      ckptAppend64(p, piOut, pSeg->nSize, pRc);
    }

    static void ckptExportLevel(
      Level *pLevel,                  /* Level object to serialize */
      CkptBuffer *p,                  /* Append new level record to this ckpt */
      int *piOut,                     /* IN/OUT: Size of checkpoint so far */
      int *pRc                        /* IN/OUT: Error code */

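The new `ckptAppend64()` and `ckptGobble64()` helpers both work through a cursor (`*piOut` or `*piIn`) that advances by two integers per 64-bit value, so 64-bit and 32-bit fields can be interleaved in one record. A minimal sketch of that pattern follows, using a plain `uint32_t` array in place of `CkptBuffer` and hypothetical `sketch_*` names. The record written here mirrors one merge-input entry (64-bit page number followed by a 32-bit cell number).

```c
#include <assert.h>
#include <stdint.h>

/* Append a 64-bit value at *piOut as two 32-bit words, MSW first,
** and advance the cursor by 2. */
static void sketch_append64(uint32_t *a, int *piOut, int64_t v){
  a[(*piOut)++] = (uint32_t)((uint64_t)v >> 32);
  a[(*piOut)++] = (uint32_t)((uint64_t)v & 0xFFFFFFFFu);
}

/* Read a 64-bit value at *piIn (MSW first) and advance the cursor by 2. */
static int64_t sketch_gobble64(const uint32_t *a, int *piIn){
  uint64_t msw = a[(*piIn)++];
  uint64_t lsw = a[(*piIn)++];
  return (int64_t)((msw << 32) | lsw);
}

/* Write then read one merge-input entry. Returns the number of
** 32-bit integers consumed, which should be 3. */
static int sketch_merge_input_roundtrip(int64_t iPg, int iCell){
  uint32_t a[3];
  int iOut = 0;
  int iIn = 0;
  int64_t pg;
  int cell;

  sketch_append64(a, &iOut, iPg);     /* 2 integers */
  a[iOut++] = (uint32_t)iCell;        /* 1 integer */

  pg = sketch_gobble64(a, &iIn);
  cell = (int)a[iIn++];
  assert( pg==iPg && cell==iCell );
  assert( iIn==iOut );
  return iIn;
}
```

Because reader and writer advance their cursors by the same amounts, any mismatch between the two sides of the format shows up as a cursor disagreement rather than silently shifting later fields.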
︙

        }

        assert( pMerge->nInput==pLevel->nRight
             || pMerge->nInput==pLevel->nRight+1
        );
        ckptSetValue(p, iOut++, pMerge->nInput, pRc);
        ckptSetValue(p, iOut++, pMerge->nSkip, pRc);
        for(i=0; i<pMerge->nInput; i++){
          ckptAppend64(p, &iOut, pMerge->aInput[i].iPg, pRc);
          ckptSetValue(p, iOut++, pMerge->aInput[i].iCell, pRc);
        }
        ckptAppend64(p, &iOut, pMerge->splitkey.iPg, pRc);
        ckptSetValue(p, iOut++, pMerge->splitkey.iCell, pRc);
        ckptAppend64(p, &iOut, pMerge->iCurrentPtr, pRc);
      }

      *piOut = iOut;
    }

    /*
    ** Populate the log offset fields of the checkpoint buffer. 4 values.

︙

    ){
      int iOut = *piOut;

      assert( iOut==CKPT_HDR_LO_MSW );

      if( bFlush ){
        i64 iOff = pDb->treehdr.iOldLog;
        ckptAppend64(p, &iOut, iOff, pRc);
        ckptSetValue(p, iOut++, pDb->treehdr.oldcksum0, pRc);
        ckptSetValue(p, iOut++, pDb->treehdr.oldcksum1, pRc);
      }else{
        for(; iOut<=CKPT_HDR_LO_CKSUM2; iOut++){
          ckptSetValue(p, iOut, pDb->pShmhdr->aSnap2[iOut], pRc);
        }
      }

      *piOut = iOut;
    }

    static void ckptExportAppendlist(
      lsm_db *db,                     /* Database connection */
      CkptBuffer *p,                  /* Checkpoint buffer to write to */
      int *piOut,                     /* IN/OUT: Offset within checkpoint buffer */
      int *pRc                        /* IN/OUT: Error code */
    ){
      int i;
      Pgno *aiAppend = db->pWorker->aiAppend;

      for(i=0; i<LSM_APPLIST_SZ; i++){
        ckptAppend64(p, piOut, aiAppend[i], pRc);
      }
    };

    static int ckptExportSnapshot(
      lsm_db *pDb,                    /* Connection handle */
      int nOvfl,                      /* Number of free-list entries in LSM */
      int bLog,                       /* True to update log-offset fields */
      i64 iId,                        /* Checkpoint id */

︙

    ** Helper function for ckptImport().
    */
    static void ckptNewSegment(
      u32 *aIn,
      int *piIn,
      Segment *pSegment               /* Populate this structure */
    ){
      assert( pSegment->iFirst==0 && pSegment->iLastPg==0 );
      assert( pSegment->nSize==0 && pSegment->iRoot==0 );
      pSegment->iFirst = ckptGobble64(aIn, piIn);
      pSegment->iLastPg = ckptGobble64(aIn, piIn);
      pSegment->iRoot = ckptGobble64(aIn, piIn);
      pSegment->nSize = ckptGobble64(aIn, piIn);
      assert( pSegment->iFirst );
    }

    static int ckptSetupMerge(lsm_db *pDb, u32 *aInt, int *piIn, Level *pLevel){
      Merge *pMerge;                  /* Allocated Merge object */
      int nInput;                     /* Number of input segments in merge */
      int iIn = *piIn;                /* Next value to read from aInt[] */
      int i;                          /* Iterator variable */

︙

      /* Populate the Merge object. */
      pMerge->aInput = (MergeInput *)&pMerge[1];
      pMerge->nInput = nInput;
      pMerge->iOutputOff = -1;
      pMerge->nSkip = (int)aInt[iIn++];
      for(i=0; i<nInput; i++){
        pMerge->aInput[i].iPg = ckptGobble64(aInt, &iIn);
        pMerge->aInput[i].iCell = (int)aInt[iIn++];
      }
      pMerge->splitkey.iPg = ckptGobble64(aInt, &iIn);
      pMerge->splitkey.iCell = (int)aInt[iIn++];
      pMerge->iCurrentPtr = ckptGobble64(aInt, &iIn);

      /* Set *piIn and return LSM_OK. */
      *piIn = iIn;
      return LSM_OK;
    }

︙

    ){
      int rc = LSM_OK;
      Snapshot *pNew;

      pNew = (Snapshot *)lsmMallocZeroRc(pDb->pEnv, sizeof(Snapshot), &rc);
      if( rc==LSM_OK ){
        int nFree;
        int i;
        int nLevel = (int)aCkpt[CKPT_HDR_NLEVEL];
        int iIn = CKPT_HDR_SIZE + CKPT_APPENDLIST_SIZE + CKPT_LOGPTR_SIZE;

        pNew->iId = lsmCheckpointId(aCkpt, 0);
        pNew->nBlock = aCkpt[CKPT_HDR_NBLOCK];
        pNew->nWrite = aCkpt[CKPT_HDR_NWRITE];
        rc = ckptLoadLevels(pDb, aCkpt, &iIn, nLevel, &pNew->pLevel);
        pNew->iLogOff = lsmCheckpointLogOffset(aCkpt);

        /* Make a copy of the append-list */
        for(i=0; i<LSM_APPLIST_SZ; i++){
          u32 *a = &aCkpt[CKPT_HDR_SIZE + CKPT_LOGPTR_SIZE + i*2];
          pNew->aiAppend[i] = ckptRead64(a);
        }

        /* Copy the free-list */
        if( bInclFreelist ){
          pNew->nFreelistOvfl = aCkpt[CKPT_HDR_OVFL];
          nFree = aCkpt[iIn++];
          if( nFree ){
            pNew->freelist.aEntry = (FreelistEntry *)lsmMallocZeroRc(

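The append-list copy above reads each entry at a fixed offset, `CKPT_HDR_SIZE + CKPT_LOGPTR_SIZE + i*2`, and the level records start after the whole list. The sketch below works through those offsets. The header and log-pointer sizes (9 and 4) come from the diff; the value 4 for `LSM_APPLIST_SZ` is an assumption, since its definition does not appear in this check-in, and the `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
  DEMO_HDR_SIZE        = 9,  /* CKPT_HDR_SIZE in the diff */
  DEMO_LOGPTR_SIZE     = 4,  /* CKPT_LOGPTR_SIZE in the diff */
  DEMO_APPLIST_SZ      = 4,  /* ASSUMPTION: LSM_APPLIST_SZ is not shown */
  DEMO_APPENDLIST_SIZE = DEMO_APPLIST_SZ * 2  /* 2 integers per entry */
};

/* Index of the MSW of append-list entry i within the checkpoint array. */
static int demo_append_offset(int i){
  return DEMO_HDR_SIZE + DEMO_LOGPTR_SIZE + i*2;
}

/* Index of the first level record, immediately after the append list. */
static int demo_first_level_offset(void){
  return DEMO_HDR_SIZE + DEMO_APPENDLIST_SIZE + DEMO_LOGPTR_SIZE;
}

/* Read append-list entry i as a 64-bit page number (MSW, then LSW). */
static int64_t demo_read_append(const uint32_t *aCkpt, int i){
  const uint32_t *a = &aCkpt[demo_append_offset(i)];
  return (int64_t)(((uint64_t)a[0] << 32) | (uint64_t)a[1]);
}
```

Under these assumptions the four entries sit at indexes 13, 15, 17 and 19, and the first level record begins at index 21.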
︙
Changes to src/lsm_file.c.
︙

      int rc = LSM_OK;                /* Return code */
      int iFirst;                     /* First page on block iBlk */
      int iLast;                      /* Last page on block iBlk */
      Level *pLevel;                  /* Used to iterate through levels */
      int iIn;                        /* Used to iterate through append points */
      int iOut = 0;                   /* Used to output append points */
      Pgno *aApp = pSnapshot->aiAppend;

      iFirst = fsFirstPageOnBlock(pFS, iBlk);
      iLast = fsLastPageOnBlock(pFS, iBlk);

      /* Check if any other run in the snapshot has a start or end page
      ** within this block. If there is such a run, return early. */
      for(pLevel=lsmDbSnapshotLevel(pSnapshot); pLevel; pLevel=pLevel->pNext){

︙

      }
      return fsPageGet(pFS, iPg, 0, ppNext);
    }

    static Pgno findAppendPoint(FileSystem *pFS){
      int i;
      Pgno *aiAppend = pFS->pDb->pWorker->aiAppend;
      u32 iRet = 0;

      for(i=LSM_APPLIST_SZ-1; iRet==0 && i>=0; i--){
        if( (iRet = aiAppend[i]) ) aiAppend[i] = 0;
      }
      return iRet;
    }

︙

      **
      ** Otherwise, add the first free page in the last block used by the run
      ** to the lAppend list.
      */
      iBlk = fsPageToBlock(pFS, p->iLastPg);
      if( fsLastPageOnBlock(pFS, fsPageToBlock(pFS, p->iLastPg) )!=p->iLastPg ){
        int i;
        Pgno *aiAppend = pFS->pDb->pWorker->aiAppend;
        for(i=0; i<LSM_APPLIST_SZ; i++){
          if( aiAppend[i]==0 ){
            aiAppend[i] = p->iLastPg+1;
            break;
          }
        }
      }else if( pFS->pCompress==0 ){

︙