/ Check-in [12713f32]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Stricter enforcement of cell sizes when doing balancing operations on the btree, in order to catch file corruption sooner.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256:12713f320b2c1def273dd8b7833dddaaad5331aba779d4b1ec9aa949814f38fe
User & Date: drh 2019-01-23 19:25:59
References
2019-02-01
14:50
Improve the strict enforcement of cell sizes in balancing from check-in [12713f320b2c1def] so that it also works with table-btrees in addition to index-btrees. check-in: ef27e7a0 user: drh tags: trunk
Context
2019-01-23
19:50
Fix a problem with renaming a table within a schema that contains a composite query that uses a column alias as an ORDER BY term. check-in: 2ca6b8f8 user: dan tags: trunk
19:25
Stricter enforcement of cell sizes when doing balancing operations on the btree, in order to catch file corruption sooner. check-in: 12713f32 user: drh tags: trunk
19:17
Fix another fts5 crash that can occur if the database is corrupted. check-in: 44ce8baa user: dan tags: trunk
Changes
Hide Diffs Side-by-Side Diffs Ignore Whitespace Patch

Changes to src/btree.c.

  6689   6689         */
  6690   6690         ptrmapPutOvflPtr(pPage, pPage, pCell, pRC);
  6691   6691       }
  6692   6692   #endif
  6693   6693     }
  6694   6694   }
  6695   6695   
         6696  +/*
         6697  +** The following parameters determine how many adjacent pages get involved
         6698  +** in a balancing operation.  NN is the number of neighbors on either side
         6699  +** of the page that participate in the balancing operation.  NB is the
         6700  +** total number of pages that participate, including the target page and
         6701  +** NN neighbors on either side.
         6702  +**
         6703  +** The minimum value of NN is 1 (of course).  Increasing NN above 1
         6704  +** (to 2 or 3) gives a modest improvement in SELECT and DELETE performance
         6705  +** in exchange for a larger degradation in INSERT and UPDATE performance.
         6706  +** The value of NN appears to give the best results overall.
         6707  +**
         6708  +** (Later:) The description above makes it seem as if these values are
         6709  +** tunable - as if you could change them and recompile and it would all work.
         6710  +** But that is unlikely.  NB has been 3 since the inception of SQLite and
         6711  +** we have never tested any other value.
         6712  +*/
         6713  +#define NN 1             /* Number of neighbors on either side of pPage */
         6714  +#define NB 3             /* (NN*2+1): Total pages involved in the balance */
         6715  +
  6696   6716   /*
  6697   6717   ** A CellArray object contains a cache of pointers and sizes for a
  6698   6718   ** consecutive sequence of cells that might be held on multiple pages.
         6719  +**
         6720  +** The cells in this array are the divider cell or cells from the pParent
         6721  +** page plus up to three child pages.  There are a total of nCell cells.
         6722  +**
         6723  +** pRef is a pointer to one of the pages that contributes cells.  This is
         6724  +** used to access information such as MemPage.intKey and MemPage.pBt->pageSize
         6725  +** which should be common to all pages that contribute cells to this array.
         6726  +**
         6727  +** apCell[] and szCell[] hold, respectively, pointers to the start of each
         6728  +** cell and the size of each cell.  Some of the apCell[] pointers might refer
         6729  +** to overflow cells.  In other words, some apCel[] pointers might not point
         6730  +** to content area of the pages.
         6731  +**
         6732  +** A szCell[] of zero means the size of that cell has not yet been computed.
         6733  +**
         6734  +** The cells come from as many as four different pages:
         6735  +**
         6736  +**             -----------
         6737  +**             | Parent  |
         6738  +**             -----------
         6739  +**            /     |     \
         6740  +**           /      |      \
         6741  +**  ---------   ---------   ---------
         6742  +**  |Child-1|   |Child-2|   |Child-3|
         6743  +**  ---------   ---------   ---------
         6744  +**
         6745  +** The order of cells is in the array is:
         6746  +**
         6747  +**       1.  All cells from Child-1 in order
         6748  +**       2.  The first divider cell from Parent
         6749  +**       3.  All cells from Child-2 in order
         6750  +**       4.  The second divider cell from Parent
         6751  +**       5.  All cells from Child-3 in order
         6752  +**
         6753  +** The apEnd[] array holds pointer to the end of page for Child-1, the
         6754  +** Parent, Child-2, the Parent (again), and Child-3.  The ixNx[] array
         6755  +** holds the number of cells contained in each of these 5 stages, and
         6756  +** all stages to the left.  Hence:
         6757  +**    ixNx[0] = Number of cells in Child-1.
         6758  +**    ixNx[1] = Number of cells in Child-1 plus 1 for first divider.
         6759  +**    ixNx[2] = Number of cells in Child-1 and Child-2 + 1 for 1st divider.
         6760  +**    ixNx[3] = Number of cells in Child-1 and Child-2 + both divider cells
         6761  +**    ixNx[4] = Total number of cells.
  6699   6762   */
  6700   6763   typedef struct CellArray CellArray;
  6701   6764   struct CellArray {
  6702   6765     int nCell;              /* Number of cells in apCell[] */
  6703   6766     MemPage *pRef;          /* Reference page */
  6704   6767     u8 **apCell;            /* All cells begin balanced */
  6705   6768     u16 *szCell;            /* Local size of all cells in apCell[] */
         6769  +  u8 *apEnd[NB*2];        /* MemPage.aDataEnd values */
         6770  +  int ixNx[NB*2];         /* Index of at which we move to the next apEnd[] */
  6706   6771   };
  6707   6772   
  6708   6773   /*
  6709   6774   ** Make sure the cell sizes at idx, idx+1, ..., idx+N-1 have been
  6710   6775   ** computed.
  6711   6776   */
  6712   6777   static void populateCellCache(CellArray *p, int idx, int N){
................................................................................
  6749   6814   ** function works around problems caused by this by making a copy of any 
  6750   6815   ** such cells before overwriting the page data.
  6751   6816   **
  6752   6817   ** The MemPage.nFree field is invalidated by this function. It is the 
  6753   6818   ** responsibility of the caller to set it correctly.
  6754   6819   */
  6755   6820   static int rebuildPage(
  6756         -  MemPage *pPg,                   /* Edit this page */
         6821  +  CellArray *pCArray,             /* Content to be added to page pPg */
         6822  +  int iFirst,                     /* First cell in pCArray to use */
  6757   6823     int nCell,                      /* Final number of cells on page */
  6758         -  u8 **apCell,                    /* Array of cells */
  6759         -  u16 *szCell                     /* Array of cell sizes */
         6824  +  MemPage *pPg                    /* The page to be reconstructed */
  6760   6825   ){
  6761   6826     const int hdr = pPg->hdrOffset;          /* Offset of header on pPg */
  6762   6827     u8 * const aData = pPg->aData;           /* Pointer to data for pPg */
  6763   6828     const int usableSize = pPg->pBt->usableSize;
  6764   6829     u8 * const pEnd = &aData[usableSize];
  6765         -  int i;
         6830  +  int i = iFirst;                 /* Which cell to copy from pCArray*/
         6831  +  int j;                          /* Start of cell content area */
         6832  +  int iEnd = i+nCell;             /* Loop terminator */
  6766   6833     u8 *pCellptr = pPg->aCellIdx;
  6767   6834     u8 *pTmp = sqlite3PagerTempSpace(pPg->pBt->pPager);
  6768   6835     u8 *pData;
         6836  +  int k;                          /* Current slot in pCArray->apEnd[] */
         6837  +  u8 *pSrcEnd;                    /* Current pCArray->apEnd[k] value */
  6769   6838   
  6770         -  i = get2byte(&aData[hdr+5]);
  6771         -  memcpy(&pTmp[i], &aData[i], usableSize - i);
         6839  +  assert( i<iEnd );
         6840  +  j = get2byte(&aData[hdr+5]);
         6841  +  memcpy(&pTmp[j], &aData[j], usableSize - j);
         6842  +
         6843  +  for(k=0; pCArray->ixNx[k]<=i && ALWAYS(k<NB*2); k++){}
         6844  +  pSrcEnd = pCArray->apEnd[k];
  6772   6845   
  6773   6846     pData = pEnd;
  6774         -  for(i=0; i<nCell; i++){
  6775         -    u8 *pCell = apCell[i];
         6847  +  while( 1/*exit by break*/ ){
         6848  +    u8 *pCell = pCArray->apCell[i];
         6849  +    u16 sz = pCArray->szCell[i];
         6850  +    assert( sz>0 );
  6776   6851       if( SQLITE_WITHIN(pCell,aData,pEnd) ){
  6777         -      if( ((uptr)(pCell+szCell[i]))>(uptr)pEnd ) return SQLITE_CORRUPT_BKPT;
         6852  +      if( ((uptr)(pCell+sz))>(uptr)pEnd ) return SQLITE_CORRUPT_BKPT;
  6778   6853         pCell = &pTmp[pCell - aData];
         6854  +    }else if( (uptr)(pCell+sz)>(uptr)pSrcEnd
         6855  +           && (uptr)(pCell)<(uptr)pSrcEnd
         6856  +    ){
         6857  +      return SQLITE_CORRUPT_BKPT;
  6779   6858       }
  6780         -    pData -= szCell[i];
         6859  +
         6860  +    pData -= sz;
  6781   6861       put2byte(pCellptr, (pData - aData));
  6782   6862       pCellptr += 2;
  6783   6863       if( pData < pCellptr ) return SQLITE_CORRUPT_BKPT;
  6784         -    memcpy(pData, pCell, szCell[i]);
  6785         -    assert( szCell[i]==pPg->xCellSize(pPg, pCell) || CORRUPT_DB );
  6786         -    testcase( szCell[i]!=pPg->xCellSize(pPg,pCell) );
         6864  +    memcpy(pData, pCell, sz);
         6865  +    assert( sz==pPg->xCellSize(pPg, pCell) || CORRUPT_DB );
         6866  +    testcase( sz!=pPg->xCellSize(pPg,pCell) );
         6867  +    i++;
         6868  +    if( i>=iEnd ) break;
         6869  +    if( pCArray->ixNx[k]<=i ){
         6870  +      k++;
         6871  +      pSrcEnd = pCArray->apEnd[k];
         6872  +    }
  6787   6873     }
  6788   6874   
  6789   6875     /* The pPg->nFree field is now set incorrectly. The caller will fix it. */
  6790   6876     pPg->nCell = nCell;
  6791   6877     pPg->nOverflow = 0;
  6792   6878   
  6793   6879     put2byte(&aData[hdr+1], 0);
................................................................................
  6794   6880     put2byte(&aData[hdr+3], pPg->nCell);
  6795   6881     put2byte(&aData[hdr+5], pData - aData);
  6796   6882     aData[hdr+7] = 0x00;
  6797   6883     return SQLITE_OK;
  6798   6884   }
  6799   6885   
  6800   6886   /*
  6801         -** Array apCell[] contains nCell pointers to b-tree cells. Array szCell
  6802         -** contains the size in bytes of each such cell. This function attempts to 
  6803         -** add the cells stored in the array to page pPg. If it cannot (because 
  6804         -** the page needs to be defragmented before the cells will fit), non-zero
  6805         -** is returned. Otherwise, if the cells are added successfully, zero is
  6806         -** returned.
         6887  +** The pCArray objects contains pointers to b-tree cells and the cell sizes.
         6888  +** This function attempts to add the cells stored in the array to page pPg.
         6889  +** If it cannot (because the page needs to be defragmented before the cells
         6890  +** will fit), non-zero is returned. Otherwise, if the cells are added
         6891  +** successfully, zero is returned.
  6807   6892   **
  6808   6893   ** Argument pCellptr points to the first entry in the cell-pointer array
  6809   6894   ** (part of page pPg) to populate. After cell apCell[0] is written to the
  6810   6895   ** page body, a 16-bit offset is written to pCellptr. And so on, for each
  6811   6896   ** cell in the array. It is the responsibility of the caller to ensure
  6812   6897   ** that it is safe to overwrite this part of the cell-pointer array.
  6813   6898   **
................................................................................
  6821   6906   ** all cells - not just those inserted by the current call). If the content
  6822   6907   ** area must be extended to before this point in order to accomodate all
  6823   6908   ** cells in apCell[], then the cells do not fit and non-zero is returned.
  6824   6909   */
  6825   6910   static int pageInsertArray(
  6826   6911     MemPage *pPg,                   /* Page to add cells to */
  6827   6912     u8 *pBegin,                     /* End of cell-pointer array */
  6828         -  u8 **ppData,                    /* IN/OUT: Page content -area pointer */
         6913  +  u8 **ppData,                    /* IN/OUT: Page content-area pointer */
  6829   6914     u8 *pCellptr,                   /* Pointer to cell-pointer area */
  6830   6915     int iFirst,                     /* Index of first cell to add */
  6831   6916     int nCell,                      /* Number of cells to add to pPg */
  6832   6917     CellArray *pCArray              /* Array of cells */
  6833   6918   ){
  6834         -  int i;
  6835         -  u8 *aData = pPg->aData;
  6836         -  u8 *pData = *ppData;
  6837         -  int iEnd = iFirst + nCell;
         6919  +  int i = iFirst;                 /* Loop counter - cell index to insert */
         6920  +  u8 *aData = pPg->aData;         /* Complete page */
         6921  +  u8 *pData = *ppData;            /* Content area.  A subset of aData[] */
         6922  +  int iEnd = iFirst + nCell;      /* End of loop. One past last cell to ins */
         6923  +  int k;                          /* Current slot in pCArray->apEnd[] */
         6924  +  u8 *pEnd;                       /* Maximum extent of cell data */
  6838   6925     assert( CORRUPT_DB || pPg->hdrOffset==0 );    /* Never called on page 1 */
  6839         -  for(i=iFirst; i<iEnd; i++){
         6926  +  if( iEnd<=iFirst ) return 0;
         6927  +  for(k=0; pCArray->ixNx[k]<=i && ALWAYS(k<NB*2); k++){}
         6928  +  pEnd = pCArray->apEnd[k];
         6929  +  while( 1 /*Exit by break*/ ){
  6840   6930       int sz, rc;
  6841   6931       u8 *pSlot;
  6842   6932       sz = cachedCellSize(pCArray, i);
  6843   6933       if( (aData[1]==0 && aData[2]==0) || (pSlot = pageFindSlot(pPg,sz,&rc))==0 ){
  6844   6934         if( (pData - pBegin)<sz ) return 1;
  6845   6935         pData -= sz;
  6846   6936         pSlot = pData;
................................................................................
  6847   6937       }
  6848   6938       /* pSlot and pCArray->apCell[i] will never overlap on a well-formed
  6849   6939       ** database.  But they might for a corrupt database.  Hence use memmove()
  6850   6940       ** since memcpy() sends SIGABORT with overlapping buffers on OpenBSD */
  6851   6941       assert( (pSlot+sz)<=pCArray->apCell[i]
  6852   6942            || pSlot>=(pCArray->apCell[i]+sz)
  6853   6943            || CORRUPT_DB );
         6944  +    if( (uptr)(pCArray->apCell[i]+sz)>(uptr)pEnd
         6945  +     && (uptr)(pCArray->apCell[i])<(uptr)pEnd
         6946  +    ){
         6947  +      assert( CORRUPT_DB );
         6948  +      (void)SQLITE_CORRUPT_BKPT;
         6949  +      return 1;
         6950  +    }
  6854   6951       memmove(pSlot, pCArray->apCell[i], sz);
  6855   6952       put2byte(pCellptr, (pSlot - aData));
  6856   6953       pCellptr += 2;
         6954  +    i++;
         6955  +    if( i>=iEnd ) break;
         6956  +    if( pCArray->ixNx[k]<=i ){
         6957  +      k++;
         6958  +      pEnd = pCArray->apEnd[k];
         6959  +    }
  6857   6960     }
  6858   6961     *ppData = pData;
  6859   6962     return 0;
  6860   6963   }
  6861   6964   
  6862   6965   /*
  6863         -** Array apCell[] contains nCell pointers to b-tree cells. Array szCell 
  6864         -** contains the size in bytes of each such cell. This function adds the
  6865         -** space associated with each cell in the array that is currently stored 
  6866         -** within the body of pPg to the pPg free-list. The cell-pointers and other
  6867         -** fields of the page are not updated.
         6966  +** The pCArray object contains pointers to b-tree cells and their sizes.
         6967  +**
         6968  +** This function adds the space associated with each cell in the array
         6969  +** that is currently stored within the body of pPg to the pPg free-list.
         6970  +** The cell-pointers and other fields of the page are not updated.
  6868   6971   **
  6869   6972   ** This function returns the total number of cells added to the free-list.
  6870   6973   */
  6871   6974   static int pageFreeArray(
  6872   6975     MemPage *pPg,                   /* Page to edit */
  6873   6976     int iFirst,                     /* First cell to delete */
  6874   6977     int nCell,                      /* Cells to delete */
................................................................................
  6910   7013       assert( pFree>aData && (pFree - aData)<65536 );
  6911   7014       freeSpace(pPg, (u16)(pFree - aData), szFree);
  6912   7015     }
  6913   7016     return nRet;
  6914   7017   }
  6915   7018   
  6916   7019   /*
  6917         -** apCell[] and szCell[] contains pointers to and sizes of all cells in the
  6918         -** pages being balanced.  The current page, pPg, has pPg->nCell cells starting
  6919         -** with apCell[iOld].  After balancing, this page should hold nNew cells
         7020  +** pCArray contains pointers to and sizes of all cells in the pages being
         7021  +** balanced.  The current page, pPg, has pPg->nCell cells starting with
         7022  +** pCArray->apCell[iOld].  After balancing, this page should hold nNew cells
  6920   7023   ** starting at apCell[iNew].
  6921   7024   **
  6922   7025   ** This routine makes the necessary adjustments to pPg so that it contains
  6923   7026   ** the correct cells after being balanced.
  6924   7027   **
  6925   7028   ** The pPg->nFree field is invalid when this function returns. It is the
  6926   7029   ** responsibility of the caller to set it correctly.
................................................................................
  7012   7115     }
  7013   7116   #endif
  7014   7117   
  7015   7118     return SQLITE_OK;
  7016   7119    editpage_fail:
  7017   7120     /* Unable to edit this page. Rebuild it from scratch instead. */
  7018   7121     populateCellCache(pCArray, iNew, nNew);
  7019         -  return rebuildPage(pPg, nNew, &pCArray->apCell[iNew], &pCArray->szCell[iNew]);
         7122  +  return rebuildPage(pCArray, iNew, nNew, pPg);
  7020   7123   }
  7021   7124   
  7022         -/*
  7023         -** The following parameters determine how many adjacent pages get involved
  7024         -** in a balancing operation.  NN is the number of neighbors on either side
  7025         -** of the page that participate in the balancing operation.  NB is the
  7026         -** total number of pages that participate, including the target page and
  7027         -** NN neighbors on either side.
  7028         -**
  7029         -** The minimum value of NN is 1 (of course).  Increasing NN above 1
  7030         -** (to 2 or 3) gives a modest improvement in SELECT and DELETE performance
  7031         -** in exchange for a larger degradation in INSERT and UPDATE performance.
  7032         -** The value of NN appears to give the best results overall.
  7033         -*/
  7034         -#define NN 1             /* Number of neighbors on either side of pPage */
  7035         -#define NB (NN*2+1)      /* Total pages involved in the balance */
  7036         -
  7037   7125   
  7038   7126   #ifndef SQLITE_OMIT_QUICKBALANCE
  7039   7127   /*
  7040   7128   ** This version of balance() handles the common special case where
  7041   7129   ** a new entry is being inserted on the extreme right-end of the
  7042   7130   ** tree, in other words, when the new entry will become the largest
  7043   7131   ** entry in the tree.
................................................................................
  7079   7167   
  7080   7168     if( rc==SQLITE_OK ){
  7081   7169   
  7082   7170       u8 *pOut = &pSpace[4];
  7083   7171       u8 *pCell = pPage->apOvfl[0];
  7084   7172       u16 szCell = pPage->xCellSize(pPage, pCell);
  7085   7173       u8 *pStop;
         7174  +    CellArray b;
  7086   7175   
  7087   7176       assert( sqlite3PagerIswriteable(pNew->pDbPage) );
  7088   7177       assert( pPage->aData[0]==(PTF_INTKEY|PTF_LEAFDATA|PTF_LEAF) );
  7089   7178       zeroPage(pNew, PTF_INTKEY|PTF_LEAFDATA|PTF_LEAF);
  7090         -    rc = rebuildPage(pNew, 1, &pCell, &szCell);
  7091         -    if( NEVER(rc) ) return rc;
         7179  +    b.nCell = 1;
         7180  +    b.pRef = pPage;
         7181  +    b.apCell = &pCell;
         7182  +    b.szCell = &szCell;
         7183  +    b.apEnd[0] = pPage->aDataEnd;
         7184  +    b.ixNx[0] = 2;
         7185  +    rc = rebuildPage(&b, 0, 1, pNew);
         7186  +    if( NEVER(rc) ){
         7187  +      releasePage(pNew);
         7188  +      return rc;
         7189  +    }
  7092   7190       pNew->nFree = pBt->usableSize - pNew->cellOffset - 2 - szCell;
  7093   7191   
  7094   7192       /* If this is an auto-vacuum database, update the pointer map
  7095   7193       ** with entries for the new page, and any pointer from the 
  7096   7194       ** cell on the page to an overflow page. If either of these
  7097   7195       ** operations fails, the return code is set, but the contents
  7098   7196       ** of the parent page are still manipulated by thh code below.
................................................................................
  7564   7662     **              the right of the i-th sibling page.
  7565   7663     ** usableSpace: Number of bytes of space available on each sibling.
  7566   7664     ** 
  7567   7665     */
  7568   7666     usableSpace = pBt->usableSize - 12 + leafCorrection;
  7569   7667     for(i=0; i<nOld; i++){
  7570   7668       MemPage *p = apOld[i];
         7669  +    b.apEnd[i*2] = p->aDataEnd;
         7670  +    b.apEnd[i*2+1] = pParent->aDataEnd;
         7671  +    b.ixNx[i*2] = cntOld[i];
         7672  +    b.ixNx[i*2+1] = cntOld[i]+1;
  7571   7673       szNew[i] = usableSpace - p->nFree;
  7572   7674       for(j=0; j<p->nOverflow; j++){
  7573   7675         szNew[i] += 2 + p->xCellSize(p, p->apOvfl[j]);
  7574   7676       }
  7575   7677       cntNew[i] = cntOld[i];
  7576   7678     }
  7577   7679     k = nOld;