Check-in [66b3ad09]

Overview
Comment: Refactoring the btree and pager routines into distinct two-phase commit routines. We've always done a two-phase commit - this change is just making that more apparent in the code. (CVS 3762)
SHA1: 66b3ad09ea657d25d48cb75ec2671ea2dc1b6005
User & Date: drh 2007-03-30 14:06:34
Context
2007-03-30
14:46
Tease apart the two phases of pager commit. (CVS 3763) check-in: e5f17078 user: drh tags: trunk
14:06
Refactoring the btree and pager routines into distinct two-phase commit routines. We've always done a two-phase commit - this change is just making that more apparent in the code. (CVS 3762) check-in: 66b3ad09 user: drh tags: trunk
13:35
Make yypMinor available to the stack overflow callbacks in lemon generated parsers. This does not affect SQLite. (CVS 3761) check-in: 70c8c7e2 user: drh tags: trunk
Changes

Changes to src/btree.c.

     5      5   ** a legal notice, here is a blessing:
     6      6   **
     7      7   **    May you do good and not evil.
     8      8   **    May you find forgiveness for yourself and forgive others.
     9      9   **    May you share freely, never taking more than you give.
    10     10   **
    11     11   *************************************************************************
    12         -** $Id: btree.c,v 1.346 2007/03/30 11:12:08 drh Exp $
           12  +** $Id: btree.c,v 1.347 2007/03/30 14:06:34 drh Exp $
    13     13   **
    14     14   ** This file implements a external (disk-based) database using BTrees.
    15     15   ** For a detailed discussion of BTrees, refer to
    16     16   **
    17     17   **     Donald E. Knuth, THE ART OF COMPUTER PROGRAMMING, Volume 3:
    18     18   **     "Sorting And Searching", pages 473-480. Addison-Wesley
    19     19   **     Publishing Company, Reading, Massachusetts.
................................................................................
  2402   2402     assert( nRef==sqlite3PagerRefcount(pPager) );
  2403   2403     if( rc!=SQLITE_OK ){
  2404   2404       sqlite3PagerRollback(pPager);
  2405   2405     }
  2406   2406     return rc;
  2407   2407   }
  2408   2408   #endif
         2409  +
         2410  +/*
         2411  +** This routine does the first phase of a two-phase commit.  This routine
         2412  +** causes a rollback journal to be created (if it does not already exist)
         2413  +** and populated with enough information so that if a power loss occurs
         2414  +** the database can be restored to its original state by playing back
         2415  +** the journal.  Then the contents of the journal are flushed out to
         2416  +** the disk.  After the journal is safely on oxide, the changes to the
         2417  +** database are written into the database file and flushed to oxide.
         2418  +** At the end of this call, the rollback journal still exists on the
         2419  +** disk and we are still holding all locks, so the transaction has not
         2420  +** committed.  See sqlite3BtreeCommit() for the second phase of the
         2421  +** commit process.
         2422  +**
         2423  +** This call is a no-op if no write-transaction is currently active on pBt.
         2424  +**
         2425  +** Otherwise, sync the database file for the btree pBt. zMaster points to
         2426  +** the name of a master journal file that should be written into the
         2427  +** individual journal file, or is NULL, indicating no master journal file 
         2428  +** (single database transaction).
         2429  +**
         2430  +** When this is called, the master journal should already have been
         2431  +** created, populated with this journal pointer and synced to disk.
         2432  +**
         2433  +** Once this is routine has returned, the only thing required to commit
         2434  +** the write-transaction for this database file is to delete the journal.
         2435  +*/
         2436  +int sqlite3BtreeCommitPhaseOne(Btree *p, const char *zMaster){
         2437  +  int rc = SQLITE_OK;
         2438  +  if( p->inTrans==TRANS_WRITE ){
         2439  +    BtShared *pBt = p->pBt;
         2440  +    Pgno nTrunc = 0;
         2441  +#ifndef SQLITE_OMIT_AUTOVACUUM
         2442  +    if( pBt->autoVacuum ){
         2443  +      rc = autoVacuumCommit(pBt, &nTrunc); 
         2444  +      if( rc!=SQLITE_OK ){
         2445  +        return rc;
         2446  +      }
         2447  +    }
         2448  +#endif
         2449  +    rc = sqlite3PagerCommitPhaseOne(pBt->pPager, zMaster, nTrunc);
         2450  +  }
         2451  +  return rc;
         2452  +}
  2409   2453   
  2410   2454   /*
  2411   2455   ** Commit the transaction currently in progress.
  2412   2456   **
  2413   2457   ** This routine implements the second phase of a 2-phase commit.  The
  2414   2458   ** sqlite3BtreeSync() routine does the first phase and should be invoked
  2415   2459   ** prior to calling this routine.  The sqlite3BtreeSync() routine did
................................................................................
  2417   2461   ** contents so that they are written onto the disk platter.  All this
  2418   2462   ** routine has to do is delete or truncate the rollback journal
  2419   2463   ** (which causes the transaction to commit) and drop locks.
  2420   2464   **
  2421   2465   ** This will release the write lock on the database file.  If there
  2422   2466   ** are no active cursors, it also releases the read lock.
  2423   2467   */
  2424         -int sqlite3BtreeCommit(Btree *p){
         2468  +int sqlite3BtreeCommitPhaseTwo(Btree *p){
  2425   2469     BtShared *pBt = p->pBt;
  2426   2470   
  2427   2471     btreeIntegrity(p);
  2428   2472   
  2429   2473     /* If the handle has a write-transaction open, commit the shared-btrees 
  2430   2474     ** transaction and set the shared state to TRANS_READ.
  2431   2475     */
  2432   2476     if( p->inTrans==TRANS_WRITE ){
  2433   2477       int rc;
  2434   2478       assert( pBt->inTransaction==TRANS_WRITE );
  2435   2479       assert( pBt->nTransaction>0 );
  2436         -    rc = sqlite3PagerCommit(pBt->pPager);
         2480  +    rc = sqlite3PagerCommitPhaseTwo(pBt->pPager);
  2437   2481       if( rc!=SQLITE_OK ){
  2438   2482         return rc;
  2439   2483       }
  2440   2484       pBt->inTransaction = TRANS_READ;
  2441   2485       pBt->inStmt = 0;
  2442   2486     }
  2443   2487     unlockAllTables(p);
................................................................................
  2459   2503     */
  2460   2504     p->inTrans = TRANS_NONE;
  2461   2505     unlockBtreeIfUnused(pBt);
  2462   2506   
  2463   2507     btreeIntegrity(p);
  2464   2508     return SQLITE_OK;
  2465   2509   }
         2510  +
         2511  +/*
         2512  +** Do both phases of a commit.
         2513  +*/
         2514  +int sqlite3BtreeCommit(Btree *p){
         2515  +  int rc;
         2516  +  rc = sqlite3BtreeCommitPhaseOne(p, 0);
         2517  +  if( rc==SQLITE_OK ){
         2518  +    rc = sqlite3BtreeCommitPhaseTwo(p);
         2519  +  }
         2520  +  return rc;
         2521  +}
  2466   2522   
  2467   2523   #ifndef NDEBUG
  2468   2524   /*
  2469   2525   ** Return the number of write-cursors open on this handle. This is for use
  2470   2526   ** in assert() expressions, so it is only compiled if NDEBUG is not
  2471   2527   ** defined.
  2472   2528   */
................................................................................
  6518   6574   /*
  6519   6575   ** Return non-zero if a read (or write) transaction is active.
  6520   6576   */
  6521   6577   int sqlite3BtreeIsInReadTrans(Btree *p){
  6522   6578     return (p && (p->inTrans!=TRANS_NONE));
  6523   6579   }
  6524   6580   
  6525         -/*
  6526         -** This routine does the first phase of a 2-phase commit.  This routine
  6527         -** causes a rollback journal to be created (if it does not already exist)
  6528         -** and populated with enough information so that if a power loss occurs
  6529         -** the database can be restored to its original state by playing back
  6530         -** the journal.  Then the contents of the journal are flushed out to
  6531         -** the disk.  After the journal is safely on oxide, the changes to the
  6532         -** database are written into the database file and flushed to oxide.
  6533         -** At the end of this call, the rollback journal still exists on the
  6534         -** disk and we are still holding all locks, so the transaction has not
  6535         -** committed.  See sqlite3BtreeCommit() for the second phase of the
  6536         -** commit process.
  6537         -**
  6538         -** This call is a no-op if no write-transaction is currently active on pBt.
  6539         -**
  6540         -** Otherwise, sync the database file for the btree pBt. zMaster points to
  6541         -** the name of a master journal file that should be written into the
  6542         -** individual journal file, or is NULL, indicating no master journal file 
  6543         -** (single database transaction).
  6544         -**
  6545         -** When this is called, the master journal should already have been
  6546         -** created, populated with this journal pointer and synced to disk.
  6547         -**
  6548         -** Once this is routine has returned, the only thing required to commit
  6549         -** the write-transaction for this database file is to delete the journal.
  6550         -*/
  6551         -int sqlite3BtreeSync(Btree *p, const char *zMaster){
  6552         -  int rc = SQLITE_OK;
  6553         -  if( p->inTrans==TRANS_WRITE ){
  6554         -    BtShared *pBt = p->pBt;
  6555         -    Pgno nTrunc = 0;
  6556         -#ifndef SQLITE_OMIT_AUTOVACUUM
  6557         -    if( pBt->autoVacuum ){
  6558         -      rc = autoVacuumCommit(pBt, &nTrunc); 
  6559         -      if( rc!=SQLITE_OK ){
  6560         -        return rc;
  6561         -      }
  6562         -    }
  6563         -#endif
  6564         -    rc = sqlite3PagerSync(pBt->pPager, zMaster, nTrunc);
  6565         -  }
  6566         -  return rc;
  6567         -}
  6568         -
  6569   6581   /*
  6570   6582   ** This function returns a pointer to a blob of memory associated with
  6571   6583   ** a single shared-btree. The memory is used by client code for it's own
  6572   6584   ** purposes (for example, to store a high-level schema associated with 
  6573   6585   ** the shared-btree). The btree layer manages reference counting issues.
  6574   6586   **
  6575   6587   ** The first time this is called on a shared-btree, nBytes bytes of memory
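
Illustration (not part of the check-in): the comment on sqlite3BtreeCommitPhaseOne() in the btree.c changes above says that after phase one the rollback journal still exists, so the transaction has not committed, and that deleting the journal is what commits it. The toy model below is a minimal sketch of that idea only. The ToyDb type and the toy* functions are hypothetical names invented for this example; they keep everything in memory and do not reflect how the pager stores or syncs data.

```c
#include <assert.h>

/* Hypothetical toy model: one "database" value guarded by a rollback
** journal. Nothing here is SQLite code. */
typedef struct ToyDb {
  int value;          /* Content of the database file */
  int journalExists;  /* Nonzero while a rollback journal is present */
  int journalValue;   /* Original content saved in the journal */
} ToyDb;

/* Phase one: save the original content in the journal, then write the
** new content. The journal still exists afterwards, so the transaction
** has not committed yet. */
static void toyCommitPhaseOne(ToyDb *db, int newValue){
  assert( db!=0 );
  db->journalValue = db->value;
  db->journalExists = 1;
  db->value = newValue;
}

/* Phase two: deleting the journal is the commit point. */
static void toyCommitPhaseTwo(ToyDb *db){
  assert( db!=0 );
  db->journalExists = 0;
}

/* Recovery after a simulated crash: a surviving journal means the
** transaction never committed, so play the journal back to restore the
** original content. */
static void toyRecover(ToyDb *db){
  assert( db!=0 );
  if( db->journalExists ){
    db->value = db->journalValue;
    db->journalExists = 0;
  }
}
```

In this model, calling toyRecover() after toyCommitPhaseOne() alone restores the old value, while calling it after both phases leaves the new value in place.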

Changes to src/btree.h.

     9      9   **    May you share freely, never taking more than you give.
    10     10   **
    11     11   *************************************************************************
    12     12   ** This header file defines the interface that the sqlite B-Tree file
    13     13   ** subsystem.  See comments in the source code for a detailed description
    14     14   ** of what each interface routine does.
    15     15   **
    16         -** @(#) $Id: btree.h,v 1.73 2007/03/29 05:51:49 drh Exp $
           16  +** @(#) $Id: btree.h,v 1.74 2007/03/30 14:06:34 drh Exp $
    17     17   */
    18     18   #ifndef _BTREE_H_
    19     19   #define _BTREE_H_
    20     20   
    21     21   /* TODO: This definition is just included so other modules compile. It
    22     22   ** needs to be revisited.
    23     23   */
................................................................................
    63     63   int sqlite3BtreeSyncDisabled(Btree*);
    64     64   int sqlite3BtreeSetPageSize(Btree*,int,int);
    65     65   int sqlite3BtreeGetPageSize(Btree*);
    66     66   int sqlite3BtreeGetReserve(Btree*);
    67     67   int sqlite3BtreeSetAutoVacuum(Btree *, int);
    68     68   int sqlite3BtreeGetAutoVacuum(Btree *);
    69     69   int sqlite3BtreeBeginTrans(Btree*,int);
           70  +int sqlite3BtreeCommitPhaseOne(Btree*, const char *zMaster);
           71  +int sqlite3BtreeCommitPhaseTwo(Btree*);
    70     72   int sqlite3BtreeCommit(Btree*);
    71     73   int sqlite3BtreeRollback(Btree*);
    72     74   int sqlite3BtreeBeginStmt(Btree*);
    73     75   int sqlite3BtreeCommitStmt(Btree*);
    74     76   int sqlite3BtreeRollbackStmt(Btree*);
    75     77   int sqlite3BtreeCreateTable(Btree*, int*, int flags);
    76     78   int sqlite3BtreeIsInTrans(Btree*);
    77     79   int sqlite3BtreeIsInStmt(Btree*);
    78     80   int sqlite3BtreeIsInReadTrans(Btree*);
    79         -int sqlite3BtreeSync(Btree*, const char *zMaster);
    80     81   void *sqlite3BtreeSchema(Btree *, int, void(*)(void *));
    81     82   int sqlite3BtreeSchemaLocked(Btree *);
    82     83   int sqlite3BtreeLockTable(Btree *, int, u8);
    83     84   
    84     85   const char *sqlite3BtreeGetFilename(Btree *);
    85     86   const char *sqlite3BtreeGetDirname(Btree *);
    86     87   const char *sqlite3BtreeGetJournalname(Btree *);

Changes to src/pager.c.

    14     14   ** The pager is used to access a database disk file.  It implements
    15     15   ** atomic commit and rollback through the use of a journal file that
    16     16   ** is separate from the database file.  The pager also implements file
    17     17   ** locking to prevent two processes from writing the same database
    18     18   ** file simultaneously, or one process from reading the database while
    19     19   ** another is writing.
    20     20   **
    21         -** @(#) $Id: pager.c,v 1.306 2007/03/29 18:19:52 drh Exp $
           21  +** @(#) $Id: pager.c,v 1.307 2007/03/30 14:06:34 drh Exp $
    22     22   */
    23     23   #ifndef SQLITE_OMIT_DISKIO
    24     24   #include "sqliteInt.h"
    25     25   #include "os.h"
    26     26   #include "pager.h"
    27     27   #include <assert.h>
    28     28   #include <string.h>
................................................................................
   156    156     PgHdr *pNextFree, *pPrevFree;  /* Freelist of pages where nRef==0 */
   157    157     PgHdr *pNextAll;               /* A list of all pages */
   158    158     PgHdr *pNextStmt, *pPrevStmt;  /* List of pages in the statement journal */
   159    159     u8 inJournal;                  /* TRUE if has been written to journal */
   160    160     u8 inStmt;                     /* TRUE if in the statement subjournal */
   161    161     u8 dirty;                      /* TRUE if we need to write back changes */
   162    162     u8 needSync;                   /* Sync journal before writing this page */
   163         -  u8 alwaysRollback;             /* Disable dont_rollback() for this page */
          163  +  u8 alwaysRollback;             /* Disable DontRollback() for this page */
   164    164     short int nRef;                /* Number of users of this page */
   165    165     PgHdr *pDirty, *pPrevDirty;    /* Dirty pages */
   166    166     u32 notUsed;                   /* Buffer space */
   167    167   #ifdef SQLITE_CHECK_PAGES
   168    168     u32 pageHash;
   169    169   #endif
   170    170     /* pPager->pageSize bytes of page data follow this header */
................................................................................
   232    232     u8 fullSync;                /* Do extra syncs of the journal for robustness */
   233    233     u8 full_fsync;              /* Use F_FULLFSYNC when available */
   234    234     u8 state;                   /* PAGER_UNLOCK, _SHARED, _RESERVED, etc. */
   235    235     u8 tempFile;                /* zFilename is a temporary file */
   236    236     u8 readOnly;                /* True for a read-only database */
   237    237     u8 needSync;                /* True if an fsync() is needed on the journal */
   238    238     u8 dirtyCache;              /* True if cached pages have changed */
   239         -  u8 alwaysRollback;          /* Disable dont_rollback() for all pages */
          239  +  u8 alwaysRollback;          /* Disable DontRollback() for all pages */
   240    240     u8 memDb;                   /* True to inhibit all file I/O */
   241    241     u8 setMaster;               /* True if a m-j name has been written to jrnl */
          242  +  u8 doNotSync;               /* Boolean. While true, do not spill the cache */
          243  +  u8 exclusiveMode;           /* Boolean. True if locking_mode==EXCLUSIVE */
          244  +  u8 changeCountDone;         /* Set after incrementing the change-counter */
   242    245     int errCode;                /* One of several kinds of errors */
   243    246     int dbSize;                 /* Number of pages in the file */
   244    247     int origDbSize;             /* dbSize before the current change */
   245    248     int stmtSize;               /* Size of database (in pages) at stmt_begin() */
   246    249     int nRec;                   /* Number of pages written to the journal */
   247    250     u32 cksumInit;              /* Quasi-random value added to every checksum */
   248    251     int stmtNRec;               /* Number of records in stmt subjournal */
................................................................................
   282    285     int nHash;                  /* Size of the pager hash table */
   283    286     PgHdr **aHash;              /* Hash table to map page number to PgHdr */
   284    287   #ifdef SQLITE_ENABLE_MEMORY_MANAGEMENT
   285    288     Pager *pNext;               /* Linked list of pagers in this thread */
   286    289   #endif
   287    290     char *pTmpSpace;            /* Pager.pageSize bytes of space for tmp use */
   288    291     u32 iChangeCount;           /* Db change-counter for which cache is valid */
   289         -  u8 doNotSync;               /* Boolean. While true, do not spill the cache */
   290         -  u8 exclusiveMode;           /* Boolean. True if locking_mode==EXCLUSIVE */
   291         -  u8 changeCountDone;         /* Set after incrementing the change-counter */
   292    292   };
   293    293   
   294    294   /*
   295    295   ** If SQLITE_TEST is defined then increment the variable given in
   296    296   ** the argument
   297    297   */
   298    298   #ifdef SQLITE_TEST
................................................................................
   906    906     sqliteFree(pPager->aHash);
   907    907     pPager->nPage = 0;
   908    908     pPager->aHash = 0;
   909    909     pPager->nRef = 0;
   910    910   }
   911    911   
   912    912   /*
          913  +** This routine ends a transaction.  A transaction is ended by either
          914  +** a COMMIT or a ROLLBACK.
          915  +**
   913    916   ** When this routine is called, the pager has the journal file open and
   914         -** a RESERVED or EXCLUSIVE lock on the database.  This routine releases
   915         -** the database lock and acquires a SHARED lock in its place.  The journal
   916         -** file is deleted and closed.
          917  +** a RESERVED or EXCLUSIVE lock on the database.  This routine will release
          918  +** the database lock and acquires a SHARED lock in its place if that is
          919  +** the appropriate thing to do.  Release locks usually is appropriate,
          920  +** unless we are in exclusive access mode or unless this is a 
          921  +** COMMIT AND BEGIN or ROLLBACK AND BEGIN operation.
          922  +**
          923  +** The journal file is either deleted or truncated.
   917    924   **
   918    925   ** TODO: Consider keeping the journal file open for temporary databases.
   919    926   ** This might give a performance improvement on windows where opening
   920    927   ** a file is an expensive operation.
   921    928   */
   922         -static int pager_unwritelock(Pager *pPager){
          929  +static int pager_end_transaction(Pager *pPager){
   923    930     PgHdr *pPg;
   924    931     int rc = SQLITE_OK;
   925    932     int rc2 = SQLITE_OK;
   926    933     assert( !MEMDB );
   927    934     if( pPager->state<PAGER_RESERVED ){
   928    935       return SQLITE_OK;
   929    936     }
................................................................................
  1419   1426       }
  1420   1427     }
  1421   1428     /*NOTREACHED*/
  1422   1429     assert( 0 );
  1423   1430   
  1424   1431   end_playback:
  1425   1432     if( rc==SQLITE_OK ){
  1426         -    rc = pager_unwritelock(pPager);
         1433  +    rc = pager_end_transaction(pPager);
  1427   1434     }
  1428   1435     if( zMaster ){
  1429   1436       /* If there was a master journal and this routine will return success,
  1430   1437       ** see if it is possible to delete the master journal.
  1431   1438       */
  1432   1439       if( rc==SQLITE_OK ){
  1433   1440         rc = pager_delmaster(zMaster);
................................................................................
  2559   2566         return rc;
  2560   2567       }
  2561   2568     }
  2562   2569     assert( pPg->dirty==0 );
  2563   2570   
  2564   2571     /* If the page we are recycling is marked as alwaysRollback, then
  2565   2572     ** set the global alwaysRollback flag, thus disabling the
  2566         -  ** sqlite_dont_rollback() optimization for the rest of this transaction.
         2573  +  ** sqlite3PagerDontRollback() optimization for the rest of this transaction.
  2567   2574     ** It is necessary to do this because the page marked alwaysRollback
  2568   2575     ** might be reloaded at a later time but at that point we won't remember
  2569   2576     ** that is was marked alwaysRollback.  This means that all pages must
  2570   2577     ** be marked as alwaysRollback from here on out.
  2571   2578     */
  2572   2579     if( pPg->alwaysRollback ){
  2573   2580       IOTRACE(("ALWAYS_ROLLBACK %p\n", pPager))
................................................................................
  3092   3099   
  3093   3100     rc = writeJournalHdr(pPager);
  3094   3101   
  3095   3102     if( pPager->stmtAutoopen && rc==SQLITE_OK ){
  3096   3103       rc = sqlite3PagerStmtBegin(pPager);
  3097   3104     }
  3098   3105     if( rc!=SQLITE_OK && rc!=SQLITE_NOMEM ){
  3099         -    rc = pager_unwritelock(pPager);
         3106  +    rc = pager_end_transaction(pPager);
  3100   3107       if( rc==SQLITE_OK ){
  3101   3108         rc = SQLITE_FULL;
  3102   3109       }
  3103   3110     }
  3104   3111     return rc;
  3105   3112   
  3106   3113   failed_to_open_journal:
................................................................................
  3123   3130     return rc;
  3124   3131   }
  3125   3132   
  3126   3133   /*
  3127   3134   ** Acquire a write-lock on the database.  The lock is removed when
  3128   3135   ** the any of the following happen:
  3129   3136   **
  3130         -**   *  sqlite3PagerCommit() is called.
         3137  +**   *  sqlite3PagerCommitPhaseTwo() is called.
  3131   3138   **   *  sqlite3PagerRollback() is called.
  3132   3139   **   *  sqlite3PagerClose() is called.
  3133   3140   **   *  sqlite3PagerUnref() is called to on every outstanding page.
  3134   3141   **
  3135   3142   ** The first parameter to this routine is a pointer to any open page of the
  3136   3143   ** database file.  Nothing changes about the page - it is used merely to
  3137   3144   ** acquire a pointer to the Pager structure and as proof that there is
................................................................................
  3524   3531   ** When this routine is called, set the alwaysRollback flag to true.
  3525   3532   ** Subsequent calls to sqlite3PagerDontRollback() for the same page
  3526   3533   ** will thereafter be ignored.  This is necessary to avoid a problem
  3527   3534   ** where a page with data is added to the freelist during one part of
  3528   3535   ** a transaction then removed from the freelist during a later part
  3529   3536   ** of the same transaction and reused for some other purpose.  When it
  3530   3537   ** is first added to the freelist, this routine is called.  When reused,
  3531         -** the dont_rollback() routine is called.  But because the page contains
  3532         -** critical data, we still need to be sure it gets rolled back in spite
  3533         -** of the dont_rollback() call.
         3538  +** the sqlite3PagerDontRollback() routine is called.  But because the
         3539  +** page contains critical data, we still need to be sure it gets
         3540  +** rolled back in spite of the sqlite3PagerDontRollback() call.
  3534   3541   */
  3535   3542   void sqlite3PagerDontWrite(Pager *pPager, Pgno pgno){
  3536   3543     PgHdr *pPg;
  3537   3544   
  3538   3545     if( MEMDB ) return;
  3539   3546   
  3540   3547     pPg = pager_lookup(pPager, pgno);
................................................................................
  3589   3596       assert( pPg->inJournal || (int)pPg->pgno>pPager->origDbSize );
  3590   3597       assert( pPager->aInStmt!=0 );
  3591   3598       pPager->aInStmt[pPg->pgno/8] |= 1<<(pPg->pgno&7);
  3592   3599       page_add_to_stmt_list(pPg);
  3593   3600     }
  3594   3601   }
  3595   3602   
         3603  +
         3604  +/*
         3605  +** This routine is called to increment the database file change-counter,
         3606  +** stored at byte 24 of the pager file.
         3607  +*/
         3608  +static int pager_incr_changecounter(Pager *pPager){
         3609  +  PgHdr *pPgHdr;
         3610  +  u32 change_counter;
         3611  +  int rc;
         3612  +
         3613  +  if( !pPager->changeCountDone ){
         3614  +    /* Open page 1 of the file for writing. */
         3615  +    rc = sqlite3PagerGet(pPager, 1, &pPgHdr);
         3616  +    if( rc!=SQLITE_OK ) return rc;
         3617  +    rc = sqlite3PagerWrite(pPgHdr);
         3618  +    if( rc!=SQLITE_OK ) return rc;
         3619  +  
         3620  +    /* Read the current value at byte 24. */
         3621  +    change_counter = retrieve32bits(pPgHdr, 24);
         3622  +  
         3623  +    /* Increment the value just read and write it back to byte 24. */
         3624  +    change_counter++;
         3625  +    put32bits(((char*)PGHDR_TO_DATA(pPgHdr))+24, change_counter);
         3626  +    pPager->iChangeCount = change_counter;
         3627  +  
         3628  +    /* Release the page reference. */
         3629  +    sqlite3PagerUnref(pPgHdr);
         3630  +    pPager->changeCountDone = 1;
         3631  +  }
         3632  +  return SQLITE_OK;
         3633  +}
         3634  +
         3635  +/*
         3636  +** Sync the database file for the pager pPager. zMaster points to the name
         3637  +** of a master journal file that should be written into the individual
         3638  +** journal file. zMaster may be NULL, which is interpreted as no master
         3639  +** journal (a single database transaction).
         3640  +**
         3641  +** This routine ensures that the journal is synced, all dirty pages written
         3642  +** to the database file and the database file synced. The only thing that
         3643  +** remains to commit the transaction is to delete the journal file (or
         3644  +** master journal file if specified).
         3645  +**
         3646  +** Note that if zMaster==NULL, this does not overwrite a previous value
         3647  +** passed to an sqlite3PagerCommitPhaseOne() call.
         3648  +**
         3649  +** If parameter nTrunc is non-zero, then the pager file is truncated to
         3650  +** nTrunc pages (this is used by auto-vacuum databases).
         3651  +*/
         3652  +int sqlite3PagerCommitPhaseOne(Pager *pPager, const char *zMaster, Pgno nTrunc){
         3653  +  int rc = SQLITE_OK;
         3654  +
         3655  +  PAGERTRACE4("DATABASE SYNC: File=%s zMaster=%s nTrunc=%d\n", 
         3656  +      pPager->zFilename, zMaster, nTrunc);
         3657  +
         3658  +  /* If this is an in-memory db, or no pages have been written to, or this
         3659  +  ** function has already been called, it is a no-op.
         3660  +  */
         3661  +  if( pPager->state!=PAGER_SYNCED && !MEMDB && pPager->dirtyCache ){
         3662  +    PgHdr *pPg;
         3663  +    assert( pPager->journalOpen );
         3664  +
         3665  +    /* If a master journal file name has already been written to the
         3666  +    ** journal file, then no sync is required. This happens when it is
         3667  +    ** written, then the process fails to upgrade from a RESERVED to an
         3668  +    ** EXCLUSIVE lock. The next time the process tries to commit the
         3669  +    ** transaction the m-j name will have already been written.
         3670  +    */
         3671  +    if( !pPager->setMaster ){
         3672  +      rc = pager_incr_changecounter(pPager);
         3673  +      if( rc!=SQLITE_OK ) goto sync_exit;
         3674  +#ifndef SQLITE_OMIT_AUTOVACUUM
         3675  +      if( nTrunc!=0 ){
         3676  +        /* If this transaction has made the database smaller, then all pages
         3677  +        ** being discarded by the truncation must be written to the journal
         3678  +        ** file.
         3679  +        */
         3680  +        Pgno i;
         3681  +        int iSkip = PAGER_MJ_PGNO(pPager);
         3682  +        for( i=nTrunc+1; i<=pPager->origDbSize; i++ ){
         3683  +          if( !(pPager->aInJournal[i/8] & (1<<(i&7))) && i!=iSkip ){
         3684  +            rc = sqlite3PagerGet(pPager, i, &pPg);
         3685  +            if( rc!=SQLITE_OK ) goto sync_exit;
         3686  +            rc = sqlite3PagerWrite(pPg);
         3687  +            sqlite3PagerUnref(pPg);
         3688  +            if( rc!=SQLITE_OK ) goto sync_exit;
         3689  +          }
         3690  +        } 
         3691  +      }
         3692  +#endif
         3693  +      rc = writeMasterJournal(pPager, zMaster);
         3694  +      if( rc!=SQLITE_OK ) goto sync_exit;
         3695  +      rc = syncJournal(pPager);
         3696  +      if( rc!=SQLITE_OK ) goto sync_exit;
         3697  +    }
         3698  +
         3699  +#ifndef SQLITE_OMIT_AUTOVACUUM
         3700  +    if( nTrunc!=0 ){
         3701  +      rc = sqlite3PagerTruncate(pPager, nTrunc);
         3702  +      if( rc!=SQLITE_OK ) goto sync_exit;
         3703  +    }
         3704  +#endif
         3705  +
         3706  +    /* Write all dirty pages to the database file */
         3707  +    pPg = pager_get_all_dirty_pages(pPager);
         3708  +    rc = pager_write_pagelist(pPg);
         3709  +    if( rc!=SQLITE_OK ) goto sync_exit;
         3710  +
         3711  +    /* Sync the database file. */
         3712  +    if( !pPager->noSync ){
         3713  +      rc = sqlite3OsSync(pPager->fd, 0);
         3714  +    }
         3715  +    IOTRACE(("DBSYNC %p\n", pPager))
         3716  +
         3717  +    pPager->state = PAGER_SYNCED;
         3718  +  }else if( MEMDB && nTrunc!=0 ){
         3719  +    rc = sqlite3PagerTruncate(pPager, nTrunc);
         3720  +  }
         3721  +
         3722  +sync_exit:
         3723  +  return rc;
         3724  +}
         3725  +
  3596   3726   
  3597   3727   /*
  3598   3728   ** Commit all changes to the database and release the write lock.
  3599   3729   **
  3600   3730   ** If the commit fails for any reason, a rollback attempt is made
  3601   3731   ** and an error code is returned.  If the commit worked, SQLITE_OK
  3602   3732   ** is returned.
  3603   3733   */
  3604         -int sqlite3PagerCommit(Pager *pPager){
         3734  +int sqlite3PagerCommitPhaseTwo(Pager *pPager){
  3605   3735     int rc;
  3606   3736     PgHdr *pPg;
  3607   3737   
  3608   3738     if( pPager->errCode ){
  3609   3739       return pPager->errCode;
  3610   3740     }
  3611   3741     if( pPager->state<PAGER_RESERVED ){
................................................................................
  3636   3766       pPager->state = PAGER_SHARED;
  3637   3767       return SQLITE_OK;
  3638   3768     }
  3639   3769     if( pPager->dirtyCache==0 ){
  3640   3770       /* Exit early (without doing the time-consuming sqlite3OsSync() calls)
  3641   3771       ** if there have been no changes to the database file. */
  3642   3772       assert( pPager->needSync==0 );
  3643         -    rc = pager_unwritelock(pPager);
         3773  +    rc = pager_end_transaction(pPager);
  3644   3774     }else{
  3645   3775       assert( pPager->journalOpen );
  3646         -    rc = sqlite3PagerSync(pPager, 0, 0);
         3776  +    rc = sqlite3PagerCommitPhaseOne(pPager, 0, 0);
  3647   3777       if( rc==SQLITE_OK ){
  3648         -      rc = pager_unwritelock(pPager);
         3778  +      rc = pager_end_transaction(pPager);
  3649   3779       }
  3650   3780     }
  3651   3781     return pager_error(pPager, rc);
  3652   3782   }
  3653   3783   
  3654   3784   /*
  3655   3785   ** Rollback all changes.  The database falls back to PAGER_SHARED mode.
................................................................................
  3699   3829       memoryTruncate(pPager);
  3700   3830       pPager->stmtInUse = 0;
  3701   3831       pPager->state = PAGER_SHARED;
  3702   3832       return SQLITE_OK;
  3703   3833     }
  3704   3834   
  3705   3835     if( !pPager->dirtyCache || !pPager->journalOpen ){
  3706         -    rc = pager_unwritelock(pPager);
         3836  +    rc = pager_end_transaction(pPager);
  3707   3837       return rc;
  3708   3838     }
  3709   3839   
  3710   3840     if( pPager->errCode && pPager->errCode!=SQLITE_FULL ){
  3711   3841       if( pPager->state>=PAGER_EXCLUSIVE ){
  3712   3842         pager_playback(pPager, 0);
  3713   3843       }
  3714   3844       return pPager->errCode;
  3715   3845     }
  3716   3846     if( pPager->state==PAGER_RESERVED ){
  3717   3847       int rc2;
  3718   3848       rc = pager_playback(pPager, 0);
  3719         -    rc2 = pager_unwritelock(pPager);
         3849  +    rc2 = pager_end_transaction(pPager);
  3720   3850       if( rc==SQLITE_OK ){
  3721   3851         rc = rc2;
  3722   3852       }
  3723   3853     }else{
  3724   3854       rc = pager_playback(pPager, 0);
  3725   3855     }
  3726   3856     pPager->dbSize = -1;
................................................................................
  3922   4052     void *(*xCodec)(void*,void*,Pgno,int),
  3923   4053     void *pCodecArg
  3924   4054   ){
  3925   4055     pPager->xCodec = xCodec;
  3926   4056     pPager->pCodecArg = pCodecArg;
  3927   4057   }
  3928   4058   
  3929         -/*
  3930         -** This routine is called to increment the database file change-counter,
  3931         -** stored at byte 24 of the pager file.
  3932         -*/
  3933         -static int pager_incr_changecounter(Pager *pPager){
  3934         -  PgHdr *pPgHdr;
  3935         -  u32 change_counter;
  3936         -  int rc;
  3937         -
  3938         -  if( !pPager->changeCountDone ){
  3939         -    /* Open page 1 of the file for writing. */
  3940         -    rc = sqlite3PagerGet(pPager, 1, &pPgHdr);
  3941         -    if( rc!=SQLITE_OK ) return rc;
  3942         -    rc = sqlite3PagerWrite(pPgHdr);
  3943         -    if( rc!=SQLITE_OK ) return rc;
  3944         -  
  3945         -    /* Read the current value at byte 24. */
  3946         -    change_counter = retrieve32bits(pPgHdr, 24);
  3947         -  
  3948         -    /* Increment the value just read and write it back to byte 24. */
  3949         -    change_counter++;
  3950         -    put32bits(((char*)PGHDR_TO_DATA(pPgHdr))+24, change_counter);
  3951         -    pPager->iChangeCount = change_counter;
  3952         -  
  3953         -    /* Release the page reference. */
  3954         -    sqlite3PagerUnref(pPgHdr);
  3955         -    pPager->changeCountDone = 1;
  3956         -  }
  3957         -  return SQLITE_OK;
  3958         -}
  3959         -
  3960         -/*
  3961         -** Sync the database file for the pager pPager. zMaster points to the name
  3962         -** of a master journal file that should be written into the individual
  3963         -** journal file. zMaster may be NULL, which is interpreted as no master
  3964         -** journal (a single database transaction).
  3965         -**
  3966         -** This routine ensures that the journal is synced, all dirty pages written
  3967         -** to the database file and the database file synced. The only thing that
  3968         -** remains to commit the transaction is to delete the journal file (or
  3969         -** master journal file if specified).
  3970         -**
  3971         -** Note that if zMaster==NULL, this does not overwrite a previous value
  3972         -** passed to an sqlite3PagerSync() call.
  3973         -**
  3974         -** If parameter nTrunc is non-zero, then the pager file is truncated to
  3975         -** nTrunc pages (this is used by auto-vacuum databases).
  3976         -*/
  3977         -int sqlite3PagerSync(Pager *pPager, const char *zMaster, Pgno nTrunc){
  3978         -  int rc = SQLITE_OK;
  3979         -
  3980         -  PAGERTRACE4("DATABASE SYNC: File=%s zMaster=%s nTrunc=%d\n", 
  3981         -      pPager->zFilename, zMaster, nTrunc);
  3982         -
  3983         -  /* If this is an in-memory db, or no pages have been written to, or this
  3984         -  ** function has already been called, it is a no-op.
  3985         -  */
  3986         -  if( pPager->state!=PAGER_SYNCED && !MEMDB && pPager->dirtyCache ){
  3987         -    PgHdr *pPg;
  3988         -    assert( pPager->journalOpen );
  3989         -
  3990         -    /* If a master journal file name has already been written to the
  3991         -    ** journal file, then no sync is required. This happens when it is
  3992         -    ** written, then the process fails to upgrade from a RESERVED to an
  3993         -    ** EXCLUSIVE lock. The next time the process tries to commit the
  3994         -    ** transaction the m-j name will have already been written.
  3995         -    */
  3996         -    if( !pPager->setMaster ){
  3997         -      rc = pager_incr_changecounter(pPager);
  3998         -      if( rc!=SQLITE_OK ) goto sync_exit;
  3999         -#ifndef SQLITE_OMIT_AUTOVACUUM
  4000         -      if( nTrunc!=0 ){
  4001         -        /* If this transaction has made the database smaller, then all pages
  4002         -        ** being discarded by the truncation must be written to the journal
  4003         -        ** file.
  4004         -        */
  4005         -        Pgno i;
  4006         -        int iSkip = PAGER_MJ_PGNO(pPager);
  4007         -        for( i=nTrunc+1; i<=pPager->origDbSize; i++ ){
  4008         -          if( !(pPager->aInJournal[i/8] & (1<<(i&7))) && i!=iSkip ){
  4009         -            rc = sqlite3PagerGet(pPager, i, &pPg);
  4010         -            if( rc!=SQLITE_OK ) goto sync_exit;
  4011         -            rc = sqlite3PagerWrite(pPg);
  4012         -            sqlite3PagerUnref(pPg);
  4013         -            if( rc!=SQLITE_OK ) goto sync_exit;
  4014         -          }
  4015         -        } 
  4016         -      }
  4017         -#endif
  4018         -      rc = writeMasterJournal(pPager, zMaster);
  4019         -      if( rc!=SQLITE_OK ) goto sync_exit;
  4020         -      rc = syncJournal(pPager);
  4021         -      if( rc!=SQLITE_OK ) goto sync_exit;
  4022         -    }
  4023         -
  4024         -#ifndef SQLITE_OMIT_AUTOVACUUM
  4025         -    if( nTrunc!=0 ){
  4026         -      rc = sqlite3PagerTruncate(pPager, nTrunc);
  4027         -      if( rc!=SQLITE_OK ) goto sync_exit;
  4028         -    }
  4029         -#endif
  4030         -
  4031         -    /* Write all dirty pages to the database file */
  4032         -    pPg = pager_get_all_dirty_pages(pPager);
  4033         -    rc = pager_write_pagelist(pPg);
  4034         -    if( rc!=SQLITE_OK ) goto sync_exit;
  4035         -
  4036         -    /* Sync the database file. */
  4037         -    if( !pPager->noSync ){
  4038         -      rc = sqlite3OsSync(pPager->fd, 0);
  4039         -    }
  4040         -    IOTRACE(("DBSYNC %p\n", pPager))
  4041         -
  4042         -    pPager->state = PAGER_SYNCED;
  4043         -  }else if( MEMDB && nTrunc!=0 ){
  4044         -    rc = sqlite3PagerTruncate(pPager, nTrunc);
  4045         -  }
  4046         -
  4047         -sync_exit:
  4048         -  return rc;
  4049         -}
  4050         -
  4051   4059   #ifndef SQLITE_OMIT_AUTOVACUUM
  4052   4060   /*
  4053   4061   ** Move the page identified by pData to location pgno in the file. 
  4054   4062   **
  4055   4063   ** There must be no references to the current page pgno. If current page
  4056   4064   ** pgno is not already in the rollback journal, it is not written there by
  4057   4065   ** this routine. The same applies to the page pData refers to on entry to
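
Note on the pager.c change above: sqlite3PagerCommitPhaseOne() does the durable work (sync the journal, write dirty pages, sync the database file) and is guarded by the PAGER_SYNCED state, so a repeated call is a no-op. sqlite3PagerCommitPhaseTwo() runs phase one itself if the cache is still dirty and then ends the transaction. The following is only a minimal sketch of that control flow under simplifying assumptions. ToyPager, its fields, and the toy_* functions are hypothetical names invented for illustration; they are not part of SQLite, and the counters merely stand in for real file I/O.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the pager states in the diff. */
enum ToyState { TOY_SHARED, TOY_RESERVED, TOY_EXCLUSIVE, TOY_SYNCED };

typedef struct ToyPager {
  enum ToyState state;
  int dirtyCache;     /* non-zero if pages were modified */
  int journalSyncs;   /* counts simulated journal syncs */
  int dbSyncs;        /* counts simulated database-file syncs */
  int journalExists;  /* non-zero while the rollback journal is present */
} ToyPager;

/* Phase one: make the changes durable but leave the journal in place.
** The state check makes a second call a no-op. */
int toy_commit_phase_one(ToyPager *p){
  if( p->state!=TOY_SYNCED && p->dirtyCache ){
    p->journalSyncs++;        /* stands in for syncing the journal */
    p->dbSyncs++;             /* stands in for writing and syncing pages */
    p->state = TOY_SYNCED;
  }
  return 0;
}

/* Phase two: finish phase one if needed, then end the transaction. */
int toy_commit_phase_two(ToyPager *p){
  int rc = 0;
  if( p->state<TOY_RESERVED ) return 1;  /* no write transaction open */
  if( p->dirtyCache ){
    rc = toy_commit_phase_one(p);
  }
  if( rc==0 ){
    p->journalExists = 0;     /* stands in for deleting the journal */
    p->dirtyCache = 0;
    p->state = TOY_SHARED;
  }
  return rc;
}
```

In this sketch the journal survives phase one, so a failure between the two phases can still be rolled back. Only phase two removes it.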

Changes to src/pager.h.

     9      9   **    May you share freely, never taking more than you give.
    10     10   **
    11     11   *************************************************************************
    12     12   ** This header file defines the interface to the sqlite page cache
    13     13   ** subsystem.  The page cache subsystem reads and writes a file a page
    14     14   ** at a time and provides a journal for rollback.
    15     15   **
    16         -** @(#) $Id: pager.h,v 1.56 2007/03/27 16:19:52 danielk1977 Exp $
           16  +** @(#) $Id: pager.h,v 1.57 2007/03/30 14:06:34 drh Exp $
    17     17   */
    18     18   
    19     19   #ifndef _PAGER_H_
    20     20   #define _PAGER_H_
    21     21   
    22     22   /*
    23     23   ** The default size of a database page.
................................................................................
    97     97   Pgno sqlite3PagerPagenumber(DbPage*);
    98     98   int sqlite3PagerWrite(DbPage*);
    99     99   int sqlite3PagerIswriteable(DbPage*);
   100    100   int sqlite3PagerOverwrite(Pager *pPager, Pgno pgno, void*);
   101    101   int sqlite3PagerPagecount(Pager*);
   102    102   int sqlite3PagerTruncate(Pager*,Pgno);
   103    103   int sqlite3PagerBegin(DbPage*, int exFlag);
   104         -int sqlite3PagerCommit(Pager*);
   105         -int sqlite3PagerSync(Pager*,const char *zMaster, Pgno);
          104  +int sqlite3PagerCommitPhaseOne(Pager*,const char *zMaster, Pgno);
          105  +int sqlite3PagerCommitPhaseTwo(Pager*);
   106    106   int sqlite3PagerRollback(Pager*);
   107    107   int sqlite3PagerIsreadonly(Pager*);
   108    108   int sqlite3PagerStmtBegin(Pager*);
   109    109   int sqlite3PagerStmtCommit(Pager*);
   110    110   int sqlite3PagerStmtRollback(Pager*);
   111    111   void sqlite3PagerDontRollback(DbPage*);
   112    112   void sqlite3PagerDontWrite(Pager*, Pgno);

Changes to src/test2.c.

     9      9   **    May you share freely, never taking more than you give.
    10     10   **
    11     11   *************************************************************************
    12     12   ** Code for testing the pager.c module in SQLite.  This code
    13     13   ** is not included in the SQLite library.  It is used for automated
    14     14   ** testing of the SQLite library.
    15     15   **
    16         -** $Id: test2.c,v 1.41 2007/03/19 17:44:28 danielk1977 Exp $
           16  +** $Id: test2.c,v 1.42 2007/03/30 14:06:34 drh Exp $
    17     17   */
    18     18   #include "sqliteInt.h"
    19     19   #include "os.h"
    20     20   #include "pager.h"
    21     21   #include "tcl.h"
    22     22   #include <stdlib.h>
    23     23   #include <string.h>
................................................................................
   159    159     int rc;
   160    160     if( argc!=2 ){
   161    161       Tcl_AppendResult(interp, "wrong # args: should be \"", argv[0],
   162    162          " ID\"", 0);
   163    163       return TCL_ERROR;
   164    164     }
   165    165     pPager = sqlite3TextToPtr(argv[1]);
   166         -  rc = sqlite3PagerCommit(pPager);
          166  +  rc = sqlite3PagerCommitPhaseOne(pPager, 0, 0);
          167  +  if( rc!=SQLITE_OK ){
          168  +    Tcl_AppendResult(interp, errorName(rc), 0);
          169  +    return TCL_ERROR;
          170  +  }
          171  +  rc = sqlite3PagerCommitPhaseTwo(pPager);
   167    172     if( rc!=SQLITE_OK ){
   168    173       Tcl_AppendResult(interp, errorName(rc), 0);
   169    174       return TCL_ERROR;
   170    175     }
   171    176     return TCL_OK;
   172    177   }
   173    178   
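
Note on the test2.c change above: the test command now calls the two phases back to back and reports the first error it sees. A hedged sketch of that caller pattern follows. The names toy_run_commit, toy_phase_fn, and the stub functions are hypothetical and exist only for this illustration; real callers use the sqlite3PagerCommitPhaseOne() and sqlite3PagerCommitPhaseTwo() routines shown in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type standing in for one commit phase. */
typedef int (*toy_phase_fn)(void *ctx);

/* Run phase one, then phase two. Stop at the first non-zero result and
** record which phase failed (0 means neither failed). */
int toy_run_commit(void *ctx, toy_phase_fn one, toy_phase_fn two,
                   int *pFailedPhase){
  int rc = one(ctx);
  if( rc!=0 ){ *pFailedPhase = 1; return rc; }
  rc = two(ctx);
  if( rc!=0 ){ *pFailedPhase = 2; return rc; }
  *pFailedPhase = 0;
  return 0;
}

/* Stub phases for demonstration: each bumps a call counter. */
int toy_stub_ok(void *ctx){ ++*(int*)ctx; return 0; }
int toy_stub_fail(void *ctx){ ++*(int*)ctx; return 7; }
```

If phase one fails, phase two is never attempted, which matches the early return added to the test command.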

Changes to src/vdbeaux.c.

  1086   1086     ** not support atomic multi-file commits, so use the simple case then
  1087   1087     ** too.
  1088   1088     */
  1089   1089     if( 0==strlen(sqlite3BtreeGetFilename(db->aDb[0].pBt)) || nTrans<=1 ){
  1090   1090       for(i=0; rc==SQLITE_OK && i<db->nDb; i++){ 
  1091   1091         Btree *pBt = db->aDb[i].pBt;
  1092   1092         if( pBt ){
  1093         -        rc = sqlite3BtreeSync(pBt, 0);
         1093  +        rc = sqlite3BtreeCommitPhaseOne(pBt, 0);
  1094   1094         }
  1095   1095       }
  1096   1096   
  1097         -    /* Do the commit only if all databases successfully synced.
  1098         -    ** If one of the BtreeCommit() calls fails, this indicates an IO error
  1099         -    ** while deleting or truncating a journal file. It is unlikely, but
  1100         -    ** could happen. In this case abandon processing and return the error.
         1097  +    /* Do the commit only if all databases successfully complete phase 1. 
         1098  +    ** If one of the BtreeCommitPhaseOne() calls fails, this indicates an
         1099  +    ** IO error while deleting or truncating a journal file. It is unlikely,
         1100  +    ** but could happen. In this case abandon processing and return the error.
  1101   1101       */
  1102   1102       for(i=0; rc==SQLITE_OK && i<db->nDb; i++){
  1103   1103         Btree *pBt = db->aDb[i].pBt;
  1104   1104         if( pBt ){
  1105         -        rc = sqlite3BtreeCommit(pBt);
         1105  +        rc = sqlite3BtreeCommitPhaseTwo(pBt);
  1106   1106         }
  1107   1107       }
  1108   1108       if( rc==SQLITE_OK ){
  1109   1109         sqlite3VtabCommit(db);
  1110   1110       }
  1111   1111     }
  1112   1112   
................................................................................
  1178   1178         return rc;
  1179   1179       }
  1180   1180   
  1181   1181       /* Sync all the db files involved in the transaction. The same call
  1182   1182       ** sets the master journal pointer in each individual journal. If
  1183   1183       ** an error occurs here, do not delete the master journal file.
  1184   1184       **
  1185         -    ** If the error occurs during the first call to sqlite3BtreeSync(),
  1186         -    ** then there is a chance that the master journal file will be
  1187         -    ** orphaned. But we cannot delete it, in case the master journal
  1188         -    ** file name was written into the journal file before the failure
  1189         -    ** occurred.
         1185  +    ** If the error occurs during the first call to
         1186  +    ** sqlite3BtreeCommitPhaseOne(), then there is a chance that the
         1187  +    ** master journal file will be orphaned. But we cannot delete it,
         1188  +    ** in case the master journal file name was written into the journal
         1189  +    ** file before the failure occurred.
  1190   1190       */
  1191   1191       for(i=0; rc==SQLITE_OK && i<db->nDb; i++){ 
  1192   1192         Btree *pBt = db->aDb[i].pBt;
  1193   1193         if( pBt && sqlite3BtreeIsInTrans(pBt) ){
  1194         -        rc = sqlite3BtreeSync(pBt, zMaster);
         1194  +        rc = sqlite3BtreeCommitPhaseOne(pBt, zMaster);
  1195   1195         }
  1196   1196       }
  1197   1197       sqlite3OsClose(&master);
  1198   1198       if( rc!=SQLITE_OK ){
  1199   1199         sqliteFree(zMaster);
  1200   1200         return rc;
  1201   1201       }
................................................................................
  1219   1219         ** master journal exists now or if it will exist after the operating
  1220   1220         ** system crash that may follow the fsync() failure.
  1221   1221         */
  1222   1222         return rc;
  1223   1223       }
  1224   1224   
  1225   1225       /* All files and directories have already been synced, so the following
  1226         -    ** calls to sqlite3BtreeCommit() are only closing files and deleting
  1227         -    ** journals. If something goes wrong while this is happening we don't
  1228         -    ** really care. The integrity of the transaction is already guaranteed,
  1229         -    ** but some stray 'cold' journals may be lying around. Returning an
  1230         -    ** error code won't help matters.
         1226  +    ** calls to sqlite3BtreeCommitPhaseTwo() are only closing files and
         1227  +    ** deleting or truncating journals. If something goes wrong while
         1228  +    ** this is happening we don't really care. The integrity of the
         1229  +    ** transaction is already guaranteed, but some stray 'cold' journals
         1230  +    ** may be lying around. Returning an error code won't help matters.
  1231   1231       */
  1232   1232       disable_simulated_io_errors();
  1233   1233       for(i=0; i<db->nDb; i++){ 
  1234   1234         Btree *pBt = db->aDb[i].pBt;
  1235   1235         if( pBt ){
  1236         -        sqlite3BtreeCommit(pBt);
         1236  +        sqlite3BtreeCommitPhaseTwo(pBt);
  1237   1237         }
  1238   1238       }
  1239   1239       enable_simulated_io_errors();
  1240   1240   
  1241   1241       sqlite3VtabCommit(db);
  1242   1242     }
  1243   1243   #endif
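
Note on the vdbeaux.c change above: with several databases in one transaction, phase one is run on every database first, and phase two starts only if all of them succeeded. In the master-journal path, errors during phase two are ignored because the transaction is already durable. The sketch below illustrates that ordering under simplifying assumptions. ToyDb, its fields, the TOY_* codes, and the toy_* functions are hypothetical names for illustration only and do not correspond to SQLite structures.

```c
#include <assert.h>

#define TOY_OK    0
#define TOY_IOERR 10

/* Hypothetical stand-in for one attached database. */
typedef struct ToyDb {
  int inTrans;       /* non-zero if this database is in the transaction */
  int failPhaseOne;  /* set to simulate an I/O error in phase one */
  int phaseOneDone;  /* set once phase one has completed */
  int committed;     /* set once phase two has completed */
} ToyDb;

int toy_db_phase_one(ToyDb *d){
  if( d->failPhaseOne ) return TOY_IOERR;
  d->phaseOneDone = 1;
  return TOY_OK;
}

void toy_db_phase_two(ToyDb *d){
  d->committed = 1;
  d->inTrans = 0;
}

/* Run phase one everywhere, stopping at the first failure. Only if every
** database succeeded is phase two run on each of them. */
int toy_commit_all(ToyDb *aDb, int nDb){
  int rc = TOY_OK;
  int i;
  for(i=0; rc==TOY_OK && i<nDb; i++){
    if( aDb[i].inTrans ) rc = toy_db_phase_one(&aDb[i]);
  }
  if( rc!=TOY_OK ) return rc;   /* nothing has been committed yet */
  for(i=0; i<nDb; i++){
    if( aDb[i].inTrans ) toy_db_phase_two(&aDb[i]);
  }
  return TOY_OK;
}
```

When one database fails phase one in this sketch, no database reaches the committed state, so the caller can still roll the whole transaction back.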