Check-in [a353c1ab]

Overview
Comment: Revert (6187). (CVS 6188)
SHA1: a353c1ab376b159c4d12532412365318cdbdcc60
User & Date: danielk1977 2009-01-16 16:23:38
Context
2009-01-16
16:40  Fix a change-counter bug similar to #3584. This one is much more obscure though, requiring a transient IO or malloc error to occur while running in exclusive mode. (CVS 6189) check-in: 9f07d2d9 user: danielk1977 tags: trunk
16:23  Revert (6187). (CVS 6188) check-in: a353c1ab user: danielk1977 tags: trunk
15:21  This commit is an error. Reverted by (6188). (CVS 6187) check-in: aa67fd0c user: danielk1977 tags: trunk
Changes

Changes to src/bitvec.c.

    30     30   ** Clear operations are exceedingly rare.  There are usually between
    31     31   ** 5 and 500 set operations per Bitvec object, though the number of sets can
    32     32   ** sometimes grow into tens of thousands or larger.  The size of the
    33     33   ** Bitvec object is the number of pages in the database file at the
    34     34   ** start of a transaction, and is thus usually less than a few thousand,
    35     35   ** but can be as large as 2 billion for a really big database.
    36     36   **
    37         -** @(#) $Id: bitvec.c,v 1.11 2009/01/16 15:21:05 danielk1977 Exp $
           37  +** @(#) $Id: bitvec.c,v 1.12 2009/01/16 16:23:38 danielk1977 Exp $
    38     38   */
    39     39   #include "sqliteInt.h"
    40     40   
    41     41   /* Size of the Bitvec structure in bytes. */
    42     42   #define BITVEC_SZ        512
    43     43   
    44     44   /* Round the union size down to the nearest pointer boundary, since that's how 
................................................................................
   271    271       for(i=0; i<BITVEC_NPTR; i++){
   272    272         sqlite3BitvecDestroy(p->u.apSub[i]);
   273    273       }
   274    274     }
   275    275     sqlite3_free(p);
   276    276   }
   277    277   
   278         -/*
   279         -** Return the value of the iSize parameter specified when Bitvec *p
   280         -** was created.
   281         -*/
   282         -u32 sqlite3BitvecSize(Bitvec *p){
   283         -  return p->iSize;
   284         -}
   285         -
   286    278   #ifndef SQLITE_OMIT_BUILTIN_TEST
   287    279   /*
   288    280   ** Let V[] be an array of unsigned characters sufficient to hold
   289    281   ** up to N bits.  Let I be an integer between 0 and N.  0<=I<N.
   290    282   ** Then the following macros can be used to set, clear, or test
   291    283   ** individual bits within V.
   292    284   */
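
The context comment that closes this hunk mentions macros that set, clear, or test individual bits in an array V[] of unsigned characters, but the macros themselves fall outside the diff. As a minimal sketch of what such macros could look like (the names BIT_SET, BIT_CLEAR, BIT_TEST and the helper bit_demo() are hypothetical and are not taken from bitvec.c):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro names for illustration; not copied from bitvec.c.
** V is an array of unsigned char, I is a bit index with 0<=I<N.
** Byte I>>3 holds the bit, and I&7 selects the bit within that byte. */
#define BIT_SET(V,I)    ((V)[(I)>>3] |= (unsigned char)(1u<<((I)&7)))
#define BIT_CLEAR(V,I)  ((V)[(I)>>3] &= (unsigned char)~(1u<<((I)&7)))
#define BIT_TEST(V,I)   (((V)[(I)>>3] & (1u<<((I)&7)))!=0)

/* Set bits 3 and 42 in a 64-bit array, clear bit 3 again, then
** return the number of bits that remain set. */
int bit_demo(void){
  unsigned char v[8];
  int i, n = 0;
  memset(v, 0, sizeof(v));
  BIT_SET(v, 3);
  BIT_SET(v, 42);
  BIT_CLEAR(v, 3);
  assert( BIT_TEST(v, 42) );
  for(i=0; i<64; i++){
    if( BIT_TEST(v, i) ) n++;
  }
  return n;
}
```

A flat array like this only makes sense for small N. The header comment of bitvec.c notes that a Bitvec can cover as many as 2 billion pages, which is why the real object is not a simple linear bitmap.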

Changes to src/btree.c.

     5      5   ** a legal notice, here is a blessing:
     6      6   **
     7      7   **    May you do good and not evil.
     8      8   **    May you find forgiveness for yourself and forgive others.
     9      9   **    May you share freely, never taking more than you give.
    10     10   **
    11     11   *************************************************************************
    12         -** $Id: btree.c,v 1.559 2009/01/16 15:21:06 danielk1977 Exp $
           12  +** $Id: btree.c,v 1.560 2009/01/16 16:23:38 danielk1977 Exp $
    13     13   **
    14     14   ** This file implements a external (disk-based) database using BTrees.
    15     15   ** See the header comment on "btreeInt.h" for additional information.
    16     16   ** Including a description of file format and an overview of operation.
    17     17   */
    18     18   #include "btreeInt.h"
    19     19   
................................................................................
   278    278     }
   279    279   }
   280    280   #else
   281    281     #define invalidateOverflowCache(x)
   282    282     #define invalidateAllOverflowCache(x)
   283    283   #endif
   284    284   
   285         -/*
   286         -** Set bit pgno of the BtShared.pHasContent bitvec. This is called 
   287         -** when a page that previously contained data becomes a free-list leaf 
   288         -** page.
   289         -**
   290         -** The BtShared.pHasContent bitvec exists to work around an obscure
   291         -** bug caused by the interaction of two useful IO optimizations surrounding
   292         -** free-list leaf pages:
   293         -**
   294         -**   1) When all data is deleted from a page and the page becomes
   295         -**      a free-list leaf page, the page is not written to the database
   296         -**      (as free-list leaf pages contain no meaningful data). Sometimes
   297         -**      such a page is not even journalled (as it will not be modified,
   298         -**      why bother journalling it?).
   299         -**
   300         -**   2) When a free-list leaf page is reused, its content is not read
   301         -**      from the database or written to the journal file (why should it
   302         -**      be, if it is not at all meaningful?).
   303         -**
   304         -** By themselves, these optimizations work fine and provide a handy
   305         -** performance boost to bulk delete or insert operations. However, if
   306         -** a page is moved to the free-list and then reused within the same
   307         -** transaction, a problem comes up. If the page is not journalled when
   308         -** it is moved to the free-list and it is also not journalled when it
   309         -** is extracted from the free-list and reused, then the original data
   310         -** may be lost. In the event of a rollback, it may not be possible
   311         -** to restore the database to its original configuration.
   312         -**
   313         -** The solution is the BtShared.pHasContent bitvec. Whenever a page is 
   314         -** moved to become a free-list leaf page, the corresponding bit is
   315         -** set in the bitvec. Whenever a leaf page is extracted from the free-list,
   316         -** optimization 2 above is ommitted if the corresponding bit is already
   317         -** set in BtShared.pHasContent. The contents of the bitvec are cleared
   318         -** at the end of every transaction.
   319         -*/
   320         -static int btreeSetHasContent(BtShared *pBt, Pgno pgno){
   321         -  int rc = SQLITE_OK;
   322         -  if( !pBt->pHasContent ){
   323         -    int nPage;
   324         -    rc = sqlite3PagerPagecount(pBt->pPager, &nPage);
   325         -    if( rc==SQLITE_OK ){
   326         -      pBt->pHasContent = sqlite3BitvecCreate((u32)nPage);
   327         -      if( !pBt->pHasContent ){
   328         -        rc = SQLITE_NOMEM;
   329         -      }
   330         -    }
   331         -  }
   332         -  if( rc==SQLITE_OK && pgno<=sqlite3BitvecSize(pBt->pHasContent) ){
   333         -    rc = sqlite3BitvecSet(pBt->pHasContent, pgno);
   334         -  }
   335         -  return rc;
   336         -}
   337         -
   338         -/*
   339         -** Query the BtShared.pHasContent vector.
   340         -**
   341         -** This function is called when a free-list leaf page is removed from the
   342         -** free-list for reuse. It returns false if it is safe to retrieve the
   343         -** page from the pager layer with the 'no-content' flag set. True otherwise.
   344         -*/
   345         -static int btreeGetHasContent(BtShared *pBt, Pgno pgno){
   346         -  Bitvec *p = pBt->pHasContent;
   347         -  return (p && (pgno>sqlite3BitvecSize(p) || sqlite3BitvecTest(p, pgno)));
   348         -}
   349         -
   350         -/*
   351         -** Clear (destroy) the BtShared.pHasContent bitvec. This should be
   352         -** invoked at the conclusion of each write-transaction.
   353         -*/
   354         -static void btreeClearHasContent(BtShared *pBt){
   355         -  sqlite3BitvecDestroy(pBt->pHasContent);
   356         -  pBt->pHasContent = 0;
   357         -}
   358         -
   359    285   /*
   360    286   ** Save the current cursor position in the variables BtCursor.nKey 
   361    287   ** and BtCursor.pKey. The cursor's state is set to CURSOR_REQUIRESEEK.
   362    288   */
   363    289   static int saveCursorPosition(BtCursor *pCur){
   364    290     int rc;
   365    291   
................................................................................
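
The comment removed in the hunk above explains the BtShared.pHasContent idea: remember which pages became free-list leaves during the current transaction, and skip the 'no-content' shortcut when one of them is reused. The following is a rough standalone sketch of that bookkeeping, assuming a plain byte-array bitmap instead of a Bitvec. The TxnState type and the txn* function names are hypothetical stand-ins, not SQLite APIs.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned int Pgno;

/* Hypothetical stand-in for the per-transaction state. In the reverted
** code this role is played by BtShared.pHasContent (a Bitvec). */
typedef struct TxnState {
  unsigned char *aHasContent;  /* One bit per page, NULL until first use */
  Pgno nPage;                  /* Page count when the bitmap was created */
} TxnState;

/* Record that page pgno held data before becoming a free-list leaf.
** nPageNow is the current database size in pages. Returns 0 on success
** or -1 if the bitmap cannot be allocated. */
int txnSetHasContent(TxnState *p, Pgno pgno, Pgno nPageNow){
  assert( p!=0 );
  if( p->aHasContent==0 ){
    p->aHasContent = calloc((size_t)nPageNow/8 + 1, 1);
    if( p->aHasContent==0 ) return -1;
    p->nPage = nPageNow;
  }
  if( pgno<=p->nPage ){
    p->aHasContent[pgno>>3] |= (unsigned char)(1u<<(pgno&7));
  }
  return 0;
}

/* Return nonzero if the old content of page pgno may still matter, so
** the page must not be fetched with the no-content flag when reused.
** Pages beyond the recorded size are conservatively reported as having
** content, mirroring the removed btreeGetHasContent(). */
int txnGetHasContent(const TxnState *p, Pgno pgno){
  assert( p!=0 );
  if( p->aHasContent==0 ) return 0;
  if( pgno>p->nPage ) return 1;
  return (p->aHasContent[pgno>>3] & (1u<<(pgno&7)))!=0;
}

/* Discard the bitmap. Intended to run at the end of each transaction. */
void txnClearHasContent(TxnState *p){
  assert( p!=0 );
  free(p->aHasContent);
  p->aHasContent = 0;
  p->nPage = 0;
}
```

In this sketch a caller would invoke txnSetHasContent() when a page is added as a free-list leaf, pass `!txnGetHasContent(...)` as the no-content flag when a leaf is pulled back off the free-list, and call txnClearHasContent() on commit or rollback. This check-in removes the equivalent logic from btree.c, so the sketch illustrates the reverted design rather than the code as it stands after the revert.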
  1170   1096     assert( sqlite3_mutex_held(pBt->mutex) );
  1171   1097     rc = sqlite3PagerAcquire(pBt->pPager, pgno, (DbPage**)&pDbPage, noContent);
  1172   1098     if( rc ) return rc;
  1173   1099     *ppPage = btreePageFromDbPage(pDbPage, pgno, pBt);
  1174   1100     return SQLITE_OK;
  1175   1101   }
  1176   1102   
  1177         -/*
  1178         -** Retrieve a page from the pager cache. If the requested page is not
  1179         -** already in the pager cache return NULL. Initialize the MemPage.pBt and
  1180         -** MemPage.aData elements if needed.
  1181         -*/
  1182         -static MemPage *btreePageLookup(BtShared *pBt, Pgno pgno){
  1183         -  DbPage *pDbPage;
  1184         -  assert( sqlite3_mutex_held(pBt->mutex) );
  1185         -  pDbPage = sqlite3PagerLookup(pBt->pPager, pgno);
  1186         -  if( pDbPage ){
  1187         -    return btreePageFromDbPage(pDbPage, pgno, pBt);
  1188         -  }
  1189         -  return 0;
  1190         -}
  1191         -
  1192   1103   /*
  1193   1104   ** Return the size of the database file in pages. If there is any kind of
  1194   1105   ** error, return ((unsigned int)-1).
  1195   1106   */
  1196   1107   static Pgno pagerPagecount(BtShared *pBt){
  1197   1108     int nPage = -1;
  1198   1109     int rc;
................................................................................
  1209   1120   */
  1210   1121   static int getAndInitPage(
  1211   1122     BtShared *pBt,          /* The database file */
  1212   1123     Pgno pgno,           /* Number of the page to get */
  1213   1124     MemPage **ppPage     /* Write the page pointer here */
  1214   1125   ){
  1215   1126     int rc;
         1127  +  DbPage *pDbPage;
  1216   1128     MemPage *pPage;
  1217   1129   
  1218   1130     assert( sqlite3_mutex_held(pBt->mutex) );
  1219   1131     if( pgno==0 ){
  1220   1132       return SQLITE_CORRUPT_BKPT; 
  1221   1133     }
  1222   1134   
  1223   1135     /* It is often the case that the page we want is already in cache.
  1224   1136     ** If so, get it directly.  This saves us from having to call
  1225   1137     ** pagerPagecount() to make sure pgno is within limits, which results
  1226   1138     ** in a measureable performance improvements.
  1227   1139     */
  1228         -  *ppPage = pPage = btreePageLookup(pBt, pgno);
  1229         -  if( pPage ){
         1140  +  pDbPage = sqlite3PagerLookup(pBt->pPager, pgno);
         1141  +  if( pDbPage ){
  1230   1142       /* Page is already in cache */
         1143  +    *ppPage = pPage = btreePageFromDbPage(pDbPage, pgno, pBt);
  1231   1144       rc = SQLITE_OK;
  1232   1145     }else{
  1233   1146       /* Page not in cache.  Acquire it. */
  1234   1147       if( pgno>pagerPagecount(pBt) ){
  1235   1148         return SQLITE_CORRUPT_BKPT; 
  1236   1149       }
  1237   1150       rc = sqlite3BtreeGetPage(pBt, pgno, ppPage, 0);
................................................................................
  2455   2368     sqlite3BtreeEnter(p);
  2456   2369     pBt->db = p->db;
  2457   2370     assert( pBt->inTransaction==TRANS_WRITE && p->inTrans==TRANS_WRITE );
  2458   2371     if( !pBt->autoVacuum ){
  2459   2372       rc = SQLITE_DONE;
  2460   2373     }else{
  2461   2374       invalidateAllOverflowCache(pBt);
  2462         -    rc = incrVacuumStep(pBt, 0, pagerPagecount(pBt));
         2375  +    rc = incrVacuumStep(pBt, 0, sqlite3PagerImageSize(pBt->pPager));
  2463   2376     }
  2464   2377     sqlite3BtreeLeave(p);
  2465   2378     return rc;
  2466   2379   }
  2467   2380   
  2468   2381   /*
  2469   2382   ** This routine is called prior to sqlite3PagerCommit when a transaction
................................................................................
  2623   2536         pBt->inTransaction = TRANS_NONE;
  2624   2537       }
  2625   2538     }
  2626   2539   
  2627   2540     /* Set the handles current transaction state to TRANS_NONE and unlock
  2628   2541     ** the pager if this call closed the only read or write transaction.
  2629   2542     */
  2630         -  btreeClearHasContent(pBt);
  2631   2543     p->inTrans = TRANS_NONE;
  2632   2544     unlockBtreeIfUnused(pBt);
  2633   2545   
  2634   2546     btreeIntegrity(p);
  2635   2547     sqlite3BtreeLeave(p);
  2636   2548     return SQLITE_OK;
  2637   2549   }
................................................................................
  2759   2671       assert( pBt->nTransaction>0 );
  2760   2672       pBt->nTransaction--;
  2761   2673       if( 0==pBt->nTransaction ){
  2762   2674         pBt->inTransaction = TRANS_NONE;
  2763   2675       }
  2764   2676     }
  2765   2677   
  2766         -  btreeClearHasContent(pBt);
  2767   2678     p->inTrans = TRANS_NONE;
  2768   2679     pBt->inStmt = 0;
  2769   2680     unlockBtreeIfUnused(pBt);
  2770   2681   
  2771   2682     btreeIntegrity(p);
  2772   2683     sqlite3BtreeLeave(p);
  2773   2684     return rc;
................................................................................
  3167   3078   ** Given the page number of an overflow page in the database (parameter
  3168   3079   ** ovfl), this function finds the page number of the next page in the 
  3169   3080   ** linked list of overflow pages. If possible, it uses the auto-vacuum
  3170   3081   ** pointer-map data instead of reading the content of page ovfl to do so. 
  3171   3082   **
  3172   3083   ** If an error occurs an SQLite error code is returned. Otherwise:
  3173   3084   **
  3174         -** The page number of the next overflow page in the linked list is 
  3175         -** written to *pPgnoNext. If page ovfl is the last page in its linked 
  3176         -** list, *pPgnoNext is set to zero. 
         3085  +** Unless pPgnoNext is NULL, the page number of the next overflow 
         3086  +** page in the linked list is written to *pPgnoNext. If page ovfl
         3087  +** is the last page in its linked list, *pPgnoNext is set to zero. 
  3177   3088   **
  3178         -** If ppPage is not NULL, and a reference to the MemPage object corresponding
  3179         -** to page number pOvfl was obtained, then *ppPage is set to point to that
  3180         -** reference. It is the responsibility of the caller to call releasePage()
  3181         -** on *ppPage to free the reference. In no reference was obtained (because
  3182         -** the pointer-map was used to obtain the value for *pPgnoNext), then
  3183         -** *ppPage is set to zero.
         3089  +** If ppPage is not NULL, *ppPage is set to the MemPage* handle
         3090  +** for page ovfl. The underlying pager page may have been requested
         3091  +** with the noContent flag set, so the page data accessable via
         3092  +** this handle may not be trusted.
  3184   3093   */
  3185   3094   static int getOverflowPage(
  3186   3095     BtShared *pBt, 
  3187   3096     Pgno ovfl,                   /* Overflow page */
  3188         -  MemPage **ppPage,            /* OUT: MemPage handle (may be NULL) */
         3097  +  MemPage **ppPage,            /* OUT: MemPage handle */
  3189   3098     Pgno *pPgnoNext              /* OUT: Next overflow page number */
  3190   3099   ){
  3191   3100     Pgno next = 0;
  3192         -  MemPage *pPage = 0;
  3193   3101     int rc = SQLITE_OK;
  3194   3102   
  3195   3103     assert( sqlite3_mutex_held(pBt->mutex) );
  3196         -  assert(pPgnoNext);
         3104  +  /* One of these must not be NULL. Otherwise, why call this function? */
         3105  +  assert(ppPage || pPgnoNext);
         3106  +
         3107  +  /* If pPgnoNext is NULL, then this function is being called to obtain
         3108  +  ** a MemPage* reference only. No page-data is required in this case.
         3109  +  */
         3110  +  if( !pPgnoNext ){
         3111  +    return sqlite3BtreeGetPage(pBt, ovfl, ppPage, 1);
         3112  +  }
  3197   3113   
  3198   3114   #ifndef SQLITE_OMIT_AUTOVACUUM
  3199   3115     /* Try to find the next page in the overflow list using the
  3200   3116     ** autovacuum pointer-map pages. Guess that the next page in 
  3201   3117     ** the overflow list is page number (ovfl+1). If that guess turns 
  3202   3118     ** out to be wrong, fall back to loading the data of page 
  3203   3119     ** number ovfl to determine the next page number.
................................................................................
  3209   3125   
  3210   3126       while( PTRMAP_ISPAGE(pBt, iGuess) || iGuess==PENDING_BYTE_PAGE(pBt) ){
  3211   3127         iGuess++;
  3212   3128       }
  3213   3129   
  3214   3130       if( iGuess<=pagerPagecount(pBt) ){
  3215   3131         rc = ptrmapGet(pBt, iGuess, &eType, &pgno);
  3216         -      if( rc==SQLITE_OK && eType==PTRMAP_OVERFLOW2 && pgno==ovfl ){
         3132  +      if( rc!=SQLITE_OK ){
         3133  +        return rc;
         3134  +      }
         3135  +      if( eType==PTRMAP_OVERFLOW2 && pgno==ovfl ){
  3217   3136           next = iGuess;
  3218         -        rc = SQLITE_DONE;
  3219   3137         }
  3220   3138       }
  3221   3139     }
  3222   3140   #endif
  3223   3141   
  3224         -  if( rc==SQLITE_OK ){
  3225         -    rc = sqlite3BtreeGetPage(pBt, ovfl, &pPage, 0);
         3142  +  if( next==0 || ppPage ){
         3143  +    MemPage *pPage = 0;
         3144  +
         3145  +    rc = sqlite3BtreeGetPage(pBt, ovfl, &pPage, next!=0);
  3226   3146       assert(rc==SQLITE_OK || pPage==0);
  3227   3147       if( next==0 && rc==SQLITE_OK ){
  3228   3148         next = get4byte(pPage->aData);
  3229   3149       }
         3150  +
         3151  +    if( ppPage ){
         3152  +      *ppPage = pPage;
         3153  +    }else{
         3154  +      releasePage(pPage);
         3155  +    }
  3230   3156     }
  3231         -
  3232   3157     *pPgnoNext = next;
  3233         -  if( ppPage ){
  3234         -    *ppPage = pPage;
  3235         -  }else{
  3236         -    releasePage(pPage);
  3237         -  }
  3238         -  return (rc==SQLITE_DONE ? SQLITE_OK : rc);
         3158  +
         3159  +  return rc;
  3239   3160   }
  3240   3161   
  3241   3162   /*
  3242   3163   ** Copy data from a buffer to a page, or from a page to a buffer.
  3243   3164   **
  3244   3165   ** pPayload is a pointer to data stored on database page pDbPage.
  3245   3166   ** If argument eOp is false, then nByte bytes of data are copied
................................................................................
  4340   4261             }
  4341   4262           }else{
  4342   4263             closest = 0;
  4343   4264           }
  4344   4265   
  4345   4266           iPage = get4byte(&aData[8+closest*4]);
  4346   4267           if( !searchList || iPage==nearby ){
  4347         -          int noContent;
  4348   4268             Pgno nPage;
  4349   4269             *pPgno = iPage;
  4350   4270             nPage = pagerPagecount(pBt);
  4351   4271             if( *pPgno>nPage ){
  4352   4272               /* Free page off the end of the file */
  4353   4273               rc = SQLITE_CORRUPT_BKPT;
  4354   4274               goto end_allocate_page;
................................................................................
  4357   4277                    ": %d more free pages\n",
  4358   4278                    *pPgno, closest+1, k, pTrunk->pgno, n-1));
  4359   4279             if( closest<k-1 ){
  4360   4280               memcpy(&aData[8+closest*4], &aData[4+k*4], 4);
  4361   4281             }
  4362   4282             put4byte(&aData[4], k-1);
  4363   4283             assert( sqlite3PagerIswriteable(pTrunk->pDbPage) );
  4364         -          noContent = !btreeGetHasContent(pBt, *pPgno);
  4365         -          rc = sqlite3BtreeGetPage(pBt, *pPgno, ppPage, noContent);
         4284  +          rc = sqlite3BtreeGetPage(pBt, *pPgno, ppPage, 1);
  4366   4285             if( rc==SQLITE_OK ){
         4286  +            sqlite3PagerDontRollback((*ppPage)->pDbPage);
  4367   4287               rc = sqlite3PagerWrite((*ppPage)->pDbPage);
  4368   4288               if( rc!=SQLITE_OK ){
  4369   4289                 releasePage(*ppPage);
  4370   4290               }
  4371   4291             }
  4372   4292             searchList = 0;
  4373   4293           }
................................................................................
  4377   4297       }while( searchList );
  4378   4298     }else{
  4379   4299       /* There are no pages on the freelist, so create a new page at the
  4380   4300       ** end of the file */
  4381   4301       int nPage = pagerPagecount(pBt);
  4382   4302       *pPgno = nPage + 1;
  4383   4303   
  4384         -    if( *pPgno==PENDING_BYTE_PAGE(pBt) ){
  4385         -      (*pPgno)++;
  4386         -    }
  4387         -
  4388   4304   #ifndef SQLITE_OMIT_AUTOVACUUM
  4389   4305       if( pBt->autoVacuum && PTRMAP_ISPAGE(pBt, *pPgno) ){
  4390   4306         /* If *pPgno refers to a pointer-map page, allocate two new pages
  4391   4307         ** at the end of the file instead of one. The first allocated page
  4392   4308         ** becomes a new pointer-map page, the second is used by the caller.
  4393   4309         */
  4394   4310         TRACE(("ALLOCATE: %d from end of file (pointer-map page)\n", *pPgno));
................................................................................
  4420   4336       }
  4421   4337       (*ppPage)->isInit = 0;
  4422   4338     }
  4423   4339     return rc;
  4424   4340   }
  4425   4341   
  4426   4342   /*
  4427         -** This function is used to add page iPage to the database file free-list. 
  4428         -** It is assumed that the page is not already a part of the free-list.
         4343  +** Add a page of the database file to the freelist.
  4429   4344   **
  4430         -** The value passed as the second argument to this function is optional.
  4431         -** If the caller happens to have a pointer to the MemPage object 
  4432         -** corresponding to page iPage handy, it may pass it as the second value. 
  4433         -** Otherwise, it may pass NULL.
  4434         -**
  4435         -** If a pointer to a MemPage object is passed as the second argument,
  4436         -** its reference count is not altered by this function.
         4345  +** sqlite3PagerUnref() is NOT called for pPage.
  4437   4346   */
  4438         -static int freePage2(BtShared *pBt, MemPage *pMemPage, Pgno iPage){
  4439         -  MemPage *pTrunk = 0;                /* Free-list trunk page */
  4440         -  Pgno iTrunk = 0;                    /* Page number of free-list trunk page */ 
  4441         -  MemPage *pPage1 = pBt->pPage1;      /* Local reference to page 1 */
  4442         -  MemPage *pPage;                     /* Page being freed. May be NULL. */
  4443         -  int rc;                             /* Return Code */
  4444         -  int nFree;                          /* Initial number of pages on free-list */
         4347  +static int freePage(MemPage *pPage){
         4348  +  BtShared *pBt = pPage->pBt;
         4349  +  MemPage *pPage1 = pBt->pPage1;
         4350  +  int rc, n, k;
  4445   4351   
  4446         -  assert( sqlite3_mutex_held(pBt->mutex) );
  4447         -  assert( iPage>1 );
  4448         -  assert( !pMemPage || pMemPage->pgno==iPage );
  4449         -
  4450         -  if( pMemPage ){
  4451         -    pPage = pMemPage;
  4452         -    sqlite3PagerRef(pPage->pDbPage);
  4453         -  }else{
  4454         -    pPage = btreePageLookup(pBt, iPage);
  4455         -  }
         4352  +  /* Prepare the page for freeing */
         4353  +  assert( sqlite3_mutex_held(pPage->pBt->mutex) );
         4354  +  assert( pPage->pgno>1 );
         4355  +  pPage->isInit = 0;
  4456   4356   
  4457   4357     /* Increment the free page count on pPage1 */
  4458   4358     rc = sqlite3PagerWrite(pPage1->pDbPage);
  4459         -  if( rc ) goto freepage_out;
  4460         -  nFree = get4byte(&pPage1->aData[36]);
  4461         -  put4byte(&pPage1->aData[36], nFree+1);
         4359  +  if( rc ) return rc;
         4360  +  n = get4byte(&pPage1->aData[36]);
         4361  +  put4byte(&pPage1->aData[36], n+1);
  4462   4362   
  4463   4363   #ifdef SQLITE_SECURE_DELETE
  4464   4364     /* If the SQLITE_SECURE_DELETE compile-time option is enabled, then
  4465   4365     ** always fully overwrite deleted information with zeros.
  4466   4366     */
  4467         -  if( (!pPage && (rc = sqlite3BtreeGetPage(pBt, iPage, &pPage, 0)))
  4468         -   ||            (rc = sqlite3PagerWrite(pPage->pDbPage))
  4469         -  ){
  4470         -    goto freepage_out;
  4471         -  }
         4367  +  rc = sqlite3PagerWrite(pPage->pDbPage);
         4368  +  if( rc ) return rc;
  4472   4369     memset(pPage->aData, 0, pPage->pBt->pageSize);
  4473   4370   #endif
  4474   4371   
  4475   4372     /* If the database supports auto-vacuum, write an entry in the pointer-map
  4476   4373     ** to indicate that the page is free.
  4477   4374     */
  4478   4375     if( ISAUTOVACUUM ){
  4479         -    rc = ptrmapPut(pBt, iPage, PTRMAP_FREEPAGE, 0);
  4480         -    if( rc ) goto freepage_out;
  4481         -  }
  4482         -
  4483         -  /* Now manipulate the actual database free-list structure. There are two
  4484         -  ** possibilities. If the free-list is currently empty, or if the first
  4485         -  ** trunk page in the free-list is full, then this page will become a
  4486         -  ** new free-list trunk page. Otherwise, it will become a leaf of the
  4487         -  ** first trunk page in the current free-list. This block tests if it
  4488         -  ** is possible to add the page as a new free-list leaf.
  4489         -  */
  4490         -  if( nFree!=0 ){
  4491         -    int nLeaf;                /* Initial number of leaf cells on trunk page */
  4492         -
  4493         -    iTrunk = get4byte(&pPage1->aData[32]);
  4494         -    rc = sqlite3BtreeGetPage(pBt, iTrunk, &pTrunk, 0);
  4495         -    if( rc!=SQLITE_OK ){
  4496         -      goto freepage_out;
  4497         -    }
  4498         -
  4499         -    nLeaf = get4byte(&pTrunk->aData[4]);
  4500         -    if( nLeaf<0 ){
  4501         -      rc = SQLITE_CORRUPT_BKPT;
  4502         -      goto freepage_out;
  4503         -    }
  4504         -    if( nLeaf<pBt->usableSize/4 - 8 ){
  4505         -      /* In this case there is room on the trunk page to insert the page
  4506         -      ** being freed as a new leaf.
         4376  +    rc = ptrmapPut(pBt, pPage->pgno, PTRMAP_FREEPAGE, 0);
         4377  +    if( rc ) return rc;
         4378  +  }
         4379  +
         4380  +  if( n==0 ){
         4381  +    /* This is the first free page */
         4382  +    rc = sqlite3PagerWrite(pPage->pDbPage);
         4383  +    if( rc ) return rc;
         4384  +    memset(pPage->aData, 0, 8);
         4385  +    put4byte(&pPage1->aData[32], pPage->pgno);
         4386  +    TRACE(("FREE-PAGE: %d first\n", pPage->pgno));
         4387  +  }else{
         4388  +    /* Other free pages already exist.  Retrive the first trunk page
         4389  +    ** of the freelist and find out how many leaves it has. */
         4390  +    MemPage *pTrunk;
         4391  +    rc = sqlite3BtreeGetPage(pBt, get4byte(&pPage1->aData[32]), &pTrunk, 0);
         4392  +    if( rc ) return rc;
         4393  +    k = get4byte(&pTrunk->aData[4]);
         4394  +    if( k>=pBt->usableSize/4 - 8 ){
         4395  +      /* The trunk is full.  Turn the page being freed into a new
         4396  +      ** trunk page with no leaves.
  4507   4397         **
  4508   4398         ** Note that the trunk page is not really full until it contains
  4509   4399         ** usableSize/4 - 2 entries, not usableSize/4 - 8 entries as we have
  4510   4400         ** coded.  But due to a coding error in versions of SQLite prior to
  4511   4401         ** 3.6.0, databases with freelist trunk pages holding more than
  4512   4402         ** usableSize/4 - 8 entries will be reported as corrupt.  In order
  4513   4403         ** to maintain backwards compatibility with older versions of SQLite,
  4514   4404         ** we will contain to restrict the number of entries to usableSize/4 - 8
  4515   4405         ** for now.  At some point in the future (once everyone has upgraded
  4516   4406         ** to 3.6.0 or later) we should consider fixing the conditional above
  4517   4407         ** to read "usableSize/4-2" instead of "usableSize/4-8".
  4518   4408         */
         4409  +      rc = sqlite3PagerWrite(pPage->pDbPage);
         4410  +      if( rc==SQLITE_OK ){
         4411  +        put4byte(pPage->aData, pTrunk->pgno);
         4412  +        put4byte(&pPage->aData[4], 0);
         4413  +        put4byte(&pPage1->aData[32], pPage->pgno);
         4414  +        TRACE(("FREE-PAGE: %d new trunk page replacing %d\n",
         4415  +                pPage->pgno, pTrunk->pgno));
         4416  +      }
         4417  +    }else if( k<0 ){
         4418  +      rc = SQLITE_CORRUPT;
         4419  +    }else{
         4420  +      /* Add the newly freed page as a leaf on the current trunk */
  4519   4421         rc = sqlite3PagerWrite(pTrunk->pDbPage);
  4520   4422         if( rc==SQLITE_OK ){
  4521         -        put4byte(&pTrunk->aData[4], nLeaf+1);
  4522         -        put4byte(&pTrunk->aData[8+nLeaf*4], iPage);
         4423  +        put4byte(&pTrunk->aData[4], k+1);
         4424  +        put4byte(&pTrunk->aData[8+k*4], pPage->pgno);
  4523   4425   #ifndef SQLITE_SECURE_DELETE
  4524         -        if( pPage ){
  4525         -          sqlite3PagerDontWrite(pPage->pDbPage);
  4526         -        }
         4426  +        rc = sqlite3PagerDontWrite(pPage->pDbPage);
  4527   4427   #endif
  4528         -        rc = btreeSetHasContent(pBt, iPage);
  4529   4428         }
  4530   4429         TRACE(("FREE-PAGE: %d leaf on trunk page %d\n",pPage->pgno,pTrunk->pgno));
  4531         -      goto freepage_out;
  4532   4430       }
         4431  +    releasePage(pTrunk);
  4533   4432     }
  4534         -
  4535         -  /* If control flows to this point, then it was not possible to add the
  4536         -  ** the page being freed as a leaf page of the first trunk in the free-list.
  4537         -  ** Possibly because the free-list is empty, or possibly because the 
  4538         -  ** first trunk in the free-list is full. Either way, the page being freed
  4539         -  ** will become the new first trunk page in the free-list.
  4540         -  */
  4541         -  if( (!pPage && (rc = sqlite3BtreeGetPage(pBt, iPage, &pPage, 0)))
  4542         -   ||            (rc = sqlite3PagerWrite(pPage->pDbPage))
  4543         -  ){
  4544         -    goto freepage_out;
  4545         -  }
  4546         -  put4byte(pPage->aData, iTrunk);
  4547         -  put4byte(&pPage->aData[4], 0);
  4548         -  put4byte(&pPage1->aData[32], iPage);
  4549         -  TRACE(("FREE-PAGE: %d new trunk page replacing %d\n", pPage->pgno, iTrunk));
  4550         -
  4551         -freepage_out:
  4552         -  if( pPage ){
  4553         -    pPage->isInit = 0;
  4554         -  }
  4555         -  releasePage(pPage);
  4556         -  releasePage(pTrunk);
  4557   4433     return rc;
  4558   4434   }
  4559         -static int freePage(MemPage *pPage){
  4560         -  return freePage2(pPage->pBt, pPage, pPage->pgno);
  4561         -}
  4562   4435   
  4563   4436   /*
  4564   4437   ** Free any overflow pages associated with the given Cell.
  4565   4438   */
  4566   4439   static int clearCell(MemPage *pPage, unsigned char *pCell){
  4567   4440     BtShared *pBt = pPage->pBt;
  4568   4441     CellInfo info;
................................................................................
  4577   4450       return SQLITE_OK;  /* No overflow pages. Return without doing anything */
  4578   4451     }
  4579   4452     ovflPgno = get4byte(&pCell[info.iOverflow]);
  4580   4453     ovflPageSize = pBt->usableSize - 4;
  4581   4454     nOvfl = (info.nPayload - info.nLocal + ovflPageSize - 1)/ovflPageSize;
  4582   4455     assert( ovflPgno==0 || nOvfl>0 );
  4583   4456     while( nOvfl-- ){
  4584         -    Pgno iNext;
  4585         -    MemPage *pOvfl = 0;
         4457  +    MemPage *pOvfl;
  4586   4458       if( ovflPgno==0 || ovflPgno>pagerPagecount(pBt) ){
  4587   4459         return SQLITE_CORRUPT_BKPT;
  4588   4460       }
  4589         -    if( nOvfl ){
  4590         -      rc = getOverflowPage(pBt, ovflPgno, &pOvfl, &iNext);
  4591         -      if( rc ) return rc;
  4592         -    }
  4593         -    rc = freePage2(pBt, pOvfl, ovflPgno);
  4594         -    if( pOvfl ){
  4595         -      sqlite3PagerUnref(pOvfl->pDbPage);
  4596         -    }
         4461  +
         4462  +    rc = getOverflowPage(pBt, ovflPgno, &pOvfl, (nOvfl==0)?0:&ovflPgno);
  4597   4463       if( rc ) return rc;
  4598         -    ovflPgno = iNext;
         4464  +    rc = freePage(pOvfl);
         4465  +    sqlite3PagerUnref(pOvfl->pDbPage);
         4466  +    if( rc ) return rc;
  4599   4467     }
  4600   4468     return SQLITE_OK;
  4601   4469   }
  4602   4470   
  4603   4471   /*
  4604   4472   ** Create the byte sequence used to represent a cell on page pPage
  4605   4473   ** and write that byte sequence into pCell[].  Overflow pages are
................................................................................
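
The nOvfl expression in the clearCell() hunk above is a ceiling division: each overflow page holds usableSize-4 payload bytes, because the first 4 bytes store the next overflow page number. A minimal sketch of that arithmetic follows. The function name demo_overflow_pages is hypothetical and not part of SQLite, and the sketch assumes nPayload >= nLocal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an SQLite function): number of overflow pages
** needed for a cell, mirroring the nOvfl expression in clearCell().
** Assumes nPayload >= nLocal and usableSize > 4. */
static uint32_t demo_overflow_pages(uint32_t nPayload, uint32_t nLocal,
                                    uint32_t usableSize){
  uint32_t ovflPageSize = usableSize - 4;  /* 4 bytes hold the next-page number */
  /* Adding (ovflPageSize-1) before dividing rounds the quotient up. */
  return (nPayload - nLocal + ovflPageSize - 1) / ovflPageSize;
}
```

With a usable size of 1024, a cell with 5000 payload bytes of which 100 are stored locally spills 4900 bytes and needs 5 overflow pages.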
  6357   6225         rc = ptrmapGet(pBt, pgnoRoot, &eType, &iPtrPage);
  6358   6226         if( rc!=SQLITE_OK || eType==PTRMAP_ROOTPAGE || eType==PTRMAP_FREEPAGE ){
  6359   6227           releasePage(pRoot);
  6360   6228           return rc;
  6361   6229         }
  6362   6230         assert( eType!=PTRMAP_ROOTPAGE );
  6363   6231         assert( eType!=PTRMAP_FREEPAGE );
         6232  +      rc = sqlite3PagerWrite(pRoot->pDbPage);
         6233  +      if( rc!=SQLITE_OK ){
         6234  +        releasePage(pRoot);
         6235  +        return rc;
         6236  +      }
  6364   6237         rc = relocatePage(pBt, pRoot, eType, iPtrPage, pgnoMove, 0);
  6365   6238         releasePage(pRoot);
  6366   6239   
  6367   6240         /* Obtain the page at pgnoRoot */
  6368   6241         if( rc!=SQLITE_OK ){
  6369   6242           return rc;
  6370   6243         }
................................................................................
  7216   7089   ** The pager filename is invariant as long as the pager is
  7217   7090   ** open so it is safe to access without the BtShared mutex.
  7218   7091   */
  7219   7092   const char *sqlite3BtreeGetFilename(Btree *p){
  7220   7093     assert( p->pBt->pPager!=0 );
  7221   7094     return sqlite3PagerFilename(p->pBt->pPager);
  7222   7095   }
         7096  +
         7097  +/*
         7098  +** Return the pathname of the directory that contains the database file.
         7099  +**
         7100  +** The pager directory name is invariant as long as the pager is
         7101  +** open so it is safe to access without the BtShared mutex.
         7102  +*/
         7103  +const char *sqlite3BtreeGetDirname(Btree *p){
         7104  +  assert( p->pBt->pPager!=0 );
         7105  +  return sqlite3PagerDirname(p->pBt->pPager);
         7106  +}
  7223   7107   
  7224   7108   /*
  7225   7109   ** Return the pathname of the journal file for this database. The return
  7226   7110   ** value of this routine is the same regardless of whether the journal file
  7227   7111   ** has been created or not.
  7228   7112   **
  7229   7113   ** The pager journal filename is invariant as long as the pager is
................................................................................
  7302   7186             ** represent what they do.  Write() really means "put this page in the
  7303   7187             ** rollback journal and mark it as dirty so that it will be written
  7304   7188             ** to the database file later."  DontWrite() undoes the second part of
  7305   7189             ** that and prevents the page from being written to the database. The
  7306   7190             ** page is still on the rollback journal, though.  And that is the 
  7307   7191             ** whole point of this block: to put pages on the rollback journal. 
  7308   7192             */
  7309         -          sqlite3PagerDontWrite(pDbPage);
         7193  +          rc = sqlite3PagerDontWrite(pDbPage);
  7310   7194           }
  7311   7195           sqlite3PagerUnref(pDbPage);
  7312   7196         }
  7313   7197       }
  7314   7198   
  7315   7199       /* Overwrite the data in page i of the target database */
  7316   7200       if( rc==SQLITE_OK && i!=iSkip && i<=nNewPage ){
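
The trunk-page hunk earlier in this btree.c diff appends a freed page to a free-list trunk by bumping the leaf count at byte offset 4 and writing the page number at offset 8+k*4. The sketch below reproduces that update on a plain byte buffer. All demo_ names are hypothetical stand-ins, and the capacity check is a simplification that is not taken from the SQLite source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for get4byte()/put4byte(): big-endian 32-bit access. */
static uint32_t demo_get4byte(const unsigned char *p){
  return ((uint32_t)p[0]<<24) | ((uint32_t)p[1]<<16) | ((uint32_t)p[2]<<8) | p[3];
}
static void demo_put4byte(unsigned char *p, uint32_t v){
  p[0] = (unsigned char)(v>>24);
  p[1] = (unsigned char)(v>>16);
  p[2] = (unsigned char)(v>>8);
  p[3] = (unsigned char)v;
}

/* Append leaf page number pgno to the trunk page image aData.
** Layout: bytes 0..3 next trunk, bytes 4..7 leaf count, then leaf numbers.
** Returns 1 on success, 0 if the page has no room (simplified check). */
static int demo_trunk_add_leaf(unsigned char *aData, uint32_t usableSize,
                               uint32_t pgno){
  uint32_t k = demo_get4byte(&aData[4]);       /* current number of leaves */
  if( 8 + (k+1)*4 > usableSize ) return 0;     /* assumption: simple bound */
  demo_put4byte(&aData[4], k+1);               /* one more leaf */
  demo_put4byte(&aData[8+k*4], pgno);          /* record the freed page */
  return 1;
}
```

When the trunk is full, the diff instead makes the freed page a new trunk page; the sketch only signals that case with a 0 return.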

Changes to src/btreeInt.h.

     5      5   ** a legal notice, here is a blessing:
     6      6   **
     7      7   **    May you do good and not evil.
     8      8   **    May you find forgiveness for yourself and forgive others.
     9      9   **    May you share freely, never taking more than you give.
    10     10   **
    11     11   *************************************************************************
    12         -** $Id: btreeInt.h,v 1.39 2009/01/16 15:21:06 danielk1977 Exp $
           12  +** $Id: btreeInt.h,v 1.40 2009/01/16 16:23:38 danielk1977 Exp $
    13     13   **
    14     14   ** This file implements an external (disk-based) database using BTrees.
    15     15   ** For a detailed discussion of BTrees, refer to
    16     16   **
    17     17   **     Donald E. Knuth, THE ART OF COMPUTER PROGRAMMING, Volume 3:
    18     18   **     "Sorting And Searching", pages 473-480. Addison-Wesley
    19     19   **     Publishing Company, Reading, Massachusetts.
................................................................................
   200    200   **
   201    201   **    SIZE    DESCRIPTION
   202    202   **      4     Page number of next trunk page
   203    203   **      4     Number of leaf pointers on this page
   204    204   **      *     zero or more page numbers of leaves
   205    205   */
   206    206   #include "sqliteInt.h"
          207  +#include "pager.h"
          208  +#include "btree.h"
          209  +#include "os.h"
          210  +#include <assert.h>
   207    211   
   208    212   /* Round up a number to the next larger multiple of 8.  This is used
   209    213   ** to force 8-byte alignment on 64-bit architectures.
   210    214   */
   211    215   #define ROUND8(x)   ((x+7)&~7)
   212    216   
   213    217   
................................................................................
   375    379     u16 maxLeaf;          /* Maximum local payload in a LEAFDATA table */
   376    380     u16 minLeaf;          /* Minimum local payload in a LEAFDATA table */
   377    381     u8 inTransaction;     /* Transaction state */
   378    382     int nTransaction;     /* Number of open transactions (read + write) */
   379    383     void *pSchema;        /* Pointer to space allocated by sqlite3BtreeSchema() */
   380    384     void (*xFreeSchema)(void*);  /* Destructor for BtShared.pSchema */
   381    385     sqlite3_mutex *mutex; /* Non-recursive mutex required to access this struct */
   382         -  Bitvec *pHasContent;  /* Set of pages moved to free-list this transaction */
   383    386   #ifndef SQLITE_OMIT_SHARED_CACHE
   384    387     int nRef;             /* Number of references to this structure */
   385    388     BtShared *pNext;      /* Next on a list of sharable BtShared structs */
   386    389     BtLock *pLock;        /* List of locks held on this shared-btree struct */
   387    390     Btree *pExclusive;    /* Btree with an EXCLUSIVE lock on the whole db */
   388    391   #endif
   389    392     u8 *pTmpSpace;        /* BtShared.pageSize bytes of space for tmp use */
................................................................................
   483    486   **   should return the error code stored in BtCursor.skip
   484    487   */
   485    488   #define CURSOR_INVALID           0
   486    489   #define CURSOR_VALID             1
   487    490   #define CURSOR_REQUIRESEEK       2
   488    491   #define CURSOR_FAULT             3
   489    492   
   490         -/* 
   491         -** The database page the PENDING_BYTE occupies. This page is never used.
          493  +/* The database page the PENDING_BYTE occupies. This page is never used.
          494  +** TODO: This macro is very similar to PAGER_MJ_PGNO() in pager.c. They
          495  +** should possibly be consolidated (presumably in pager.h).
          496  +**
          497  +** If disk I/O is omitted (meaning that the database is stored purely
          498  +** in memory) then there is no pending byte.
   492    499   */
   493         -# define PENDING_BYTE_PAGE(pBt) PAGER_MJ_PGNO(pBt)
          500  +#ifdef SQLITE_OMIT_DISKIO
          501  +# define PENDING_BYTE_PAGE(pBt)  0x7fffffff
          502  +#else
          503  +# define PENDING_BYTE_PAGE(pBt) ((Pgno)((PENDING_BYTE/(pBt)->pageSize)+1))
          504  +#endif
   494    505   
   495    506   /*
   496    507   ** A linked list of the following structures is stored at BtShared.pLock.
   497    508   ** Locks are added (or upgraded from READ_LOCK to WRITE_LOCK) when a cursor 
   498    509   ** is opened on the table with root page BtShared.iTable. Locks are removed
   499    510   ** from this list when a transaction is committed or rolled back, or when
   500    511   ** a btree handle is closed.
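
The restored PENDING_BYTE_PAGE macro in the hunk above maps the pending byte's file offset to a 1-based page number. A small sketch of that computation follows, assuming the usual PENDING_BYTE value of 0x40000000. The demo_ names are hypothetical, and the page size is passed in directly rather than read from a BtShared.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the default PENDING_BYTE offset used by SQLite (1 GiB). */
#define DEMO_PENDING_BYTE 0x40000000

/* Hypothetical helper mirroring ((PENDING_BYTE/pageSize)+1):
** the 1-based number of the page that contains the pending byte. */
static uint32_t demo_pending_byte_page(uint32_t pageSize){
  return (uint32_t)(DEMO_PENDING_BYTE / pageSize) + 1;
}
```

For a 1024-byte page size this gives page 1048577, and for 4096-byte pages it gives page 262145.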

Changes to src/pager.c.

    14     14   ** The pager is used to access a database disk file.  It implements
    15     15   ** atomic commit and rollback through the use of a journal file that
    16     16   ** is separate from the database file.  The pager also implements file
    17     17   ** locking to prevent two processes from writing the same database
    18     18   ** file simultaneously, or one process from reading the database while
    19     19   ** another is writing.
    20     20   **
    21         -** @(#) $Id: pager.c,v 1.552 2009/01/16 15:21:06 danielk1977 Exp $
           21  +** @(#) $Id: pager.c,v 1.553 2009/01/16 16:23:38 danielk1977 Exp $
    22     22   */
    23     23   #ifndef SQLITE_OMIT_DISKIO
    24     24   #include "sqliteInt.h"
    25     25   
    26     26   /*
    27     27   ** Macros for troubleshooting.  Normally turned off
    28     28   */
................................................................................
   145    145     Pgno nOrig;                  /* Original number of pages in file */
   146    146     Pgno iSubRec;                /* Index of first record in sub-journal */
   147    147   };
   148    148   
   149    149   /*
   150    150   ** A open page cache is an instance of the following structure.
   151    151   **
   152         -** errCode
   153         -**
   154         -**   Pager.errCode may be set to SQLITE_IOERR, SQLITE_CORRUPT, or
   155         -**   SQLITE_FULL. Once one of the first three errors occurs, it persists
   156         -**   and is returned as the result of every major pager API call.  The
   157         -**   SQLITE_FULL return code is slightly different. It persists only until the
   158         -**   next successful rollback is performed on the pager cache. Also,
   159         -**   SQLITE_FULL does not affect the sqlite3PagerGet() and sqlite3PagerLookup()
   160         -**   APIs, they may still be used successfully.
   161         -**
   162         -** dbSizeValid, dbSize, dbOrigSize, dbFileSize
   163         -**
   164         -**   Managing the size of the database file in pages is a little complicated.
   165         -**   The variable Pager.dbSize contains the number of pages that the database
   166         -**   image currently contains. As the database image grows or shrinks this
   167         -**   variable is updated. The variable Pager.dbFileSize contains the number
   168         -**   of pages in the database file. This may be different from Pager.dbSize
   169         -**   if some pages have been appended to the database image but not yet written
   170         -**   out from the cache to the actual file on disk. Or if the image has been
   171         -**   truncated by an incremental-vacuum operation. The Pager.dbOrigSize variable
   172         -**   contains the number of pages in the database image when the current
   173         -**   transaction was opened. The contents of all three of these variables is
   174         -**   only guaranteed to be correct if the boolean Pager.dbSizeValid is true.
   175         -**
   176         -**   TODO: Under what conditions is dbSizeValid set? Cleared?
   177         -**
   178         -** changeCountDone
   179         -**
   180         -**   This boolean variable is used to make sure that the change-counter 
   181         -**   (the 4-byte header field at byte offset 24 of the database file) is 
   182         -**   not updated more often than necessary. 
   183         -**
   184         -**   It is set to true when the change-counter field is updated, which 
   185         -**   can only happen if an exclusive lock is held on the database file.
   186         -**   It is cleared (set to false) whenever an exclusive lock is 
   187         -**   relinquished on the database file. Each time a transaction is committed,
   188         -**   the changeCountDone flag is inspected. If it is true, the work of
   189         -**   updating the change-counter is omitted for the current transaction.
   190         -**
   191         -**   This mechanism means that when running in exclusive mode, a connection 
   192         -**   need only update the change-counter once, for the first transaction
   193         -**   committed.
   194         -**
   195         -** dbModified
   196         -**
   197         -**   The dbModified flag is set whenever a database page is dirtied.
   198         -**   It is cleared at the end of each transaction.
   199         -**
   200         -**   It is used when committing or otherwise ending a transaction. If
   201         -**   the dbModified flag is clear then less work has to be done.
   202         -**
   203         -**   TODO: Check some of the logic surrounding this optimization.
   204         -**
   205         -** journalStarted
   206         -**
   207         -**   This flag is set whenever the main journal is synced.
   208         -**
   209         -**   The point of this flag is that it must be set after the 
   210         -**   first journal header in a journal file has been synced to disk.
   211         -**   After this has happened, new pages appended to the database 
   212         -**   do not need the PGHDR_NEED_SYNC flag set, as they do not need
   213         -**   to wait for a journal sync before they can be written out to
   214         -**   the database file (see function pager_write()).
   215         -**   
   216         -** setMaster
   217         -**
   218         -**   This variable is used to ensure that the master journal file name
   219         -**   (if any) is only written into the journal file once.
   220         -**
   221         -**   When committing a transaction, the master journal file name (if any)
   222         -**   may be written into the journal file while the pager is still in
   223         -**   PAGER_RESERVED state (see CommitPhaseOne() for the action). It
   224         -**   then attempts to upgrade to an exclusive lock. If this attempt
   225         -**   fails, then SQLITE_BUSY may be returned to the user and the user
   226         -**   may attempt to commit the transaction again later (calling
   227         -**   CommitPhaseOne() again). This flag is used to ensure that the 
   228         -**   master journal name is only written to the journal file the first
   229         -**   time CommitPhaseOne() is called.
   230         -**
   231         -** doNotSync
   232         -**
   233         -**   This variable is set and cleared by sqlite3PagerWrite().
   234         -**
   235         -** needSync
   236         -**
   237         -**   TODO: It might be easier to set this variable in writeJournalHdr()
   238         -**   and writeMasterJournal() only. Change its meaning to "unsynced data
   239         -**   has been written to the journal".
          152  +** Pager.errCode may be set to SQLITE_IOERR, SQLITE_CORRUPT, or
          153  +** SQLITE_FULL. Once one of the first three errors occurs, it persists
          154  +** and is returned as the result of every major pager API call.  The
          155  +** SQLITE_FULL return code is slightly different. It persists only until the
          156  +** next successful rollback is performed on the pager cache. Also,
          157  +** SQLITE_FULL does not affect the sqlite3PagerGet() and sqlite3PagerLookup()
          158  +** APIs, they may still be used successfully.
          159  +**
          160  +** Managing the size of the database file in pages is a little complicated.
          161  +** The variable Pager.dbSize contains the number of pages that the database
          162  +** image currently contains. As the database image grows or shrinks this
          163  +** variable is updated. The variable Pager.dbFileSize contains the number
          164  +** of pages in the database file. This may be different from Pager.dbSize
          165  +** if some pages have been appended to the database image but not yet written
          166  +** out from the cache to the actual file on disk. Or if the image has been
          167  +** truncated by an incremental-vacuum operation. The Pager.dbOrigSize variable
          168  +** contains the number of pages in the database image when the current
          169  +** transaction was opened. The contents of all three of these variables is
          170  +** only guaranteed to be correct if the boolean Pager.dbSizeValid is true.
   240    171   */
   241    172   struct Pager {
   242    173     sqlite3_vfs *pVfs;          /* OS functions to use for IO */
   243         -  u8 exclusiveMode;           /* Boolean. True if locking_mode==EXCLUSIVE */
   244         -  u8 journalMode;             /* One of the PAGER_JOURNALMODE_* values */
          174  +  u8 journalOpen;             /* True if journal file descriptor is valid */
          175  +  u8 journalStarted;          /* True if header of journal is synced */
   245    176     u8 useJournal;              /* Use a rollback journal on this file */
   246    177     u8 noReadlock;              /* Do not bother to obtain readlocks */
   247    178     u8 noSync;                  /* Do not sync the journal if true */
   248    179     u8 fullSync;                /* Do extra syncs of the journal for robustness */
   249    180     u8 sync_flags;              /* One of SYNC_NORMAL or SYNC_FULL */
          181  +  u8 state;                   /* PAGER_UNLOCK, _SHARED, _RESERVED, etc. */
   250    182     u8 tempFile;                /* zFilename is a temporary file */
   251    183     u8 readOnly;                /* True for a read-only database */
   252         -  u8 memDb;                   /* True to inhibit all file I/O */
   253         -
   254         -  /* The following block contains those class members that are dynamically
   255         -  ** modified during normal operations. The other variables in this structure
   256         -  ** are either constant throughout the lifetime of the pager, or else
   257         -  ** used to store configuration parameters that affect the way the pager 
   258         -  ** operates.
   259         -  **
   260         -  ** The 'state' variable is described in more detail along with the
   261         -  ** descriptions of the values it may take - PAGER_UNLOCK etc. Many of the
   262         -  ** other variables in this block are described in the comment directly 
   263         -  ** above this class definition.
   264         -  */
   265         -  u8 state;                   /* PAGER_UNLOCK, _SHARED, _RESERVED, etc. */
   266         -  u8 dbModified;              /* True if there are any changes to the Db */
   267    184     u8 needSync;                /* True if an fsync() is needed on the journal */
   268         -  u8 journalStarted;          /* True if header of journal is synced */
   269         -  u8 changeCountDone;         /* Set after incrementing the change-counter */
          185  +  u8 dirtyCache;              /* True if cached pages have changed */
          186  +  u8 memDb;                   /* True to inhibit all file I/O */
   270    187     u8 setMaster;               /* True if a m-j name has been written to jrnl */
   271    188     u8 doNotSync;               /* Boolean. While true, do not spill the cache */
          189  +  u8 exclusiveMode;           /* Boolean. True if locking_mode==EXCLUSIVE */
          190  +  u8 journalMode;             /* One of the PAGER_JOURNALMODE_* values */
          191  +  u8 dbModified;              /* True if there are any changes to the Db */
          192  +  u8 changeCountDone;         /* Set after incrementing the change-counter */
   272    193     u8 dbSizeValid;             /* Set when dbSize is correct */
   273    194     Pgno dbSize;                /* Number of pages in the database */
   274    195     Pgno dbOrigSize;            /* dbSize before the current transaction */
   275    196     Pgno dbFileSize;            /* Number of pages in the database file */
          197  +  u32 vfsFlags;               /* Flags for sqlite3_vfs.xOpen() */
   276    198     int errCode;                /* One of several kinds of errors */
   277         -  int nRec;                   /* Pages journalled since last j-header written */
          199  +  int nRec;                   /* Number of pages written to the journal */
   278    200     u32 cksumInit;              /* Quasi-random value added to every checksum */
   279         -  u32 nSubRec;                /* Number of records written to sub-journal */
          201  +  int stmtNRec;               /* Number of records in stmt subjournal */
          202  +  int nExtra;                 /* Add this many bytes to each in-memory page */
          203  +  int pageSize;               /* Number of bytes in a page */
          204  +  int nPage;                  /* Total number of in-memory pages */
          205  +  int mxPage;                 /* Maximum number of pages to hold in cache */
          206  +  Pgno mxPgno;                /* Maximum allowed size of the database */
   280    207     Bitvec *pInJournal;         /* One bit for each page in the database file */
   281         -  sqlite3_file *fd;           /* File descriptor for database */
   282         -  sqlite3_file *jfd;          /* File descriptor for main journal */
   283         -  sqlite3_file *sjfd;         /* File descriptor for sub-journal */
   284         -  i64 journalOff;             /* Current write offset in the journal file */
   285         -  i64 journalHdr;             /* Byte offset to previous journal header */
   286         -  PagerSavepoint *aSavepoint; /* Array of active savepoints */
   287         -  int nSavepoint;             /* Number of elements in aSavepoint[] */
   288         -  char dbFileVers[16];        /* Changes whenever database file changes */
   289         -  u32 sectorSize;             /* Assumed sector size during rollback */
   290         -
   291         -  int nExtra;                 /* Add this many bytes to each in-memory page */
   292         -  u32 vfsFlags;               /* Flags for sqlite3_vfs.xOpen() */
   293         -  int pageSize;               /* Number of bytes in a page */
   294         -  Pgno mxPgno;                /* Maximum allowed size of the database */
          208  +  Bitvec *pAlwaysRollback;    /* One bit for each page marked always-rollback */
   295    209     char *zFilename;            /* Name of the database file */
   296    210     char *zJournal;             /* Name of the journal file */
          211  +  char *zDirectory;           /* Directory holding database and journal files */
          212  +  sqlite3_file *fd, *jfd;     /* File descriptors for database and journal */
          213  +  sqlite3_file *sjfd;         /* File descriptor for the sub-journal*/
   297    214     int (*xBusyHandler)(void*); /* Function to call when busy */
   298    215     void *pBusyHandlerArg;      /* Context argument for xBusyHandler */
          216  +  i64 journalOff;             /* Current byte offset in the journal file */
          217  +  i64 journalHdr;             /* Byte offset to previous journal header */
          218  +  u32 sectorSize;             /* Assumed sector size during rollback */
   299    219   #ifdef SQLITE_TEST
   300    220     int nHit, nMiss;            /* Cache hits and misses */
   301    221     int nRead, nWrite;          /* Database pages read/written */
   302    222   #endif
   303    223     void (*xReiniter)(DbPage*); /* Call this routine when reloading pages */
   304    224   #ifdef SQLITE_HAS_CODEC
   305    225     void *(*xCodec)(void*,void*,Pgno,int); /* Routine for en/decoding data */
   306    226     void *pCodecArg;            /* First argument to xCodec() */
   307    227   #endif
   308    228     char *pTmpSpace;            /* Pager.pageSize bytes of space for tmp use */
          229  +  char dbFileVers[16];        /* Changes whenever database file changes */
   309    230     i64 journalSizeLimit;       /* Size limit for persistent journal files */
   310    231     PCache *pPCache;            /* Pointer to page cache object */
          232  +  PagerSavepoint *aSavepoint; /* Array of active savepoints */
          233  +  int nSavepoint;             /* Number of elements in aSavepoint[] */
   311    234   };
   312    235   
   313    236   /*
   314    237   ** The following global variables hold counters used for
   315    238   ** testing purposes only.  These variables do not exist in
   316    239   ** a non-testing build.  These variables are not thread-safe.
   317    240   */
................................................................................
   350    273   ** is different for every journal, we minimize that risk.
   351    274   */
   352    275   static const unsigned char aJournalMagic[] = {
   353    276     0xd9, 0xd5, 0x05, 0xf9, 0x20, 0xa1, 0x63, 0xd7,
   354    277   };
   355    278   
   356    279   /*
   357         -** The size of each page record in the journal is given by
   358         -** the following macro.
          280  +** The size of the header and of each page in the journal is determined
          281  +** by the following macros.
   359    282   */
   360    283   #define JOURNAL_PG_SZ(pPager)  ((pPager->pageSize) + 8)
   361    284   
   362    285   /*
   363         -** The journal header size for this pager. This is usually the same 
   364         -** size as a single disk sector. See also setSectorSize().
          286  +** The journal header size for this pager. In the future, this could be
          287  +** set to some value read from the disk controller. The important
          288  +** characteristic is that it is the same size as a disk sector.
   365    289   */
   366    290   #define JOURNAL_HDR_SZ(pPager) (pPager->sectorSize)
   367    291   
   368    292   /*
   369    293   ** The macro MEMDB is true if we are dealing with an in-memory database.
   370    294   ** We do this as a macro so that if the SQLITE_OMIT_MEMORYDB macro is set,
   371    295   ** the value of MEMDB will be a constant and the compiler will optimize
................................................................................
   373    297   */
   374    298   #ifdef SQLITE_OMIT_MEMORYDB
   375    299   # define MEMDB 0
   376    300   #else
   377    301   # define MEMDB pPager->memDb
   378    302   #endif
   379    303   
          304  +/*
          305  +** Page number PAGER_MJ_PGNO is never used in an SQLite database (it is
          306  +** reserved for working around a windows/posix incompatibility). It is
          307  +** used in the journal to signify that the remainder of the journal file 
          308  +** is devoted to storing a master journal name - there are no more pages to
          309  +** roll back. See comments for function writeMasterJournal() for details.
          310  +*/
          311  +/* #define PAGER_MJ_PGNO(x) (PENDING_BYTE/((x)->pageSize)) */
          312  +#define PAGER_MJ_PGNO(x) ((Pgno)((PENDING_BYTE/((x)->pageSize))+1))
          313  +
   380    314   /*
   381    315   ** The maximum legal page number is (2^31 - 1).
   382    316   */
   383    317   #define PAGER_MAX_PGNO 2147483647
   384    318   
   385    319   /*
   386    320   ** Return true if it is necessary to write page *pPg into the sub-journal.
................................................................................
   438    372   */
   439    373   static int write32bits(sqlite3_file *fd, i64 offset, u32 val){
   440    374     char ac[4];
   441    375     put32bits(ac, val);
   442    376     return sqlite3OsWrite(fd, ac, 4, offset);
   443    377   }
   444    378   
   445         -/*
   446         -** The argument to this macro is a file descriptor (type sqlite3_file*).
   447         -** Return 0 if it is not open, or non-zero (but not 1) if it is.
   448         -**
   449         -** This is so that expressions can be written as:
   450         -**
   451         -**   if( isOpen(pPager->jfd) ){ ...
   452         -**
   453         -** instead of
   454         -**
   455         -**   if( pPager->jfd->pMethods ){ ...
   456         -*/
   457         -#define isOpen(pFd) ((pFd)->pMethods)
   458         -
   459    379   /*
   460    380   ** If file pFd is open, call sqlite3OsUnlock() on it.
   461    381   */
   462    382   static int osUnlock(sqlite3_file *pFd, int eLock){
   463         -  if( !isOpen(pFd) ){
          383  +  if( !pFd->pMethods ){
   464    384       return SQLITE_OK;
   465    385     }
   466    386     return sqlite3OsUnlock(pFd, eLock);
   467    387   }
   468    388   
   469    389   /*
   470    390   ** This function determines whether or not the atomic-write optimization
................................................................................
   471    391   ** can be used with this pager. The optimization can be used if:
   472    392   **
   473    393   **  (a) the value returned by OsDeviceCharacteristics() indicates that
   474    394   **      a database page may be written atomically, and
   475    395   **  (b) the value returned by OsSectorSize() is less than or equal
   476    396   **      to the page size.
   477    397   **
   478         -** The optimization is also always enabled for temporary files. It is
   479         -** an error to call this function if pPager is opened on an in-memory
   480         -** database.
   481         -**
   482    398   ** If the optimization cannot be used, 0 is returned. If it can be used,
   483    399   ** then the value returned is the size of the journal file when it
   484    400   ** contains rollback data for exactly one page.
   485    401   */
   486    402   #ifdef SQLITE_ENABLE_ATOMIC_WRITE
   487    403   static int jrnlBufferSize(Pager *pPager){
   488         -  assert( !MEMDB );
   489         -  if( !pPager->tempFile ){
   490         -    int dc;                           /* Device characteristics */
   491         -    int nSector;                      /* Sector size */
   492         -    int szPage;                       /* Page size */
          404  +  int dc;           /* Device characteristics */
          405  +  int nSector;      /* Sector size */
          406  +  int szPage;        /* Page size */
          407  +  sqlite3_file *fd = pPager->fd;
   493    408   
   494         -    assert( isOpen(pPager->fd) );
   495         -    dc = sqlite3OsDeviceCharacteristics(pPager->fd);
          409  +  if( fd->pMethods ){
          410  +    dc = sqlite3OsDeviceCharacteristics(fd);
   496    411       nSector = pPager->sectorSize;
   497    412       szPage = pPager->pageSize;
   498         -
   499         -    assert(SQLITE_IOCAP_ATOMIC512==(512>>8));
   500         -    assert(SQLITE_IOCAP_ATOMIC64K==(65536>>8));
   501         -    if( 0==(dc&(SQLITE_IOCAP_ATOMIC|(szPage>>8)) || nSector>szPage) ){
   502         -      return 0;
   503         -    }
   504    413     }
   505    414   
   506         -  return JOURNAL_HDR_SZ(pPager) + JOURNAL_PG_SZ(pPager);
          415  +  assert(SQLITE_IOCAP_ATOMIC512==(512>>8));
          416  +  assert(SQLITE_IOCAP_ATOMIC64K==(65536>>8));
          417  +
          418  +  if( !fd->pMethods || 
          419  +       (dc & (SQLITE_IOCAP_ATOMIC|(szPage>>8)) && nSector<=szPage) ){
          420  +    return JOURNAL_HDR_SZ(pPager) + JOURNAL_PG_SZ(pPager);
          421  +  }
          422  +  return 0;
   507    423   }
   508    424   #endif
   509    425   
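The two conditions (a) and (b) in the comment above jrnlBufferSize() can be read as a single predicate. The following is a minimal sketch, not the pager code itself: atomicWriteOk() and the IOCAP_* macros are hypothetical local stand-ins, with values chosen to satisfy the same relationships the asserts in the diff check (ATOMIC512 equals 512>>8, ATOMIC64K equals 65536>>8).

```c
#include <assert.h>

/* Hypothetical stand-ins for the SQLITE_IOCAP_* bits used by the check. */
#define IOCAP_ATOMIC      0x0001   /* writes of any size are atomic */
#define IOCAP_ATOMIC512   0x0002   /* == 512>>8 */
#define IOCAP_ATOMIC64K   0x0100   /* == 65536>>8 */

/*
** Return 1 if a page of szPage bytes may be written atomically on a
** device with characteristics dc and sector size nSector, else 0.
** Because each per-size bit equals (size>>8), shifting the page size
** right by 8 yields the bit to test for that size.
*/
int atomicWriteOk(int dc, int szPage, int nSector){
  assert( IOCAP_ATOMIC512==(512>>8) );
  assert( IOCAP_ATOMIC64K==(65536>>8) );
  return (dc & (IOCAP_ATOMIC|(szPage>>8)))!=0 && nSector<=szPage;
}
```

The sketch leaves out the case of a file descriptor that is not open, which the function in the diff handles separately.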
   510    426   /*
   511         -** This function should be called when an IOERR, CORRUPT or FULL error
   512         -** may have occurred. The first argument is a pointer to the pager 
   513         -** structure, the second the error-code about to be returned by a pager 
   514         -** API function. The value returned is a copy of the second argument 
   515         -** to this function. 
          427  +** This function should be called when an error occurs within the pager
          428  +** code. The first argument is a pointer to the pager structure, the
          429  +** second the error-code about to be returned by a pager API function. 
          430  +** The value returned is a copy of the second argument to this function. 
   516    431   **
   517    432   ** If the second argument is SQLITE_IOERR, SQLITE_CORRUPT, or SQLITE_FULL
   518    433   ** the error becomes persistent. Until the persistent error is cleared,
   519    434   ** subsequent API calls on this Pager will immediately return the same 
   520    435   ** error code.
   521    436   **
   522    437   ** A persistent error indicates that the contents of the pager-cache 
   523    438   ** cannot be trusted. This state can be cleared by completely discarding 
   524    439   ** the contents of the pager-cache. If a transaction was active when
   525    440   ** the persistent error occurred, then the rollback journal may need
   526         -** to be replayed to restore the contents of the database file (as if
   527         -** it were a hot-journal).
          441  +** to be replayed.
   528    442   */
   529    443   static void pager_unlock(Pager *pPager);
   530    444   static int pager_error(Pager *pPager, int rc){
   531    445     int rc2 = rc & 0xff;
   532    446     assert(
   533    447          pPager->errCode==SQLITE_FULL ||
   534    448          pPager->errCode==SQLITE_OK ||
................................................................................
   593    507   #define pager_datahash(X,Y)  0
   594    508   #define pager_pagehash(X)  0
   595    509   #define CHECK_PAGE(x)
   596    510   #endif  /* SQLITE_CHECK_PAGES */
   597    511   
   598    512   /*
   599    513   ** When this is called the journal file for pager pPager must be open.
   600         -** This function attempts to read a master journal file name from the 
   601         -** end of the file and, if successful, copies it into memory supplied 
   602         -** by the caller. See comments above writeMasterJournal() for the format
   603         -** used to store a master journal file name at the end of a journal file.
          514  +** The master journal file name is read from the end of the file and 
          515  +** written into memory supplied by the caller. 
   604    516   **
   605    517   ** zMaster must point to a buffer of at least nMaster bytes allocated by
   606    518   ** the caller. This should be sqlite3_vfs.mxPathname+1 (to ensure there is
   607    519   ** enough space to write the master journal name). If the master journal
   608    520   ** name in the journal is longer than nMaster bytes (including a
   609    521   ** nul-terminator), then this is handled as if no master journal name
   610    522   ** were present in the journal.
   611    523   **
   612         -** If a master journal file name is present at the end of the journal
   613         -** file, then it is copied into the buffer pointed to by zMaster. A
   614         -** nul-terminator byte is appended to the buffer following the master
   615         -** journal file name.
   616         -**
   617         -** If it is determined that no master journal file name is present 
   618         -** zMaster[0] is set to 0 and SQLITE_OK returned.
   619         -**
   620         -** If an error occurs while reading from the journal file, an SQLite
   621         -** error code is returned.
          524  +** If no master journal file name is present zMaster[0] is set to 0 and
          525  +** SQLITE_OK returned.
   622    526   */
   623    527   static int readMasterJournal(sqlite3_file *pJrnl, char *zMaster, u32 nMaster){
   624         -  int rc;                    /* Return code */
   625         -  u32 len;                   /* Length in bytes of master journal name */
   626         -  i64 szJ;                   /* Total size in bytes of journal file pJrnl */
   627         -  u32 cksum;                 /* MJ checksum value read from journal */
   628         -  u32 u;                     /* Unsigned loop counter */
   629         -  unsigned char aMagic[8];   /* A buffer to hold the magic header */
          528  +  int rc;
          529  +  u32 len;
          530  +  i64 szJ;
          531  +  u32 cksum;
          532  +  u32 u;                   /* Unsigned loop counter */
          533  +  unsigned char aMagic[8]; /* A buffer to hold the magic header */
          534  +
   630    535     zMaster[0] = '\0';
   631    536   
   632         -  if( SQLITE_OK!=(rc = sqlite3OsFileSize(pJrnl, &szJ))
   633         -   || szJ<16
   634         -   || SQLITE_OK!=(rc = read32bits(pJrnl, szJ-16, &len))
   635         -   || len>=nMaster 
   636         -   || SQLITE_OK!=(rc = read32bits(pJrnl, szJ-12, &cksum))
   637         -   || SQLITE_OK!=(rc = sqlite3OsRead(pJrnl, aMagic, 8, szJ-8))
   638         -   || memcmp(aMagic, aJournalMagic, 8)
   639         -   || SQLITE_OK!=(rc = sqlite3OsRead(pJrnl, zMaster, len, szJ-16-len))
   640         -  ){
          537  +  rc = sqlite3OsFileSize(pJrnl, &szJ);
          538  +  if( rc!=SQLITE_OK || szJ<16 ) return rc;
          539  +
          540  +  rc = read32bits(pJrnl, szJ-16, &len);
          541  +  if( rc!=SQLITE_OK ) return rc;
          542  +
          543  +  if( len>=nMaster ){
          544  +    return SQLITE_OK;
          545  +  }
          546  +
          547  +  rc = read32bits(pJrnl, szJ-12, &cksum);
          548  +  if( rc!=SQLITE_OK ) return rc;
          549  +
          550  +  rc = sqlite3OsRead(pJrnl, aMagic, 8, szJ-8);
          551  +  if( rc!=SQLITE_OK || memcmp(aMagic, aJournalMagic, 8) ) return rc;
          552  +
          553  +  rc = sqlite3OsRead(pJrnl, zMaster, len, szJ-16-len);
          554  +  if( rc!=SQLITE_OK ){
   641    555       return rc;
   642    556     }
          557  +  zMaster[len] = '\0';
   643    558   
   644    559     /* See if the checksum matches the master journal name */
   645    560     for(u=0; u<len; u++){
   646    561       cksum -= zMaster[u];
   647         -  }
          562  +   }
   648    563     if( cksum ){
   649    564       /* If the checksum doesn't add up, then one or more of the disk sectors
   650    565       ** containing the master journal filename is corrupted. This means
   651    566       ** definitely roll back, so just return SQLITE_OK and report a (nul)
   652    567       ** master-journal filename.
   653    568       */
   654         -    len = 0;
          569  +    zMaster[0] = '\0';
   655    570     }
   656         -  zMaster[len] = '\0';
   657    571      
   658    572     return SQLITE_OK;
   659    573   }
   660    574   
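The checksum test in readMasterJournal() works by subtracting every byte of the name from the stored checksum and expecting zero to remain. A minimal sketch of both directions follows; mjChecksum() and mjChecksumMatches() are hypothetical helper names, and the sketch assumes each byte is treated as a signed 8-bit integer, as the comment above writeMasterJournal() describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum the n bytes of zName, each taken as a signed 8-bit value. */
uint32_t mjChecksum(const char *zName, size_t n){
  uint32_t cksum = 0;
  size_t i;
  for(i=0; i<n; i++){
    cksum += (uint32_t)(int32_t)(signed char)zName[i];
  }
  return cksum;
}

/*
** Return 1 if the stored checksum agrees with zName. Subtracting each
** byte from the stored value leaves 0 exactly when the sums are equal
** (modulo 2^32).
*/
int mjChecksumMatches(const char *zName, size_t n, uint32_t stored){
  size_t i;
  for(i=0; i<n; i++){
    stored -= (uint32_t)(int32_t)(signed char)zName[i];
  }
  return stored==0;
}
```

A simple sum like this detects a damaged sector but not reordered bytes, which is sufficient when the only question is whether the name can be trusted.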
   661    575   /*
   662         -** Return the offset of the sector boundary at or immediately 
   663         -** following the value in pPager->journalOff, assuming a sector 
   664         -** size of pPager->sectorSize bytes.
          576  +** Seek the journal file descriptor to the next sector boundary where a
          577  +** journal header may be read or written. Pager.journalOff is updated with
          578  +** the new seek offset.
   665    579   **
   666    580   ** i.e. for a sector size of 512:
   667    581   **
   668         -**   Pager.journalOff          Return value
   669         -**   ---------------------------------------
   670         -**   0                         0
   671         -**   512                       512
   672         -**   100                       512
   673         -**   2000                      2048
          582  +** Input Offset              Output Offset
          583  +** ---------------------------------------
          584  +** 0                         0
          585  +** 512                       512
          586  +** 100                       512
          587  +** 2000                      2048
   674    588   ** 
   675    589   */
   676    590   static i64 journalHdrOffset(Pager *pPager){
   677    591     i64 offset = 0;
   678    592     i64 c = pPager->journalOff;
   679    593     if( c ){
   680    594       offset = ((c-1)/JOURNAL_HDR_SZ(pPager) + 1) * JOURNAL_HDR_SZ(pPager);
   681    595     }
   682    596     assert( offset%JOURNAL_HDR_SZ(pPager)==0 );
   683    597     assert( offset>=c );
   684    598     assert( (offset-c)<JOURNAL_HDR_SZ(pPager) );
   685    599     return offset;
   686    600   }
          601  +static void seekJournalHdr(Pager *pPager){
          602  +  pPager->journalOff = journalHdrOffset(pPager);
          603  +}
   687    604   
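The rounding performed by journalHdrOffset() can be tried in isolation. This is a sketch under the assumption that the header size equals the sector size passed in; nextSectorBoundary() is a hypothetical name, and the arithmetic mirrors the expression in the diff.

```c
#include <assert.h>
#include <stdint.h>

/*
** Return the smallest multiple of szSector that is greater than or
** equal to c. An offset of 0 is already on a boundary.
*/
int64_t nextSectorBoundary(int64_t c, int64_t szSector){
  int64_t offset = 0;
  assert( szSector>0 && c>=0 );
  if( c ){
    offset = ((c-1)/szSector + 1) * szSector;
  }
  assert( offset%szSector==0 );
  assert( offset>=c && (offset-c)<szSector );
  return offset;
}
```

With a sector size of 512 this reproduces the table in the comment: 0 stays 0, 512 stays 512, 100 becomes 512 and 2000 becomes 2048.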
   688    605   /*
   689         -** The journal file must be open when this function is called.
   690         -**
   691         -** This function is a no-op if the journal file has not been written to
   692         -** within the current transaction (i.e. if Pager.journalOff==0).
   693         -**
   694         -** If doTruncate is non-zero or the Pager.journalSizeLimit variable is
   695         -** set to 0, then truncate the journal file to zero bytes in size. Otherwise,
   696         -** zero the 28-byte header at the start of the journal file. In either case, 
   697         -** if the pager is not in no-sync mode, sync the journal file immediately 
   698         -** after writing or truncating it.
   699         -**
   700         -** If Pager.journalSizeLimit is set to a positive, non-zero value, and
   701         -** following the truncation or zeroing described above the size of the 
   702         -** journal file in bytes is larger than this value, then truncate the
   703         -** journal file to Pager.journalSizeLimit bytes. The journal file does
   704         -** not need to be synced following this operation.
   705         -**
   706         -** If an IO error occurs, abandon processing and return the IO error code.
   707         -** Otherwise, return SQLITE_OK.
          606  +** Write zeros over the header of the journal file.  This has the
          607  +** effect of invalidating the journal file and committing the
          608  +** transaction.
   708    609   */
   709    610   static int zeroJournalHdr(Pager *pPager, int doTruncate){
   710         -  int rc = SQLITE_OK;                               /* Return code */
   711         -  assert( isOpen(pPager->jfd) );
          611  +  int rc = SQLITE_OK;
          612  +  static const char zeroHdr[28] = {0};
          613  +
   712    614     if( pPager->journalOff ){
   713         -    const i64 iLimit = pPager->journalSizeLimit;    /* Local cache of jsl */
          615  +    i64 iLimit = pPager->journalSizeLimit;
   714    616   
   715    617       IOTRACE(("JZEROHDR %p\n", pPager))
   716    618       if( doTruncate || iLimit==0 ){
   717    619         rc = sqlite3OsTruncate(pPager->jfd, 0);
   718    620       }else{
   719         -      static const char zeroHdr[28] = {0};
   720    621         rc = sqlite3OsWrite(pPager->jfd, zeroHdr, sizeof(zeroHdr), 0);
   721    622       }
   722    623       if( rc==SQLITE_OK && !pPager->noSync ){
   723    624         rc = sqlite3OsSync(pPager->jfd, SQLITE_SYNC_DATAONLY|pPager->sync_flags);
   724    625       }
   725    626   
   726    627       /* At this point the transaction is committed but the write lock 
................................................................................
   752    653   ** - 4 bytes: Initial database page count.
   753    654   ** - 4 bytes: Sector size used by the process that wrote this journal.
   754    655   ** - 4 bytes: Database page size.
   755    656   ** 
   756    657   ** Followed by (JOURNAL_HDR_SZ - 28) bytes of unused space.
   757    658   */
   758    659   static int writeJournalHdr(Pager *pPager){
   759         -  int rc = SQLITE_OK;                 /* Return code */
   760         -  char *zHeader = pPager->pTmpSpace;  /* Temporary space used to build header */
   761         -  u32 nHeader = pPager->pageSize;     /* Size of buffer pointed to by zHeader */
   762         -  u32 nWrite;                         /* Bytes of header sector written */
   763         -  int ii;                             /* Loop counter */
   764         -
   765         -  assert( isOpen(pPager->jfd) );      /* Journal file must be open. */
          660  +  int rc = SQLITE_OK;
          661  +  char *zHeader = pPager->pTmpSpace;
          662  +  u32 nHeader = pPager->pageSize;
          663  +  u32 nWrite;
          664  +  int ii;
   766    665   
   767    666     if( nHeader>JOURNAL_HDR_SZ(pPager) ){
   768    667       nHeader = JOURNAL_HDR_SZ(pPager);
   769    668     }
   770    669   
   771         -  /* If there are active savepoints and any of them were created 
   772         -  ** since the most recent journal header was written, update the 
   773         -  ** PagerSavepoint.iHdrOffset fields now.
          670  +  /* If there are active savepoints and any of them were created since the
           671  +  ** most recent journal header was written, update the PagerSavepoint.iHdrOffset
          672  +  ** fields now.
   774    673     */
   775    674     for(ii=0; ii<pPager->nSavepoint; ii++){
   776    675       if( pPager->aSavepoint[ii].iHdrOffset==0 ){
   777    676         pPager->aSavepoint[ii].iHdrOffset = pPager->journalOff;
   778    677       }
   779    678     }
   780    679   
   781         -  pPager->journalHdr = pPager->journalOff = journalHdrOffset(pPager);
          680  +  seekJournalHdr(pPager);
          681  +  pPager->journalHdr = pPager->journalOff;
          682  +
   782    683     memcpy(zHeader, aJournalMagic, sizeof(aJournalMagic));
   783    684   
   784    685     /* 
   785    686     ** Write the nRec Field - the number of page records that follow this
   786    687     ** journal header. Normally, zero is written to this value at this time.
   787    688     ** After the records are added to the journal (and the journal synced, 
   788    689     ** if in full-sync mode), the zero is overwritten with the true number
................................................................................
   797    698     **
   798    699     **   * When the pager is in no-sync mode. Corruption can follow a
   799    700     **     power failure in this case anyway.
   800    701     **
   801    702     **   * When the SQLITE_IOCAP_SAFE_APPEND flag is set. This guarantees
   802    703     **     that garbage data is never appended to the journal file.
   803    704     */
   804         -  assert( isOpen(pPager->fd) || pPager->noSync );
          705  +  assert(pPager->fd->pMethods||pPager->noSync);
   805    706     if( (pPager->noSync) || (pPager->journalMode==PAGER_JOURNALMODE_MEMORY)
   806    707      || (sqlite3OsDeviceCharacteristics(pPager->fd)&SQLITE_IOCAP_SAFE_APPEND) 
   807    708     ){
   808    709       put32bits(&zHeader[sizeof(aJournalMagic)], 0xffffffff);
   809    710     }else{
   810    711       put32bits(&zHeader[sizeof(aJournalMagic)], 0);
   811    712     }
................................................................................
   814    715     sqlite3_randomness(sizeof(pPager->cksumInit), &pPager->cksumInit);
   815    716     put32bits(&zHeader[sizeof(aJournalMagic)+4], pPager->cksumInit);
   816    717     /* The initial database size */
   817    718     put32bits(&zHeader[sizeof(aJournalMagic)+8], pPager->dbOrigSize);
   818    719     /* The assumed sector size for this process */
   819    720     put32bits(&zHeader[sizeof(aJournalMagic)+12], pPager->sectorSize);
   820    721   
   821         -  /* The page size */
   822         -  put32bits(&zHeader[sizeof(aJournalMagic)+16], pPager->pageSize);
   823         -
   824    722     /* Initializing the tail of the buffer is not necessary.  Everything
   825    723     ** works fine if the following memset() is omitted.  But initializing
   826    724     ** the memory prevents valgrind from complaining, so we are willing to
   827    725     ** take the performance hit.
   828    726     */
   829         -  memset(&zHeader[sizeof(aJournalMagic)+20], 0,
   830         -         nHeader-(sizeof(aJournalMagic)+20));
          727  +  memset(&zHeader[sizeof(aJournalMagic)+16], 0,
          728  +         nHeader-(sizeof(aJournalMagic)+16));
   831    729   
   832         -  /* In theory, it is only necessary to write the 28 bytes that the 
   833         -  ** journal header consumes to the journal file here. Then increment the 
   834         -  ** Pager.journalOff variable by JOURNAL_HDR_SZ so that the next 
   835         -  ** record is written to the following sector (leaving a gap in the file
   836         -  ** that will be implicitly filled in by the OS).
   837         -  **
   838         -  ** However it has been discovered that on some systems this pattern can 
   839         -  ** be significantly slower than contiguously writing data to the file,
   840         -  ** even if that means explicitly writing data to the block of 
   841         -  ** (JOURNAL_HDR_SZ - 28) bytes that will not be used. So that is what
   842         -  ** is done. 
   843         -  **
   844         -  ** The loop is required here in case the sector-size is larger than the 
   845         -  ** database page size. Since the zHeader buffer is only Pager.pageSize
   846         -  ** bytes in size, more than one call to sqlite3OsWrite() may be required
   847         -  ** to populate the entire journal header sector.
   848         -  */ 
          730  +  if( pPager->journalHdr==0 ){
          731  +    /* The page size */
          732  +    put32bits(&zHeader[sizeof(aJournalMagic)+16], pPager->pageSize);
          733  +  }
          734  +
   849    735     for(nWrite=0; rc==SQLITE_OK&&nWrite<JOURNAL_HDR_SZ(pPager); nWrite+=nHeader){
   850    736       IOTRACE(("JHDR %p %lld %d\n", pPager, pPager->journalHdr, nHeader))
   851    737       rc = sqlite3OsWrite(pPager->jfd, zHeader, nHeader, pPager->journalOff);
   852    738       pPager->journalOff += nHeader;
   853    739     }
   854    740   
   855    741     return rc;
   856    742   }
   857    743   
   858    744   /*
   859    745   ** The journal file must be open when this is called. A journal header file
   860    746   ** (JOURNAL_HDR_SZ bytes) is read from the current location in the journal
   861    747   ** file. The current location in the journal file is given by
   862         -** pPager->journalOff. See comments above function writeJournalHdr() for
          748  +** pPager->journalOff.  See comments above function writeJournalHdr() for
   863    749   ** a description of the journal header format.
   864    750   **
   865         -** If the header is read successfully, *pNRec is set to the number of
   866         -** page records following this header and *pDbSize is set to the size of the
          751  +** If the header is read successfully, *nRec is set to the number of
          752  +** page records following this header and *dbSize is set to the size of the
   867    753   ** database before the transaction began, in pages. Also, pPager->cksumInit
   868    754   ** is set to the value read from the journal header. SQLITE_OK is returned
   869    755   ** in this case.
   870    756   **
   871    757   ** If the journal header file appears to be corrupted, SQLITE_DONE is
   872         -** returned and *pNRec and *PDbSize are undefined.  If JOURNAL_HDR_SZ bytes
          758  +** returned and *nRec and *dbSize are undefined.  If JOURNAL_HDR_SZ bytes
   873    759   ** cannot be read from the journal file an error code is returned.
   874    760   */
   875    761   static int readJournalHdr(
   876         -  Pager *pPager,               /* Pager object */
   877         -  i64 journalSize,             /* Size of the open journal file in bytes */
   878         -  u32 *pNRec,                  /* OUT: Value read from the nRec field */
   879         -  u32 *pDbSize                 /* OUT: Value of original database size field */
          762  +  Pager *pPager, 
          763  +  i64 journalSize,
          764  +  u32 *pNRec, 
          765  +  u32 *pDbSize
   880    766   ){
   881         -  int rc;                      /* Return code */
   882         -  unsigned char aMagic[8];     /* A buffer to hold the magic header */
   883         -  i64 iHdrOff;                 /* Offset of journal header being read */
          767  +  int rc;
          768  +  unsigned char aMagic[8]; /* A buffer to hold the magic header */
          769  +  i64 jrnlOff;
          770  +  u32 iPageSize;
          771  +  u32 iSectorSize;
   884    772   
   885         -  assert( isOpen(pPager->jfd) );      /* Journal file must be open. */
   886         -
   887         -  /* Advance Pager.journalOff to the start of the next sector. If the
   888         -  ** journal file is too small for there to be a header stored at this
   889         -  ** point, return SQLITE_DONE.
   890         -  */
   891         -  pPager->journalOff = journalHdrOffset(pPager);
          773  +  seekJournalHdr(pPager);
   892    774     if( pPager->journalOff+JOURNAL_HDR_SZ(pPager) > journalSize ){
   893    775       return SQLITE_DONE;
   894    776     }
   895         -  iHdrOff = pPager->journalOff;
          777  +  jrnlOff = pPager->journalOff;
   896    778   
   897         -  /* Read in the first 8 bytes of the journal header. If they do not match
   898         -  ** the  magic string found at the start of each journal header, return
   899         -  ** SQLITE_DONE. If an IO error occurs, return an error code. Otherwise,
   900         -  ** proceed.
   901         -  */
   902         -  rc = sqlite3OsRead(pPager->jfd, aMagic, sizeof(aMagic), iHdrOff);
   903         -  if( rc ){
   904         -    return rc;
   905         -  }
          779  +  rc = sqlite3OsRead(pPager->jfd, aMagic, sizeof(aMagic), jrnlOff);
          780  +  if( rc ) return rc;
          781  +  jrnlOff += sizeof(aMagic);
          782  +
   906    783     if( memcmp(aMagic, aJournalMagic, sizeof(aMagic))!=0 ){
   907    784       return SQLITE_DONE;
   908    785     }
   909    786   
   910         -  /* Read the first three 32-bit fields of the journal header: The nRec
   911         -  ** field, the checksum-initializer and the database size at the start
   912         -  ** of the transaction. Return an error code if anything goes wrong.
   913         -  */
   914         -  if( SQLITE_OK!=(rc = read32bits(pPager->jfd, iHdrOff+8, pNRec))
   915         -   || SQLITE_OK!=(rc = read32bits(pPager->jfd, iHdrOff+12, &pPager->cksumInit))
   916         -   || SQLITE_OK!=(rc = read32bits(pPager->jfd, iHdrOff+16, pDbSize))
   917         -  ){
   918         -    return rc;
   919         -  }
          787  +  rc = read32bits(pPager->jfd, jrnlOff, pNRec);
          788  +  if( rc ) return rc;
          789  +
          790  +  rc = read32bits(pPager->jfd, jrnlOff+4, &pPager->cksumInit);
          791  +  if( rc ) return rc;
          792  +
          793  +  rc = read32bits(pPager->jfd, jrnlOff+8, pDbSize);
          794  +  if( rc ) return rc;
   920    795   
   921    796     if( pPager->journalOff==0 ){
   922         -    u32 iPageSize;               /* Page-size field of journal header */
   923         -    u32 iSectorSize;             /* Sector-size field of journal header */
   924         -    u16 iPageSize16;             /* Copy of iPageSize in 16-bit variable */
          797  +    rc = read32bits(pPager->jfd, jrnlOff+16, &iPageSize);
          798  +    if( rc ) return rc;
   925    799   
   926         -    /* Read the page-size and sector-size journal header fields. */
   927         -    if( SQLITE_OK!=(rc = read32bits(pPager->jfd, iHdrOff+20, &iSectorSize))
   928         -     || SQLITE_OK!=(rc = read32bits(pPager->jfd, iHdrOff+24, &iPageSize))
          800  +    if( iPageSize<512 
          801  +     || iPageSize>SQLITE_MAX_PAGE_SIZE 
          802  +     || ((iPageSize-1)&iPageSize)!=0 
   929    803       ){
   930         -      return rc;
          804  +      /* If the page-size in the journal-header is invalid, then the process
          805  +      ** that wrote the journal-header must have crashed before the header
          806  +      ** was synced. In this case stop reading the journal file here.
          807  +      */
          808  +      rc = SQLITE_DONE;
          809  +    }else{
          810  +      u16 pagesize = (u16)iPageSize;
          811  +      rc = sqlite3PagerSetPagesize(pPager, &pagesize);
          812  +      assert( rc!=SQLITE_OK || pagesize==(u16)iPageSize );
   931    813       }
   932         -
   933         -    /* Check that the values read from the page-size and sector-size fields
   934         -    ** are within range. To be 'in range', both values need to be a power
   935         -    ** of two greater than or equal to 512, and not greater than their 
   936         -    ** respective compile time maximum limits.
   937         -    */
   938         -    if( iPageSize<512                  || iSectorSize<512
   939         -     || iPageSize>SQLITE_MAX_PAGE_SIZE || iSectorSize>MAX_SECTOR_SIZE
   940         -     || ((iPageSize-1)&iPageSize)!=0   || ((iSectorSize-1)&iSectorSize)!=0 
   941         -    ){
   942         -      /* If either the page-size or sector-size in the journal-header is 
   943         -      ** invalid, then the process that wrote the journal-header must have 
   944         -      ** crashed before the header was synced. In this case stop reading 
   945         -      ** the journal file here.
   946         -      */
   947         -      return SQLITE_DONE;
   948         -    }
   949         -
   950         -    /* Update the page-size to match the value read from the journal. 
   951         -    ** Use a testcase() macro to make sure that malloc failure within 
   952         -    ** PagerSetPagesize() is tested.
   953         -    */
   954         -    iPageSize16 = (u16)iPageSize;
   955         -    rc = sqlite3PagerSetPagesize(pPager, &iPageSize16);
   956         -    testcase( rc!=SQLITE_OK );
   957         -    assert( rc!=SQLITE_OK || iPageSize16==(u16)iPageSize );
   958         -
          814  +    if( rc ) return rc;
          815  +  
   959    816       /* Update the assumed sector-size to match the value used by 
   960    817       ** the process that created this journal. If this journal was
   961    818       ** created by a process other than this one, then this routine
   962    819       ** is being called from within pager_playback(). The local value
   963    820       ** of Pager.sectorSize is restored at the end of that routine.
   964    821       */
          822  +    rc = read32bits(pPager->jfd, jrnlOff+12, &iSectorSize);
          823  +    if( rc ) return rc;
          824  +    if( (iSectorSize&(iSectorSize-1))
          825  +      || iSectorSize<512
          826  +      || iSectorSize>MAX_SECTOR_SIZE
          827  +    ){
          828  +      return SQLITE_DONE;
          829  +    }
   965    830       pPager->sectorSize = iSectorSize;
   966    831     }
   967    832   
   968    833     pPager->journalOff += JOURNAL_HDR_SZ(pPager);
   969         -  return rc;
          834  +  return SQLITE_OK;
   970    835   }
   971    836   
   972    837   
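Both versions of readJournalHdr() apply the same range test to the page-size and sector-size fields: the value must be a power of two, at least 512, and no larger than a compile-time maximum. A minimal sketch of that test follows; sizeFieldInRange() is a hypothetical helper, and the maximum is passed as a parameter rather than taken from SQLITE_MAX_PAGE_SIZE or MAX_SECTOR_SIZE.

```c
#include <stdint.h>

/*
** Return 1 if v is a power of two in the range 512..mx inclusive.
** For a non-zero v, ((v-1)&v)==0 holds only when exactly one bit is set.
*/
int sizeFieldInRange(uint32_t v, uint32_t mx){
  return v>=512 && v<=mx && ((v-1)&v)==0;
}
```

A value that fails this test is treated in the diff as evidence that the writer crashed before the header was synced, so reading stops with SQLITE_DONE rather than an error.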
   973    838   /*
   974    839   ** Write the supplied master journal name into the journal file for pager
   975    840   ** pPager at the current location. The master journal name must be the last
   976    841   ** thing written to a journal file. If the pager is in full-sync mode, the
   977    842   ** journal file descriptor is advanced to the next sector boundary before
   978    843   ** anything is written. The format is:
   979    844   **
   980         -**   + 4 bytes: PAGER_MJ_PGNO.
   981         -**   + N bytes: Master journal filename in utf-8.
   982         -**   + 4 bytes: N (length of master journal name in bytes, no nul-terminator).
   983         -**   + 4 bytes: Master journal name checksum.
   984         -**   + 8 bytes: aJournalMagic[].
          845  +** + 4 bytes: PAGER_MJ_PGNO.
           846  +** + N bytes: Master journal name.
           847  +** + 4 bytes: N (length of master journal name in bytes).
          848  +** + 4 bytes: Master journal name checksum.
          849  +** + 8 bytes: aJournalMagic[].
   985    850   **
   986    851   ** The master journal page checksum is the sum of the bytes in the master
   987         -** journal name, where each byte is interpreted as a signed 8-bit integer.
          852  +** journal name.
   988    853   **
   989    854   ** If zMaster is a NULL pointer (occurs for a single database transaction), 
   990    855   ** this call is a no-op.
   991    856   */
   992    857   static int writeMasterJournal(Pager *pPager, const char *zMaster){
   993         -  int rc;                          /* Return code */
   994         -  int nMaster;                     /* Length of string zMaster */
   995         -  i64 iHdrOff;                     /* Offset of header in journal file */
   996         -  i64 jrnlSize;                    /* Size of journal file on disk */
   997         -  u32 cksum = 0;                   /* Checksum of string zMaster */
          858  +  int rc;
          859  +  int len; 
          860  +  int i; 
          861  +  i64 jrnlOff;
          862  +  i64 jrnlSize;
          863  +  u32 cksum = 0;
          864  +  char zBuf[sizeof(aJournalMagic)+2*4];
   998    865   
   999         -  assert( isOpen(pPager->jfd) );
  1000         -  assert( !pPager->setMaster );
  1001         -
  1002         -  if( !zMaster ) return SQLITE_OK;
          866  +  if( !zMaster || pPager->setMaster ) return SQLITE_OK;
  1003    867     if( pPager->journalMode==PAGER_JOURNALMODE_MEMORY ) return SQLITE_OK;
  1004    868     pPager->setMaster = 1;
  1005    869   
  1006         -  /* Calculate the length in bytes and the checksum of zMaster */
  1007         -  for(nMaster=0; zMaster[nMaster]; nMaster++){
  1008         -    cksum += zMaster[nMaster];
          870  +  len = sqlite3Strlen30(zMaster);
          871  +  for(i=0; i<len; i++){
          872  +    cksum += zMaster[i];
  1009    873     }
  1010    874   
  1011    875     /* If in full-sync mode, advance to the next disk sector before writing
  1012    876     ** the master journal name. This is in case the previous page written to
  1013    877     ** the journal has already been synced.
  1014    878     */
  1015    879     if( pPager->fullSync ){
  1016         -    pPager->journalOff = journalHdrOffset(pPager);
          880  +    seekJournalHdr(pPager);
  1017    881     }
  1018         -  iHdrOff = pPager->journalOff;
          882  +  jrnlOff = pPager->journalOff;
          883  +  pPager->journalOff += (len+20);
  1019    884   
  1020         -  /* Write the master journal data to the end of the journal file. If
  1021         -  ** an error occurs, return the error code to the caller.
  1022         -  */
  1023         -  if( (rc = write32bits(pPager->jfd, iHdrOff, PAGER_MJ_PGNO(pPager)))
  1024         -   || (rc = sqlite3OsWrite(pPager->jfd, zMaster, nMaster, iHdrOff+4))
  1025         -   || (rc = write32bits(pPager->jfd, iHdrOff+4+nMaster, nMaster))
  1026         -   || (rc = write32bits(pPager->jfd, iHdrOff+4+nMaster+4, cksum))
  1027         -   || (rc = sqlite3OsWrite(pPager->jfd, aJournalMagic, 8, iHdrOff+4+nMaster+8))
  1028         -  ){
  1029         -    return rc;
  1030         -  }
  1031         -  pPager->journalOff += (nMaster+20);
          885  +  rc = write32bits(pPager->jfd, jrnlOff, PAGER_MJ_PGNO(pPager));
          886  +  if( rc!=SQLITE_OK ) return rc;
          887  +  jrnlOff += 4;
          888  +
          889  +  rc = sqlite3OsWrite(pPager->jfd, zMaster, len, jrnlOff);
          890  +  if( rc!=SQLITE_OK ) return rc;
          891  +  jrnlOff += len;
          892  +
          893  +  put32bits(zBuf, len);
          894  +  put32bits(&zBuf[4], cksum);
          895  +  memcpy(&zBuf[8], aJournalMagic, sizeof(aJournalMagic));
          896  +  rc = sqlite3OsWrite(pPager->jfd, zBuf, 8+sizeof(aJournalMagic), jrnlOff);
          897  +  jrnlOff += 8+sizeof(aJournalMagic);
  1032    898     pPager->needSync = !pPager->noSync;
  1033    899   
   1034    900     /* If the pager is in persistent-journal mode, then the physical 
  1035    901     ** journal-file may extend past the end of the master-journal name
  1036    902     ** and 8 bytes of magic data just written to the file. This is 
  1037    903     ** dangerous because the code to rollback a hot-journal file
  1038    904     ** will not be able to find the master-journal name to determine 
  1039    905     ** whether or not the journal is hot. 
  1040    906     **
  1041    907     ** Easiest thing to do in this scenario is to truncate the journal 
  1042    908     ** file to the required size.
  1043    909     */ 
  1044         -  if( SQLITE_OK==(rc = sqlite3OsFileSize(pPager->jfd, &jrnlSize))
  1045         -   && jrnlSize>pPager->journalOff
          910  +  if( (rc==SQLITE_OK)
          911  +   && (rc = sqlite3OsFileSize(pPager->jfd, &jrnlSize))==SQLITE_OK
          912  +   && jrnlSize>jrnlOff
  1046    913     ){
  1047         -    rc = sqlite3OsTruncate(pPager->jfd, pPager->journalOff);
          914  +    rc = sqlite3OsTruncate(pPager->jfd, jrnlOff);
  1048    915     }
  1049    916     return rc;
  1050    917   }
  1051    918   
  1052    919   /*
  1053         -** Find a page in the hash table given its page number. Return
  1054         -** a pointer to the page or NULL if the requested page is not 
  1055         -** already in memory.
          920  +** Find a page in the hash table given its page number.  Return
          921  +** a pointer to the page or NULL if not found.
  1056    922   */
  1057    923   static PgHdr *pager_lookup(Pager *pPager, Pgno pgno){
  1058         -  PgHdr *p;                         /* Return value */
  1059         -
  1060         -  /* It is not possible for a call to PcacheFetch() with createFlag==0 to
  1061         -  ** fail, since no attempt to allocate dynamic memory will be made.
  1062         -  */
  1063         -  (void)sqlite3PcacheFetch(pPager->pPCache, pgno, 0, &p);
          924  +  PgHdr *p;
          925  +  sqlite3PcacheFetch(pPager->pPCache, pgno, 0, &p);
  1064    926     return p;
  1065    927   }
  1066    928   
  1067    929   /*
  1068         -** Unless the pager is in error-state, discard all in-memory pages. If
  1069         -** the pager is in error-state, then this call is a no-op.
          930  +** Clear the in-memory cache.  This routine
          931  +** sets the state of the pager back to what it was when it was first
          932  +** opened.  Any outstanding pages are invalidated and subsequent attempts
          933  +** to access those pages will likely result in a coredump.
  1070    934   */
  1071    935   static void pager_reset(Pager *pPager){
  1072         -  if( SQLITE_OK==pPager->errCode ){
  1073         -    sqlite3PcacheClear(pPager->pPCache);
  1074         -  }
          936  +  if( pPager->errCode ) return;
          937  +  sqlite3PcacheClear(pPager->pPCache);
  1075    938   }
  1076    939   
  1077    940   /*
  1078    941   ** Free all structures in the Pager.aSavepoint[] array and set both
  1079    942   ** Pager.aSavepoint and Pager.nSavepoint to zero. Close the sub-journal
  1080    943   ** if it is open and the pager is not in exclusive mode.
  1081    944   */
  1082         -static void releaseAllSavepoints(Pager *pPager){
  1083         -  int ii;               /* Iterator for looping through Pager.aSavepoint */
          945  +static void releaseAllSavepoint(Pager *pPager){
          946  +  int ii;
  1084    947     for(ii=0; ii<pPager->nSavepoint; ii++){
  1085    948       sqlite3BitvecDestroy(pPager->aSavepoint[ii].pInSavepoint);
  1086    949     }
  1087    950     if( !pPager->exclusiveMode ){
  1088    951       sqlite3OsClose(pPager->sjfd);
  1089    952     }
  1090    953     sqlite3_free(pPager->aSavepoint);
  1091    954     pPager->aSavepoint = 0;
  1092    955     pPager->nSavepoint = 0;
  1093         -  pPager->nSubRec = 0;
          956  +  pPager->stmtNRec = 0;
  1094    957   }
  1095    958   
  1096    959   /*
  1097         -** Set the bit number pgno in the PagerSavepoint.pInSavepoint 
  1098         -** bitvecs of all open savepoints. Return SQLITE_OK if successful
  1099         -** or SQLITE_NOMEM if a malloc failure occurs.
          960  +** Set the bit number pgno in the PagerSavepoint.pInSavepoint bitvecs of
          961  +** all open savepoints.
  1100    962   */
  1101    963   static int addToSavepointBitvecs(Pager *pPager, Pgno pgno){
  1102    964     int ii;                   /* Loop counter */
  1103    965     int rc = SQLITE_OK;       /* Result code */
  1104    966   
  1105    967     for(ii=0; ii<pPager->nSavepoint; ii++){
  1106    968       PagerSavepoint *p = &pPager->aSavepoint[ii];
  1107    969       if( pgno<=p->nOrig ){
  1108    970         rc |= sqlite3BitvecSet(p->pInSavepoint, pgno);
  1109         -      testcase( rc==SQLITE_NOMEM );
  1110    971         assert( rc==SQLITE_OK || rc==SQLITE_NOMEM );
  1111    972       }
  1112    973     }
  1113    974     return rc;
  1114    975   }
  1115    976   
  1116    977   /*
  1117         -** Unlock the database file. This function is a no-op if the pager
  1118         -** is in exclusive mode.
          978  +** Unlock the database file. 
  1119    979   **
  1120    980   ** If the pager is currently in error state, discard the contents of 
  1121    981   ** the cache and reset the Pager structure internal state. If there is
  1122    982   ** an open journal-file, then the next time a shared-lock is obtained
  1123    983   ** on the pager file (by this or any other process), it will be
  1124    984   ** treated as a hot-journal and rolled back.
  1125    985   */
  1126    986   static void pager_unlock(Pager *pPager){
  1127    987     if( !pPager->exclusiveMode ){
  1128         -    int rc;                      /* Return code */
          988  +    int rc;
  1129    989   
  1130    990       /* Always close the journal file when dropping the database lock.
  1131    991       ** Otherwise, another connection with journal_mode=delete might
  1132    992       ** delete the file out from under us.
  1133    993       */
  1134         -    sqlite3OsClose(pPager->jfd);
  1135         -    sqlite3BitvecDestroy(pPager->pInJournal);
  1136         -    pPager->pInJournal = 0;
  1137         -    releaseAllSavepoints(pPager);
  1138         -
  1139         -    /* If the file is unlocked, somebody else might change it. The
  1140         -    ** values stored in Pager.dbSize etc. might become invalid if
  1141         -    ** this happens. TODO: Really, this doesn't need to be cleared
  1142         -    ** until the change-counter check fails in pagerSharedLock().
  1143         -    */
  1144         -    pPager->dbSizeValid = 0;
          994  +    if( pPager->journalOpen ){
          995  +      sqlite3OsClose(pPager->jfd);
          996  +      pPager->journalOpen = 0;
          997  +      sqlite3BitvecDestroy(pPager->pInJournal);
          998  +      pPager->pInJournal = 0;
          999  +      sqlite3BitvecDestroy(pPager->pAlwaysRollback);
         1000  +      pPager->pAlwaysRollback = 0;
         1001  +    }
  1145   1002   
  1146   1003       rc = osUnlock(pPager->fd, NO_LOCK);
  1147         -    if( rc ){
  1148         -      pPager->errCode = rc;
  1149         -    }
         1004  +    if( rc ) pPager->errCode = rc;
         1005  +    pPager->dbSizeValid = 0;
  1150   1006       IOTRACE(("UNLOCK %p\n", pPager))
  1151   1007   
  1152   1008       /* If Pager.errCode is set, the contents of the pager cache cannot be
  1153   1009       ** trusted. Now that the pager file is unlocked, the contents of the
  1154   1010       ** cache can be discarded and the error code safely cleared.
  1155   1011       */
  1156   1012       if( pPager->errCode ){
  1157         -      if( rc==SQLITE_OK ){
  1158         -        pPager->errCode = SQLITE_OK;
  1159         -      }
         1013  +      if( rc==SQLITE_OK ) pPager->errCode = SQLITE_OK;
  1160   1014         pager_reset(pPager);
         1015  +      releaseAllSavepoint(pPager);
         1016  +      pPager->journalOff = 0;
         1017  +      pPager->journalStarted = 0;
         1018  +      pPager->dbOrigSize = 0;
  1161   1019       }
  1162   1020   
  1163         -    pPager->changeCountDone = 0;
  1164   1021       pPager->state = PAGER_UNLOCK;
         1022  +    pPager->changeCountDone = 0;
  1165   1023     }
  1166   1024   }
  1167   1025   
  1168   1026   /*
  1169   1027   ** Execute a rollback if a transaction is active and unlock the 
  1170         -** database file. 
  1171         -**
  1172         -** If the pager has already entered the error state, do not attempt 
  1173         -** the rollback at this time. Instead, pager_unlock() is called. The
  1174         -** call to pager_unlock() will discard all in-memory pages, unlock
  1175         -** the database file and clear the error state. If this means that
  1176         -** there is a hot-journal left in the file-system, the next connection
  1177         -** to obtain a shared lock on the pager (which may be this one) will
  1178         -** roll it back.
  1179         -**
  1180         -** If the pager has not already entered the error state, but an IO or
  1181         -** malloc error occurs during a rollback, then this will itself cause 
  1182         -** the pager to enter the error state. Which will be cleared by the
  1183         -** call to pager_unlock(), as described above.
         1028  +** database file. If the pager has already entered the error state, 
         1029  +** do not attempt the rollback.
  1184   1030   */
  1185         -static void pagerUnlockAndRollback(Pager *pPager){
  1186         -  if( pPager->errCode==SQLITE_OK && pPager->state>=PAGER_RESERVED ){
         1031  +static void pagerUnlockAndRollback(Pager *p){
         1032  +  if( p->errCode==SQLITE_OK && p->state>=PAGER_RESERVED ){
  1187   1033       sqlite3BeginBenignMalloc();
  1188         -    sqlite3PagerRollback(pPager);
         1034  +    sqlite3PagerRollback(p);
  1189   1035       sqlite3EndBenignMalloc();
  1190   1036     }
  1191         -  pager_unlock(pPager);
  1192         -}
  1193         -
  1194         -/*
  1195         -** This routine ends a transaction. A transaction is usually ended by 
  1196         -** either a COMMIT or a ROLLBACK operation. This routine may be called 
  1197         -** after rollback of a hot-journal, or if an error occurs while opening
  1198         -** the journal file or writing the very first journal-header of a
  1199         -** database transaction.
  1200         -** 
  1201         -** If the pager is in PAGER_SHARED or PAGER_UNLOCK state when this
  1202         -** routine is called, it is a no-op (returns SQLITE_OK).
  1203         -**
  1204         -** Otherwise, any active savepoints are released.
  1205         -**
  1206         -** If the journal file is open, then it is "finalized". Once a journal 
  1207         -** file has been finalized it is not possible to use it to roll back a 
  1208         -** transaction. Nor will it be considered to be a hot-journal by this
  1209         -** or any other database connection. Exactly how a journal is finalized
  1210         -** depends on whether or not the pager is running in exclusive mode and
  1211         -** the current journal-mode (Pager.journalMode value), as follows:
  1212         -**
  1213         -**   journalMode==MEMORY
  1214         -**     Journal file descriptor is simply closed. This destroys an 
  1215         -**     in-memory journal.
  1216         -**
  1217         -**   journalMode==TRUNCATE
  1218         -**     Journal file is truncated to zero bytes in size.
  1219         -**
  1220         -**   journalMode==PERSIST
  1221         -**     The first 28 bytes of the journal file are zeroed. This invalidates
  1222         -**     the first journal header in the file, and hence the entire journal
  1223         -**     file. An invalid journal file cannot be rolled back.
  1224         -**
  1225         -**   journalMode==DELETE
  1226         -**     The journal file is closed and deleted using sqlite3OsDelete().
  1227         -**
  1228         -**     If the pager is running in exclusive mode, this method of finalizing
  1229         -**     the journal file is never used. Instead, if the journalMode is
  1230         -**     DELETE and the pager is in exclusive mode, the method described under
  1231         -**     journalMode==PERSIST is used instead.
  1232         -**
  1233         -** After the journal is finalized, if running in non-exclusive mode, the
  1234         -** pager moves to PAGER_SHARED state (and downgrades the lock on the
  1235         -** database file accordingly).
  1236         -**
  1237         -** If the pager is running in exclusive mode and is in PAGER_SYNCED state,
  1238         -** it moves to PAGER_EXCLUSIVE. No locks are downgraded when running in
  1239         -** exclusive mode.
  1240         -**
  1241         -** SQLITE_OK is returned if no error occurs. If an error occurs during
  1242         -** any of the IO operations to finalize the journal file or unlock the
  1243         -** database then the IO error code is returned to the user. If the 
  1244         -** operation to finalize the journal file fails, then the code still
  1245         -** tries to unlock the database file if not in exclusive mode. If the
  1246         -** unlock operation fails as well, then the first error code related
  1247         -** to the first error encountered (the journal finalization one) is
  1248         -** returned.
         1037  +  pager_unlock(p);
         1038  +}
         1039  +
         1040  +/*
         1041  +** This routine ends a transaction.  A transaction is ended by either
         1042  +** a COMMIT or a ROLLBACK.
         1043  +**
         1044  +** When this routine is called, the pager has the journal file open and
         1045  +** a RESERVED or EXCLUSIVE lock on the database.  This routine will release
          1046  +** the database lock and acquire a SHARED lock in its place if that is
          1047  +** the appropriate thing to do.  Releasing locks is usually appropriate,
         1048  +** unless we are in exclusive access mode or unless this is a 
         1049  +** COMMIT AND BEGIN or ROLLBACK AND BEGIN operation.
         1050  +**
         1051  +** The journal file is either deleted or truncated.
         1052  +**
         1053  +** TODO: Consider keeping the journal file open for temporary databases.
         1054  +** This might give a performance improvement on windows where opening
         1055  +** a file is an expensive operation.
  1249   1056   */
  1250   1057   static int pager_end_transaction(Pager *pPager, int hasMaster){
  1251         -  int rc = SQLITE_OK;      /* Error code from journal finalization operation */
  1252         -  int rc2 = SQLITE_OK;     /* Error code from db file unlock operation */
  1253         -
         1058  +  int rc = SQLITE_OK;
         1059  +  int rc2 = SQLITE_OK;
  1254   1060     if( pPager->state<PAGER_RESERVED ){
  1255   1061       return SQLITE_OK;
  1256   1062     }
  1257         -  releaseAllSavepoints(pPager);
  1258         -
  1259         -  assert( isOpen(pPager->jfd) || pPager->pInJournal==0 );
  1260         -  if( isOpen(pPager->jfd) ){
  1261         -
  1262         -    /* TODO: There's a problem here if a journal-file was opened in MEMORY
  1263         -    ** mode and then the journal-mode is changed to TRUNCATE or PERSIST
  1264         -    ** during the transaction. This code should be changed to assume
  1265         -    ** that the journal mode has not changed since the transaction was
  1266         -    ** started. And the sqlite3PagerJournalMode() function should be
  1267         -    ** changed to make sure that this is the case too.
  1268         -    */
  1269         -
  1270         -    /* Finalize the journal file. */
         1063  +  releaseAllSavepoint(pPager);
         1064  +  if( pPager->journalOpen ){
  1271   1065       if( pPager->journalMode==PAGER_JOURNALMODE_MEMORY ){
  1272   1066         int isMemoryJournal = sqlite3IsMemJournal(pPager->jfd);
  1273   1067         sqlite3OsClose(pPager->jfd);
         1068  +      pPager->journalOpen = 0;
  1274   1069         if( !isMemoryJournal ){
  1275   1070           rc = sqlite3OsDelete(pPager->pVfs, pPager->zJournal, 0);
  1276   1071         }
  1277   1072       }else if( pPager->journalMode==PAGER_JOURNALMODE_TRUNCATE
  1278   1073            && (rc = sqlite3OsTruncate(pPager->jfd, 0))==SQLITE_OK ){
  1279   1074         pPager->journalOff = 0;
  1280   1075         pPager->journalStarted = 0;
................................................................................
  1284   1079         rc = zeroJournalHdr(pPager, hasMaster);
  1285   1080         pager_error(pPager, rc);
  1286   1081         pPager->journalOff = 0;
  1287   1082         pPager->journalStarted = 0;
  1288   1083       }else{
  1289   1084         assert( pPager->journalMode==PAGER_JOURNALMODE_DELETE || rc );
  1290   1085         sqlite3OsClose(pPager->jfd);
         1086  +      pPager->journalOpen = 0;
  1291   1087         if( rc==SQLITE_OK && !pPager->tempFile ){
  1292   1088           rc = sqlite3OsDelete(pPager->pVfs, pPager->zJournal, 0);
  1293   1089         }
  1294   1090       }
  1295         -
         1091  +    sqlite3BitvecDestroy(pPager->pInJournal);
         1092  +    pPager->pInJournal = 0;
         1093  +    sqlite3BitvecDestroy(pPager->pAlwaysRollback);
         1094  +    pPager->pAlwaysRollback = 0;
  1296   1095   #ifdef SQLITE_CHECK_PAGES
  1297   1096       sqlite3PcacheIterateDirty(pPager->pPCache, pager_set_pagehash);
  1298   1097   #endif
  1299         -
  1300   1098       sqlite3PcacheCleanAll(pPager->pPCache);
  1301         -    sqlite3BitvecDestroy(pPager->pInJournal);
  1302         -    pPager->pInJournal = 0;
         1099  +    pPager->dirtyCache = 0;
  1303   1100       pPager->nRec = 0;
         1101  +  }else{
         1102  +    assert( pPager->pInJournal==0 );
  1304   1103     }
  1305   1104   
  1306   1105     if( !pPager->exclusiveMode ){
  1307   1106       rc2 = osUnlock(pPager->fd, SHARED_LOCK);
  1308   1107       pPager->state = PAGER_SHARED;
  1309   1108       pPager->changeCountDone = 0;
  1310   1109     }else if( pPager->state==PAGER_SYNCED ){
  1311   1110       pPager->state = PAGER_EXCLUSIVE;
  1312   1111     }
         1112  +  pPager->dbOrigSize = 0;
  1313   1113     pPager->setMaster = 0;
  1314   1114     pPager->needSync = 0;
  1315         -  pPager->dbModified = 0;
  1316         -
  1317         -  /* TODO: Is this optimal? Why is the db size invalidated here 
  1318         -  ** when the database file is not unlocked? */
  1319         -  pPager->dbOrigSize = 0;
         1115  +  /* lruListSetFirstSynced(pPager); */
  1320   1116     sqlite3PcacheTruncate(pPager->pPCache, pPager->dbSize);
  1321   1117     if( !MEMDB ){
  1322   1118       pPager->dbSizeValid = 0;
  1323   1119     }
         1120  +  pPager->dbModified = 0;
  1324   1121   
  1325   1122     return (rc==SQLITE_OK?rc2:rc);
  1326   1123   }
  1327   1124   
  1328   1125   /*
  1329         -** Parameter aData must point to a buffer of pPager->pageSize bytes
   1330         -** of data. Compute and return a checksum based on the contents of the 
  1331         -** page of data and the current value of pPager->cksumInit.
         1126  +** Compute and return a checksum for the page of data.
  1332   1127   **
  1333         -** This is not a real checksum. It is really just the sum of the 
  1334         -** random initial value (pPager->cksumInit) and every 200th byte
  1335         -** of the page data, starting with byte offset (pPager->pageSize%200).
  1336         -** Each byte is interpreted as an 8-bit unsigned integer.
         1128  +** This is not a real checksum.  It is really just the sum of the 
         1129  +** random initial value and the page number.  We experimented with
         1130  +** a checksum of the entire data, but that was found to be too slow.
  1337   1131   **
  1338         -** Changing the formula used to compute this checksum results in an
  1339         -** incompatible journal file format.
  1340         -**
  1341         -** If journal corruption occurs due to a power failure, the most likely 
  1342         -** scenario is that one end or the other of the record will be changed. 
  1343         -** It is much less likely that the two ends of the journal record will be
         1132  +** Note that the page number is stored at the beginning of data and
         1133  +** the checksum is stored at the end.  This is important.  If journal
         1134  +** corruption occurs due to a power failure, the most likely scenario
         1135  +** is that one end or the other of the record will be changed.  It is
         1136  +** much less likely that the two ends of the journal record will be
  1344   1137   ** correct and the middle be corrupt.  Thus, this "checksum" scheme,
   1345   1138   ** though fast and simple, catches the most likely kind of corruption.
         1139  +**
         1140  +** FIX ME:  Consider adding every 200th (or so) byte of the data to the
         1141  +** checksum.  That way if a single page spans 3 or more disk sectors and
         1142  +** only the middle sector is corrupt, we will still have a reasonable
         1143  +** chance of failing the checksum and thus detecting the problem.
  1346   1144   */
  1347   1145   static u32 pager_cksum(Pager *pPager, const u8 *aData){
  1348         -  u32 cksum = pPager->cksumInit;         /* Checksum value to return */
  1349         -  int i = pPager->pageSize-200;          /* Loop counter */
         1146  +  u32 cksum = pPager->cksumInit;
         1147  +  int i = pPager->pageSize-200;
  1350   1148     while( i>0 ){
  1351   1149       cksum += aData[i];
  1352   1150       i -= 200;
  1353   1151     }
  1354   1152     return cksum;
  1355   1153   }
  1356   1154   
  1357   1155   /*
  1358   1156   ** Read a single page from either the journal file (if isMainJrnl==1) or
  1359   1157   ** from the sub-journal (if isMainJrnl==0) and playback that page.
  1360         -** The page begins at offset *pOffset into the file. The *pOffset
         1158  +** The page begins at offset *pOffset into the file.  The  *pOffset
  1361   1159   ** value is increased to the start of the next page in the journal.
  1362   1160   **
  1363   1161   ** The isMainJrnl flag is true if this is the main rollback journal and
  1364   1162   ** false for the statement journal.  The main rollback journal uses
  1365   1163   ** checksums - the statement journal does not.
  1366   1164   **
  1367         -** If the page number of the page record read from the (sub-)journal file
  1368         -** is greater than the current value of Pager.dbSize, then playback is
  1369         -** skipped and SQLITE_OK is returned.
  1370         -**
  1371   1165   ** If pDone is not NULL, then it is a record of pages that have already
  1372   1166   ** been played back.  If the page at *pOffset has already been played back
  1373   1167   ** (if the corresponding pDone bit is set) then skip the playback.
  1374   1168   ** Make sure the pDone bit corresponding to the *pOffset page is set
  1375   1169   ** prior to returning.
  1376         -**
  1377         -** If the page record is successfully read from the (sub-)journal file
  1378         -** and played back, then SQLITE_OK is returned. If an IO error occurs
  1379         -** while reading the record from the (sub-)journal file or while writing
  1380         -** to the database file, then the IO error code is returned. If data
  1381         -** is successfully read from the (sub-)journal file but appears to be
  1382         -** corrupted, SQLITE_DONE is returned. Data is considered corrupted in
  1383         -** two circumstances:
  1384         -** 
  1385         -**   * If the record page-number is illegal (0 or PAGER_MJ_PGNO), or
  1386         -**   * If the record is being rolled back from the main journal file
  1387         -**     and the checksum field does not match the record content.
  1388         -**
  1389         -** Neither of these two scenarios are possible during a savepoint rollback.
  1390         -**
  1391         -** If this is a savepoint rollback, then memory may have to be dynamically
  1392         -** allocated by this function. If this is the case and an allocation fails,
  1393         -** SQLITE_NOMEM is returned.
  1394   1170   */
  1395   1171   static int pager_playback_one_page(
  1396   1172     Pager *pPager,                /* The pager being played back */
  1397   1173     int isMainJrnl,               /* 1 -> main journal. 0 -> sub-journal. */
  1398   1174     i64 *pOffset,                 /* Offset of record to playback */
  1399   1175     int isSavepnt,                /* True for a savepoint rollback */
  1400   1176     Bitvec *pDone                 /* Bitvec of pages already played back */
................................................................................
  1410   1186     assert( (isSavepnt&~1)==0 );       /* isSavepnt is 0 or 1 */
  1411   1187     assert( isMainJrnl || pDone );     /* pDone always used on sub-journals */
  1412   1188     assert( isSavepnt || pDone==0 );   /* pDone never used on non-savepoint */
  1413   1189   
  1414   1190     aData = (u8*)pPager->pTmpSpace;
  1415   1191     assert( aData );         /* Temp storage must have already been allocated */
  1416   1192   
  1417         -  /* Read the page number and page data from the journal or sub-journal
  1418         -  ** file. Return an error code to the caller if an IO error occurs.
  1419         -  */
  1420   1193     jfd = isMainJrnl ? pPager->jfd : pPager->sjfd;
         1194  +
  1421   1195     rc = read32bits(jfd, *pOffset, &pgno);
  1422   1196     if( rc!=SQLITE_OK ) return rc;
  1423   1197     rc = sqlite3OsRead(jfd, aData, pPager->pageSize, (*pOffset)+4);
  1424   1198     if( rc!=SQLITE_OK ) return rc;
  1425   1199     *pOffset += pPager->pageSize + 4 + isMainJrnl*4;
  1426   1200   
   1427   1201     /* Sanity checking on the page.  This is more important than I originally
  1428   1202     ** thought.  If a power failure occurs while the journal is being written,
  1429   1203     ** it could cause invalid data to be written into the journal.  We need to
  1430   1204     ** detect this invalid data (with high probability) and ignore it.
  1431   1205     */
  1432   1206     if( pgno==0 || pgno==PAGER_MJ_PGNO(pPager) ){
  1433         -    assert( !isSavepnt );
  1434   1207       return SQLITE_DONE;
  1435   1208     }
  1436   1209     if( pgno>(Pgno)pPager->dbSize || sqlite3BitvecTest(pDone, pgno) ){
  1437   1210       return SQLITE_OK;
  1438   1211     }
  1439   1212     if( isMainJrnl ){
  1440   1213       rc = read32bits(jfd, (*pOffset)-4, &cksum);
  1441   1214       if( rc ) return rc;
  1442   1215       if( !isSavepnt && pager_cksum(pPager, aData)!=cksum ){
  1443   1216         return SQLITE_DONE;
  1444   1217       }
  1445   1218     }
  1446         -
  1447   1219     if( pDone && (rc = sqlite3BitvecSet(pDone, pgno)) ){
  1448   1220       return rc;
  1449   1221     }
  1450   1222   
  1451   1223     assert( pPager->state==PAGER_RESERVED || pPager->state>=PAGER_EXCLUSIVE );
  1452   1224   
  1453   1225     /* If the pager is in RESERVED state, then there must be a copy of this
................................................................................
  1485   1257     pPg = pager_lookup(pPager, pgno);
  1486   1258     PAGERTRACE(("PLAYBACK %d page %d hash(%08x) %s\n",
  1487   1259                  PAGERID(pPager), pgno, pager_datahash(pPager->pageSize, aData),
  1488   1260                  (isMainJrnl?"main-journal":"sub-journal")
  1489   1261     ));
  1490   1262     if( (pPager->state>=PAGER_EXCLUSIVE)
  1491   1263      && (pPg==0 || 0==(pPg->flags&PGHDR_NEED_SYNC))
  1492         -   && isOpen(pPager->fd)
         1264  +   && (pPager->fd->pMethods)
  1493   1265     ){
  1494   1266       i64 ofst = (pgno-1)*(i64)pPager->pageSize;
  1495   1267       rc = sqlite3OsWrite(pPager->fd, aData, pPager->pageSize, ofst);
  1496   1268       if( pgno>pPager->dbFileSize ){
  1497   1269         pPager->dbFileSize = pgno;
  1498   1270       }
  1499   1271     }else if( !isMainJrnl && pPg==0 ){
................................................................................
  1529   1301       */
  1530   1302       void *pData;
  1531   1303       pData = pPg->pData;
  1532   1304       memcpy(pData, aData, pPager->pageSize);
  1533   1305       if( pPager->xReiniter ){
  1534   1306         pPager->xReiniter(pPg);
  1535   1307       }
  1536         -    if( isMainJrnl && (!isSavepnt || *pOffset<=pPager->journalHdr) ){
         1308  +    if( isMainJrnl && (!isSavepnt || pPager->journalOff<=pPager->journalHdr) ){
  1537   1309         /* If the contents of this page were just restored from the main 
  1538   1310         ** journal file, then its content must be as they were when the 
  1539   1311         ** transaction was first opened. In this case we can mark the page
  1540   1312         ** as clean, since there will be no need to write it out to the.
  1541   1313         **
  1542   1314         ** There is one exception to this rule. If the page is being rolled
  1543   1315         ** back as part of a savepoint (or statement) rollback from an 
................................................................................
  1615   1387   ** file that referred to the master journal file has just been rolled back.
  1616   1388   ** This routine checks if it is possible to delete the master journal file,
  1617   1389   ** and does so if it is.
  1618   1390   **
  1619   1391   ** Argument zMaster may point to Pager.pTmpSpace. So that buffer is not 
  1620   1392   ** available for use within this function.
  1621   1393   **
  1622         -** When a master journal file is created, it is populated with the names 
  1623         -** of all of its child journals, one after another, formatted as utf-8 
  1624         -** encoded text. The end of each child journal file is marked with a 
  1625         -** nul-terminator byte (0x00). i.e. the entire contents of a master journal
  1626         -** file for a transaction involving two databases might be:
  1627   1394   **
  1628         -**   "/home/bill/a.db-journal\x00/home/bill/b.db-journal\x00"
  1629         -**
  1630         -** A master journal file may only be deleted once all of its child 
  1631         -** journals have been rolled back.
  1632         -**
  1633         -** This function reads the contents of the master-journal file into 
  1634         -** memory and loops through each of the child journal names. For
  1635         -** each child journal, it checks if:
  1636         -**
  1637         -**   * if the child journal exists, and if so
  1638         -**   * if the child journal contains a reference to master journal 
  1639         -**     file zMaster
  1640         -**
  1641         -** If a child journal can be found that matches both of the criteria
  1642         -** above, this function returns without doing anything. Otherwise, if
  1643         -** no such child journal can be found, file zMaster is deleted from
  1644         -** the file-system using sqlite3OsDelete().
  1645         -**
  1646         -** If an IO error within this function, an error code is returned. This
  1647         -** function allocates memory by calling sqlite3Malloc(). If an allocation
  1648         -** fails, SQLITE_NOMEM is returned. Otherwise, if no IO or malloc errors 
  1649         -** occur, SQLITE_OK is returned.
  1650         -**
  1651         -** TODO: This function allocates a single block of memory to load
  1652         -** the entire contents of the master journal file. This could be
  1653         -** a couple of kilobytes or so - potentially larger than the page 
  1654         -** size.
         1395  +** The master journal file contains the names of all child journals.
         1396  +** To tell if a master journal can be deleted, check to each of the
         1397  +** children.  If all children are either missing or do not refer to
         1398  +** a different master journal, then this master journal can be deleted.
  1655   1399   */
  1656   1400   static int pager_delmaster(Pager *pPager, const char *zMaster){
  1657   1401     sqlite3_vfs *pVfs = pPager->pVfs;
  1658         -  int rc;                   /* Return code */
  1659         -  sqlite3_file *pMaster;    /* Malloc'd master-journal file descriptor */
  1660         -  sqlite3_file *pJournal;   /* Malloc'd child-journal file descriptor */
         1402  +  int rc;
         1403  +  int master_open = 0;
         1404  +  sqlite3_file *pMaster;
         1405  +  sqlite3_file *pJournal;
  1661   1406     char *zMasterJournal = 0; /* Contents of master journal file */
  1662   1407     i64 nMasterJournal;       /* Size of master journal file */
  1663   1408   
  1664   1409     /* Open the master journal file exclusively in case some other process
  1665   1410     ** is running this routine also. Not that it makes too much difference.
  1666   1411     */
  1667   1412     pMaster = (sqlite3_file *)sqlite3Malloc(pVfs->szOsFile * 2);
................................................................................
  1669   1414     if( !pMaster ){
  1670   1415       rc = SQLITE_NOMEM;
  1671   1416     }else{
  1672   1417       int flags = (SQLITE_OPEN_READONLY|SQLITE_OPEN_MASTER_JOURNAL);
  1673   1418       rc = sqlite3OsOpen(pVfs, zMaster, pMaster, flags, 0);
  1674   1419     }
  1675   1420     if( rc!=SQLITE_OK ) goto delmaster_out;
         1421  +  master_open = 1;
  1676   1422   
  1677   1423     rc = sqlite3OsFileSize(pMaster, &nMasterJournal);
  1678   1424     if( rc!=SQLITE_OK ) goto delmaster_out;
  1679   1425   
  1680   1426     if( nMasterJournal>0 ){
  1681   1427       char *zJournal;
  1682   1428       char *zMasterPtr = 0;
  1683         -    int nMasterPtr = pVfs->mxPathname+1;
         1429  +    int nMasterPtr = pPager->pVfs->mxPathname+1;
  1684   1430   
  1685   1431       /* Load the entire master journal file into space obtained from
  1686   1432       ** sqlite3_malloc() and pointed to by zMasterJournal. 
  1687   1433       */
  1688   1434       zMasterJournal = (char *)sqlite3Malloc((int)nMasterJournal + nMasterPtr);
  1689   1435       if( !zMasterJournal ){
  1690   1436         rc = SQLITE_NOMEM;
................................................................................
  1731   1477     
  1732   1478     rc = sqlite3OsDelete(pVfs, zMaster, 0);
  1733   1479   
  1734   1480   delmaster_out:
  1735   1481     if( zMasterJournal ){
  1736   1482       sqlite3_free(zMasterJournal);
  1737   1483     }  
  1738         -  if( pMaster ){
         1484  +  if( master_open ){
  1739   1485       sqlite3OsClose(pMaster);
  1740         -    assert( !isOpen(pJournal) );
  1741   1486     }
  1742   1487     sqlite3_free(pMaster);
  1743   1488     return rc;
  1744   1489   }
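The longer version of the pager_delmaster() comment describes the master journal as a run of child journal names, each ended by a nul byte. A minimal sketch of walking such a buffer is shown here; count_child_journals is a hypothetical helper written for illustration and is not how pager.c itself is structured.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of pager.c: zBuf holds nBuf bytes of
** nul-terminated child journal names stored back to back.  Return the
** number of complete (nul-terminated) names found. */
static int count_child_journals(const char *zBuf, size_t nBuf){
  int n = 0;
  size_t i = 0;
  while( i<nBuf ){
    /* Find the terminator of the current name within the bytes left */
    const char *zEnd = memchr(&zBuf[i], 0, nBuf-i);
    if( zEnd==0 ) break;               /* unterminated tail: stop */
    n++;
    i = (size_t)(zEnd - zBuf) + 1;     /* step past the name and its nul */
  }
  return n;
}
```

For the two-database example in the comment, the buffer is 48 bytes long and the function counts two names. The real routine would open each named child journal at this point instead of counting it.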
  1745   1490   
  1746   1491   
  1747   1492   /*
  1748         -** This function is used to change the actual size of the database 
  1749         -** file in the file-system. This only happens when committing a transaction,
  1750         -** or rolling back a transaction (including rolling back a hot-journal).
         1493  +** If the main database file is open and an exclusive lock is held, 
         1494  +** truncate the main file of the given pager to the specified number 
         1495  +** of pages.
  1751   1496   **
  1752         -** If the main database file is not open, or an exclusive lock is not
  1753         -** held, this function is a no-op. Otherwise, the size of the file is
  1754         -** changed to nPage pages (nPage*pPager->pageSize bytes). If the file
  1755         -** on disk is currently larger than nPage pages, then use the VFS
  1756         -** xTruncate() method to truncate it.
  1757         -**
  1758         -** Or, it might might be the case that the file on disk is smaller than 
  1759         -** nPage pages. Some operating system implementations can get confused if 
  1760         -** you try to truncate a file to some size that is larger than it 
  1761         -** currently is, so detect this case and write a single zero byte to 
  1762         -** the end of the new file instead.
  1763         -**
  1764         -** If successful, return SQLITE_OK. If an IO error occurs while modifying
  1765         -** the database file, return the error code to the caller.
         1497  +** It might might be the case that the file on disk is smaller than nPage.
         1498  +** This can happen, for example, if we are in the middle of a transaction
         1499  +** which has extended the file size and the new pages are still all held
         1500  +** in cache, then an INSERT or UPDATE does a statement rollback.  Some
         1501  +** operating system implementations can get confused if you try to
         1502  +** truncate a file to some size that is larger than it currently is,
         1503  +** so detect this case and write a single zero byte to the end of the new
         1504  +** file instead.
  1766   1505   */
  1767   1506   static int pager_truncate(Pager *pPager, Pgno nPage){
  1768   1507     int rc = SQLITE_OK;
  1769         -  if( pPager->state>=PAGER_EXCLUSIVE && isOpen(pPager->fd) ){
         1508  +  if( pPager->state>=PAGER_EXCLUSIVE && pPager->fd->pMethods ){
  1770   1509       i64 currentSize, newSize;
  1771         -    /* TODO: Is it safe to use Pager.dbFileSize here? */
  1772   1510       rc = sqlite3OsFileSize(pPager->fd, &currentSize);
  1773   1511       newSize = pPager->pageSize*(i64)nPage;
  1774   1512       if( rc==SQLITE_OK && currentSize!=newSize ){
  1775   1513         if( currentSize>newSize ){
  1776   1514           rc = sqlite3OsTruncate(pPager->fd, newSize);
  1777   1515         }else{
  1778   1516           rc = sqlite3OsWrite(pPager->fd, "", 1, newSize-1);
................................................................................
  1782   1520         }
  1783   1521       }
  1784   1522     }
  1785   1523     return rc;
  1786   1524   }
  1787   1525   
  1788   1526   /*
  1789         -** Set the value of the Pager.sectorSize variable for the given
  1790         -** pager based on the value returned by the xSectorSize method
  1791         -** of the open database file. The sector size will be used used 
  1792         -** to determine the size and alignment of journal header and 
  1793         -** master journal pointers within created journal files.
         1527  +** Set the sectorSize for the given pager.
  1794   1528   **
  1795         -** For temporary files the effective sector size is always 512 bytes.
  1796         -**
  1797         -** Otherwise, for non-temporary files, the effective sector size is
  1798         -** the value returned by the xSectorSize() method rounded up to 512 if
  1799         -** it is less than 512, or rounded down to MAX_SECTOR_SIZE if it
  1800         -** is greater than MAX_SECTOR_SIZE.
         1529  +** The sector size is at least as big as the sector size reported
         1530  +** by sqlite3OsSectorSize(). The minimum sector size is 512.
  1801   1531   */
  1802   1532   static void setSectorSize(Pager *pPager){
  1803         -  assert( isOpen(pPager->fd) || pPager->tempFile );
  1804         -
         1533  +  assert(pPager->fd->pMethods||pPager->tempFile);
  1805   1534     if( !pPager->tempFile ){
  1806   1535       /* Sector size doesn't matter for temporary files. Also, the file
  1807         -    ** may not have been opened yet, in which case the OsSectorSize()
         1536  +    ** may not have been opened yet, in whcih case the OsSectorSize()
  1808   1537       ** call will segfault.
  1809   1538       */
  1810   1539       pPager->sectorSize = sqlite3OsSectorSize(pPager->fd);
  1811   1540     }
  1812   1541     if( pPager->sectorSize<512 ){
  1813   1542       pPager->sectorSize = 512;
  1814   1543     }
  1815   1544     if( pPager->sectorSize>MAX_SECTOR_SIZE ){
  1816         -    assert( MAX_SECTOR_SIZE>=512 );
  1817   1545       pPager->sectorSize = MAX_SECTOR_SIZE;
  1818   1546     }
  1819   1547   }
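The clamping rule described in the longer setSectorSize() comment can be written as a stand-alone function. This is a sketch under assumptions: EXAMPLE_MAX_SECTOR_SIZE is a made-up value used only for illustration (the real MAX_SECTOR_SIZE is defined elsewhere in pager.c and may differ), and effective_sector_size is a hypothetical name.

```c
/* Assumed limit for illustration only; the real MAX_SECTOR_SIZE is
** defined elsewhere in pager.c and may have a different value. */
#define EXAMPLE_MAX_SECTOR_SIZE 0x10000

/* Hypothetical stand-alone version of the rule in the comment:
** temporary files always use 512; otherwise the size reported by the
** VFS is forced into the range 512..EXAMPLE_MAX_SECTOR_SIZE. */
static int effective_sector_size(int reported, int isTempFile){
  int sz = isTempFile ? 512 : reported;
  if( sz<512 ) sz = 512;
  if( sz>EXAMPLE_MAX_SECTOR_SIZE ) sz = EXAMPLE_MAX_SECTOR_SIZE;
  return sz;
}
```

A reported size of 4096 passes through unchanged, while 100 is raised to 512 and anything above the assumed limit is lowered to it.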
  1820   1548   
  1821   1549   /*
  1822   1550   ** Playback the journal and thus restore the database file to
  1823   1551   ** the state it was in before we started making changes.  
................................................................................
  1883   1611     int rc;                  /* Result code of a subroutine */
  1884   1612     int res = 1;             /* Value returned by sqlite3OsAccess() */
  1885   1613     char *zMaster = 0;       /* Name of master journal file if any */
  1886   1614   
  1887   1615     /* Figure out how many records are in the journal.  Abort early if
  1888   1616     ** the journal is empty.
  1889   1617     */
  1890         -  assert( isOpen(pPager->jfd) );
         1618  +  assert( pPager->journalOpen );
  1891   1619     rc = sqlite3OsFileSize(pPager->jfd, &szJ);
  1892   1620     if( rc!=SQLITE_OK || szJ==0 ){
  1893   1621       goto end_playback;
  1894   1622     }
  1895   1623   
  1896   1624     /* Read the master journal name from the journal, if it is present.
  1897   1625     ** If a master journal file name is specified, but the file is not
  1898   1626     ** present on disk, then the journal is not hot and does not need to be
  1899   1627     ** played back.
  1900         -  **
  1901         -  ** TODO: Technically the following is an error because it assumes that
  1902         -  ** buffer Pager.pTmpSpace is (mxPathname+1) bytes or larger. i.e. that
  1903         -  ** (pPager->pageSize >= pPager->pVfs->mxPathname+1). Using os_unix.c,
  1904         -  **  mxPathname is 512, which is the same as the minimum allowable value
  1905         -  ** for pageSize.
  1906   1628     */
  1907   1629     zMaster = pPager->pTmpSpace;
  1908   1630     rc = readMasterJournal(pPager->jfd, zMaster, pPager->pVfs->mxPathname+1);
  1909   1631     if( rc==SQLITE_OK && zMaster[0] ){
  1910   1632       rc = sqlite3OsAccess(pVfs, zMaster, SQLITE_ACCESS_EXISTS, &res);
  1911   1633     }
  1912   1634     zMaster = 0;
  1913   1635     if( rc!=SQLITE_OK || !res ){
  1914   1636       goto end_playback;
  1915   1637     }
  1916   1638     pPager->journalOff = 0;
  1917   1639   
  1918         -  /* This loop terminates either when a readJournalHdr() or 
  1919         -  ** pager_playback_one_page() call returns SQLITE_DONE or an IO error 
  1920         -  ** occurs. 
  1921         -  */
         1640  +  /* This loop terminates either when the readJournalHdr() call returns
         1641  +  ** SQLITE_DONE or an IO error occurs. */
  1922   1642     while( 1 ){
  1923   1643   
  1924   1644       /* Read the next journal header from the journal file.  If there are
  1925   1645       ** not enough bytes left in the journal file for a complete header, or
  1926   1646       ** it is corrupted, then a process must have failed while writing it.
  1927   1647       ** This indicates nothing more needs to be rolled back.
  1928   1648       */
................................................................................
  1975   1695         rc = pager_truncate(pPager, mxPg);
  1976   1696         if( rc!=SQLITE_OK ){
  1977   1697           goto end_playback;
  1978   1698         }
  1979   1699         pPager->dbSize = mxPg;
  1980   1700       }
  1981   1701   
  1982         -    /* Copy original pages out of the journal and back into the 
  1983         -    ** database file and/or page cache.
         1702  +    /* Copy original pages out of the journal and back into the database file.
  1984   1703       */
  1985   1704       for(u=0; u<nRec; u++){
  1986   1705         rc = pager_playback_one_page(pPager, 1, &pPager->journalOff, 0, 0);
  1987   1706         if( rc!=SQLITE_OK ){
  1988   1707           if( rc==SQLITE_DONE ){
  1989   1708             rc = SQLITE_OK;
  1990   1709             pPager->journalOff = szJ;
................................................................................
  2013   1732       pPager->fd->pMethods==0 ||
  2014   1733       sqlite3OsFileControl(pPager->fd,SQLITE_FCNTL_DB_UNCHANGED,0)>=SQLITE_OK
  2015   1734     );
  2016   1735   
  2017   1736     if( rc==SQLITE_OK ){
  2018   1737       zMaster = pPager->pTmpSpace;
  2019   1738       rc = readMasterJournal(pPager->jfd, zMaster, pPager->pVfs->mxPathname+1);
  2020         -    testcase( rc!=SQLITE_OK );
  2021   1739     }
  2022   1740     if( rc==SQLITE_OK ){
  2023   1741       rc = pager_end_transaction(pPager, zMaster[0]!='\0');
  2024         -    testcase( rc!=SQLITE_OK );
  2025   1742     }
  2026   1743     if( rc==SQLITE_OK && zMaster[0] && res ){
  2027   1744       /* If there was a master journal and this routine will return success,
  2028   1745       ** see if it is possible to delete the master journal.
  2029   1746       */
  2030   1747       rc = pager_delmaster(pPager, zMaster);
  2031         -    testcase( rc!=SQLITE_OK );
  2032   1748     }
  2033   1749   
  2034   1750     /* The Pager.sectorSize variable may have been updated while rolling
  2035   1751     ** back a journal created by a process with a different sector size
  2036   1752     ** value. Reset it to the correct value for this process.
  2037   1753     */
  2038   1754     setSectorSize(pPager);
  2039   1755     return rc;
  2040   1756   }
  2041   1757   
  2042   1758   /*
  2043         -** Playback savepoint pSavepoint. Or, if pSavepoint==NULL, then playback
  2044         -** the entire master journal file. The case pSavepoint==NULL occurs when 
  2045         -** a ROLLBACK TO command is invoked on a SAVEPOINT that is a transaction 
  2046         -** savepoint.
         1759  +** Playback savepoint pSavepoint.  Or, if pSavepoint==NULL, then playback
         1760  +** the entire master journal file.
  2047   1761   **
  2048         -** When pSavepoint is not NULL (meaning a non-transaction savepoint is 
  2049         -** being rolled back), then the rollback consists of up to three stages,
  2050         -** performed in the order specified:
  2051         -**
  2052         -**   * Pages are played back from the main journal starting at byte
  2053         -**     offset PagerSavepoint.iOffset and continuing to 
  2054         -**     PagerSavepoint.iHdrOffset, or to the end of the main journal
  2055         -**     file if PagerSavepoint.iHdrOffset is zero.
  2056         -**
  2057         -**   * If PagerSavepoint.iHdrOffset is not zero, then pages are played
  2058         -**     back starting from the journal header immediately following 
  2059         -**     PagerSavepoint.iHdrOffset to the end of the main journal file.
  2060         -**
  2061         -**   * Pages are then played back from the sub-journal file, starting
  2062         -**     with the PagerSavepoint.iSubRec and continuing to the end of
  2063         -**     the journal file.
  2064         -**
  2065         -** Throughout the rollback process, each time a page is rolled back, the
  2066         -** corresponding bit is set in a bitvec structure (variable pDone in the
  2067         -** implementation below). This is used to ensure that a page is only
  2068         -** rolled back the first time it is encountered in either journal.
  2069         -**
  2070         -** If pSavepoint is NULL, then pages are only played back from the main
  2071         -** journal file. There is no need for a bitvec in this case.
  2072         -**
  2073         -** In either case, before playback commences the Pager.dbSize variable
  2074         -** is reset to the value that it held at the start of the savepoint 
  2075         -** (or transaction). No page with a page-number greater than this value
  2076         -** is played back. If one is encountered it is simply skipped.
         1762  +** The case pSavepoint==NULL occurs when a ROLLBACK TO command is invoked
         1763  +** on a SAVEPOINT that is a transaction savepoint.
  2077   1764   */
  2078   1765   static int pagerPlaybackSavepoint(Pager *pPager, PagerSavepoint *pSavepoint){
  2079   1766     i64 szJ;                 /* Effective size of the main journal */
  2080   1767     i64 iHdrOff;             /* End of first segment of main-journal records */
         1768  +  Pgno ii;                 /* Loop counter */
  2081   1769     int rc = SQLITE_OK;      /* Return code */
  2082   1770     Bitvec *pDone = 0;       /* Bitvec to ensure pages played back only once */
  2083   1771   
  2084         -  assert( pPager->state>=PAGER_SHARED );
  2085         -
  2086   1772     /* Allocate a bitvec to use to store the set of pages rolled back */
  2087   1773     if( pSavepoint ){
  2088   1774       pDone = sqlite3BitvecCreate(pSavepoint->nOrig);
  2089   1775       if( !pDone ){
  2090   1776         return SQLITE_NOMEM;
  2091   1777       }
  2092   1778     }
  2093   1779   
  2094         -  /* Set the database size back to the value it was before the savepoint 
  2095         -  ** being reverted was opened.
         1780  +  /* Truncate the database back to the size it was before the 
         1781  +  ** savepoint being reverted was opened.
  2096   1782     */
  2097   1783     pPager->dbSize = pSavepoint ? pSavepoint->nOrig : pPager->dbOrigSize;
         1784  +  assert( pPager->state>=PAGER_SHARED );
  2098   1785   
  2099   1786     /* Use pPager->journalOff as the effective size of the main rollback
  2100   1787     ** journal.  The actual file might be larger than this in
  2101   1788     ** PAGER_JOURNALMODE_TRUNCATE or PAGER_JOURNALMODE_PERSIST.  But anything
  2102   1789     ** past pPager->journalOff is off-limits to us.
  2103   1790     */
  2104   1791     szJ = pPager->journalOff;
................................................................................
  2111   1798     ** are played back.
  2112   1799     */
  2113   1800     if( pSavepoint ){
  2114   1801       iHdrOff = pSavepoint->iHdrOffset ? pSavepoint->iHdrOffset : szJ;
  2115   1802       pPager->journalOff = pSavepoint->iOffset;
  2116   1803       while( rc==SQLITE_OK && pPager->journalOff<iHdrOff ){
  2117   1804         rc = pager_playback_one_page(pPager, 1, &pPager->journalOff, 1, pDone);
         1805  +      assert( rc!=SQLITE_DONE );
  2118   1806       }
  2119         -    assert( rc!=SQLITE_DONE );
  2120   1807     }else{
  2121   1808       pPager->journalOff = 0;
  2122   1809     }
  2123   1810   
  2124   1811     /* Continue rolling back records out of the main journal starting at
  2125   1812     ** the first journal header seen and continuing until the effective end
  2126   1813     ** of the main journal file.  Continue to skip out-of-range pages and
  2127   1814     ** continue adding pages rolled back to pDone.
  2128   1815     */
  2129   1816     while( rc==SQLITE_OK && pPager->journalOff<szJ ){
  2130         -    u32 ii;            /* Loop counter */
  2131   1817       u32 nJRec = 0;     /* Number of Journal Records */
  2132   1818       u32 dummy;
  2133   1819       rc = readJournalHdr(pPager, szJ, &nJRec, &dummy);
  2134   1820       assert( rc!=SQLITE_DONE );
  2135   1821   
  2136   1822       /*
  2137   1823       ** The "pPager->journalHdr+JOURNAL_HDR_SZ(pPager)==pPager->journalOff"
................................................................................
  2146   1832       if( nJRec==0 
  2147   1833        && pPager->journalHdr+JOURNAL_HDR_SZ(pPager)==pPager->journalOff
  2148   1834       ){
  2149   1835         nJRec = (szJ - pPager->journalOff)/JOURNAL_PG_SZ(pPager);
  2150   1836       }
  2151   1837       for(ii=0; rc==SQLITE_OK && ii<nJRec && pPager->journalOff<szJ; ii++){
  2152   1838         rc = pager_playback_one_page(pPager, 1, &pPager->journalOff, 1, pDone);
         1839  +      assert( rc!=SQLITE_DONE );
  2153   1840       }
  2154         -    assert( rc!=SQLITE_DONE );
  2155   1841     }
  2156   1842     assert( rc!=SQLITE_OK || pPager->journalOff==szJ );
  2157   1843   
  2158   1844     /* Finally, roll back pages from the sub-journal.  Pages that were
  2159   1845     ** previously rolled back out of the main journal (and are hence in pDone)
  2160   1846     ** will be skipped.  Out-of-range pages are also skipped.
  2161   1847     */
  2162   1848     if( pSavepoint ){
  2163         -    u32 ii;            /* Loop counter */
  2164   1849       i64 offset = pSavepoint->iSubRec*(4+pPager->pageSize);
  2165         -    for(ii=pSavepoint->iSubRec; rc==SQLITE_OK && ii<pPager->nSubRec; ii++){
  2166         -      assert( offset==ii*(4+pPager->pageSize) );
         1850  +    for(ii=pSavepoint->iSubRec; rc==SQLITE_OK&&ii<(u32)pPager->stmtNRec; ii++){
         1851  +      assert( offset == ii*(4+pPager->pageSize) );
  2167   1852         rc = pager_playback_one_page(pPager, 0, &offset, 1, pDone);
         1853  +      assert( rc!=SQLITE_DONE );
  2168   1854       }
  2169         -    assert( rc!=SQLITE_DONE );
  2170   1855     }
  2171   1856   
  2172   1857     sqlite3BitvecDestroy(pDone);
  2173   1858     if( rc==SQLITE_OK ){
  2174   1859       pPager->journalOff = szJ;
  2175   1860     }
  2176   1861     return rc;
................................................................................
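The longer pagerPlaybackSavepoint() comment says a page is rolled back only the first time it is met in either journal, and that pages beyond the original database size are skipped. A minimal model of that filtering is sketched below. It is an illustration only: a plain byte array stands in for the Bitvec, and pages_to_restore and EX_MAX_PAGE are hypothetical names.

```c
#include <string.h>

#define EX_MAX_PAGE 64   /* hypothetical upper bound for this sketch */

/* Hypothetical model, not pager.c code: aPgno[] lists the page numbers
** of journal records in the order they are read (main journal first,
** then sub-journal).  Store in aOut[] the pages that would actually be
** restored and return their count.  A page is restored only the first
** time it is seen, and only if it lies in the range 1..nOrig. */
static int pages_to_restore(
  const unsigned *aPgno,   /* Page numbers in journal order */
  int n,                   /* Number of entries in aPgno[] */
  unsigned nOrig,          /* Database size when the savepoint opened */
  unsigned *aOut           /* OUT: pages restored, in order */
){
  unsigned char done[EX_MAX_PAGE+1];   /* stand-in for the Bitvec */
  int i, nOut = 0;
  memset(done, 0, sizeof(done));
  for(i=0; i<n; i++){
    unsigned pgno = aPgno[i];
    if( pgno==0 || pgno>nOrig || pgno>EX_MAX_PAGE ) continue; /* out of range */
    if( done[pgno] ) continue;         /* already rolled back once */
    done[pgno] = 1;
    aOut[nOut++] = pgno;
  }
  return nOut;
}
```

Given records for pages 3, 5, 3, 9, 5, 2 and an original size of 8 pages, only pages 3, 5 and 2 are restored: the repeats are skipped because they are already marked done, and page 9 is past the original end of the file.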
  2224   1909   ** testing and analysis only.  
  2225   1910   */
  2226   1911   #ifdef SQLITE_TEST
  2227   1912   int sqlite3_opentemp_count = 0;
  2228   1913   #endif
  2229   1914   
  2230   1915   /*
  2231         -** Open a temporary file.
         1916  +** Open a temporary file. 
  2232   1917   **
  2233         -** Write the file descriptor into *pFile. Return SQLITE_OK on success 
  2234         -** or some other error code if we fail. The OS will automatically 
  2235         -** delete the temporary file when it is closed.
  2236         -**
  2237         -** The flags passed to the VFS layer xOpen() call are those specified
  2238         -** by parameter vfsFlags ORed with the following:
  2239         -**
  2240         -**     SQLITE_OPEN_READWRITE
  2241         -**     SQLITE_OPEN_CREATE
  2242         -**     SQLITE_OPEN_EXCLUSIVE
  2243         -**     SQLITE_OPEN_DELETEONCLOSE
         1918  +** Write the file descriptor into *fd.  Return SQLITE_OK on success or some
         1919  +** other error code if we fail. The OS will automatically delete the temporary
         1920  +** file when it is closed.
  2244   1921   */
  2245         -static int pagerOpentemp(
         1922  +static int sqlite3PagerOpentemp(
  2246   1923     Pager *pPager,        /* The pager object */
  2247   1924     sqlite3_file *pFile,  /* Write the file descriptor here */
  2248   1925     int vfsFlags          /* Flags passed through to the VFS */
  2249   1926   ){
  2250         -  int rc;               /* Return code */
         1927  +  int rc;
  2251   1928   
  2252   1929   #ifdef SQLITE_TEST
  2253   1930     sqlite3_opentemp_count++;  /* Used for testing and analysis only */
  2254   1931   #endif
  2255   1932   
  2256   1933     vfsFlags |=  SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE |
  2257   1934               SQLITE_OPEN_EXCLUSIVE | SQLITE_OPEN_DELETEONCLOSE;
  2258   1935     rc = sqlite3OsOpen(pPager->pVfs, 0, pFile, vfsFlags, 0);
  2259         -  assert( rc!=SQLITE_OK || isOpen(pFile) );
         1936  +  assert( rc!=SQLITE_OK || pFile->pMethods );
  2260   1937     return rc;
  2261   1938   }
  2262   1939   
  2263   1940   static int pagerStress(void *,PgHdr *);
  2264   1941   
  2265   1942   /*
  2266         -** Allocate and initialize a new Pager object and put a pointer to it
  2267         -** in *ppPager. The pager should eventually be freed by passing it
  2268         -** to sqlite3PagerClose().
         1943  +** Create a new page cache and put a pointer to the page cache in *ppPager.
         1944  +** The file to be cached need not exist.  The file is not locked until
         1945  +** the first call to sqlite3PagerGet() and is only held open until the
         1946  +** last page is released using sqlite3PagerUnref().
  2269   1947   **
  2270         -** The zFilename argument is the path to the database file to open.
  2271   1948   ** If zFilename is NULL then a randomly-named temporary file is created
  2272         -** and used as the file to be cached. Temporary files are be deleted
  2273         -** automatically when they are closed. If zFilename is ":memory:" then 
  2274         -** all information is held in cache. It is never written to disk. 
  2275         -** This can be used to implement an in-memory database.
         1949  +** and used as the file to be cached.  The file will be deleted
         1950  +** automatically when it is closed.
  2276   1951   **
  2277         -** The nExtra parameter specifies the number of bytes of space allocated
  2278         -** along with each page reference. This space is available to the user
  2279         -** via the sqlite3PagerGetExtra() API.
  2280         -**
  2281         -** The flags argument is used to specify properties that affect the
  2282         -** operation of the pager. It should be passed some bitwise combination
  2283         -** of the PAGER_OMIT_JOURNAL and PAGER_NO_READLOCK flags.
  2284         -**
  2285         -** The vfsFlags parameter is a bitmask to pass to the flags parameter
  2286         -** of the xOpen() method of the supplied VFS when opening files. 
  2287         -**
  2288         -** If the pager object is allocated and the specified file opened 
  2289         -** successfully, SQLITE_OK is returned and *ppPager set to point to
  2290         -** the new pager object. If an error occurs, *ppPager is set to NULL
  2291         -** and error code returned. This function may return SQLITE_NOMEM
  2292         -** (sqlite3Malloc() is used to allocate memory), SQLITE_CANTOPEN or 
  2293         -** various SQLITE_IO_XXX errors.
         1952  +** If zFilename is ":memory:" then all information is held in cache.
         1953  +** It is never written to disk.  This can be used to implement an
         1954  +** in-memory database.
  2294   1955   */
  2295   1956   int sqlite3PagerOpen(
  2296   1957     sqlite3_vfs *pVfs,       /* The virtual file system to use */
  2297         -  Pager **ppPager,         /* OUT: Return the Pager structure here */
         1958  +  Pager **ppPager,         /* Return the Pager structure here */
  2298   1959     const char *zFilename,   /* Name of the database file to open */
  2299   1960     int nExtra,              /* Extra bytes appended to each in-memory page */
  2300   1961     int flags,               /* flags controlling this file */
  2301   1962     int vfsFlags             /* flags passed through to sqlite3_vfs.xOpen() */
  2302   1963   ){
  2303   1964     u8 *pPtr;
  2304         -  Pager *pPager = 0;       /* Pager object to allocate and return */
  2305         -  int rc = SQLITE_OK;      /* Return code */
  2306         -  int tempFile = 0;        /* True for temp files (incl. in-memory files) */
  2307         -  int memDb = 0;           /* True if this is an in-memory file */
  2308         -  int readOnly = 0;        /* True if this is a read-only file */
  2309         -  int journalFileSize;     /* Bytes to allocate for each journal fd */
  2310         -  char *zPathname = 0;     /* Full path to database file */
  2311         -  int nPathname = 0;       /* Number of bytes in zPathname */
  2312         -  int useJournal = (flags & PAGER_OMIT_JOURNAL)==0; /* False to omit journal */
  2313         -  int noReadlock = (flags & PAGER_NO_READLOCK)!=0;  /* True to omit read-lock */
  2314         -  int pcacheSize = sqlite3PcacheSize();       /* Bytes to allocate for PCache */
  2315         -  u16 szPageDflt = SQLITE_DEFAULT_PAGE_SIZE;  /* Default page size */
         1965  +  Pager *pPager = 0;
         1966  +  int rc = SQLITE_OK;
         1967  +  int i;
         1968  +  int tempFile = 0;
         1969  +  int memDb = 0;
         1970  +  int readOnly = 0;
         1971  +  int useJournal = (flags & PAGER_OMIT_JOURNAL)==0;
         1972  +  int noReadlock = (flags & PAGER_NO_READLOCK)!=0;
         1973  +  int journalFileSize;
         1974  +  int pcacheSize = sqlite3PcacheSize();
         1975  +  int szPageDflt = SQLITE_DEFAULT_PAGE_SIZE;
         1976  +  char *zPathname = 0;
         1977  +  int nPathname = 0;
  2316   1978   
  2317         -  /* Figure out how much space is required for each journal file-handle
  2318         -  ** (there are two of them, the main journal and the sub-journal). This
  2319         -  ** is the maximum space required for an in-memory journal file handle 
  2320         -  ** and a regular journal file-handle. Note that a "regular journal-handle"
  2321         -  ** may be a wrapper capable of caching the first portion of the journal
  2322         -  ** file in memory to implement the atomic-write optimization (see 
  2323         -  ** source file journal.c).
  2324         -  */
  2325   1979     if( sqlite3JournalSize(pVfs)>sqlite3MemJournalSize() ){
  2326   1980       journalFileSize = sqlite3JournalSize(pVfs);
  2327   1981     }else{
  2328   1982       journalFileSize = sqlite3MemJournalSize();
  2329   1983     }
  2330   1984   
  2331         -  /* Set the output variable to NULL in case an error occurs. */
         1985  +  /* The default return is a NULL pointer */
  2332   1986     *ppPager = 0;
  2333   1987   
  2334   1988     /* Compute and store the full pathname in an allocated buffer pointed
  2335   1989     ** to by zPathname, length nPathname. Or, if this is a temporary file,
  2336   1990     ** leave both nPathname and zPathname set to 0.
  2337   1991     */
  2338   1992     if( zFilename && zFilename[0] ){
................................................................................
  2346   2000         memDb = 1;
  2347   2001         zPathname[0] = 0;
  2348   2002       }else
  2349   2003   #endif
  2350   2004       {
  2351   2005         rc = sqlite3OsFullPathname(pVfs, zFilename, nPathname, zPathname);
  2352   2006       }
  2353         -
  2354         -    nPathname = sqlite3Strlen30(zPathname);
  2355         -    if( rc==SQLITE_OK && nPathname+8>pVfs->mxPathname ){
  2356         -      /* This branch is taken when the journal path required by
  2357         -      ** the database being opened will be more than pVfs->mxPathname
  2358         -      ** bytes in length. This means the database cannot be opened,
  2359         -      ** as it will not be possible to open the journal file or even
  2360         -      ** check for a hot-journal before reading.
  2361         -      */
  2362         -      rc = SQLITE_CANTOPEN;
  2363         -    }
  2364   2007       if( rc!=SQLITE_OK ){
  2365   2008         sqlite3_free(zPathname);
  2366   2009         return rc;
  2367   2010       }
         2011  +    nPathname = sqlite3Strlen30(zPathname);
  2368   2012     }
  2369   2013   
  2370         -  /* Allocate memory for the Pager structure, PCache object, the
  2371         -  ** three file descriptors, the database file name and the journal 
  2372         -  ** file name. The layout in memory is as follows:
  2373         -  **
  2374         -  **     Pager object                    (sizeof(Pager) bytes)
  2375         -  **     PCache object                   (sqlite3PcacheSize() bytes)
  2376         -  **     Database file handle            (pVfs->szOsFile bytes)
  2377         -  **     Sub-journal file handle         (journalFileSize bytes)
  2378         -  **     Main journal file handle        (journalFileSize bytes)
  2379         -  **     Database file name              (nPathname+1 bytes)
  2380         -  **     Journal file name               (nPathname+8+1 bytes)
  2381         -  */
  2382         -  pPtr = (u8 *)sqlite3MallocZero(
         2014  +  /* Allocate memory for the pager structure */
         2015  +  pPager = sqlite3MallocZero(
  2383   2016       sizeof(*pPager) +           /* Pager structure */
  2384   2017       pcacheSize      +           /* PCache object */
         2018  +    journalFileSize +           /* The journal file structure */ 
  2385   2019       pVfs->szOsFile  +           /* The main db file */
  2386   2020       journalFileSize * 2 +       /* The two journal files */ 
  2387         -    nPathname + 1 +             /* zFilename */
  2388         -    nPathname + 8 + 1           /* zJournal */
         2021  +    3*nPathname + 40            /* zFilename, zDirectory, zJournal */
  2389   2022     );
  2390         -  if( !pPtr ){
         2023  +  if( !pPager ){
  2391   2024       sqlite3_free(zPathname);
  2392   2025       return SQLITE_NOMEM;
  2393   2026     }
  2394         -  pPager =              (Pager*)(pPtr);
  2395         -  pPager->pPCache =    (PCache*)(pPtr += sizeof(*pPager));
  2396         -  pPager->fd =   (sqlite3_file*)(pPtr += pcacheSize);
  2397         -  pPager->sjfd = (sqlite3_file*)(pPtr += pVfs->szOsFile);
  2398         -  pPager->jfd =  (sqlite3_file*)(pPtr += journalFileSize);
  2399         -  pPager->zFilename =    (char*)(pPtr += journalFileSize);
  2400         -
  2401         -  /* Fill in the Pager.zFilename and Pager.zJournal buffers, if required. */
         2027  +  pPager->pPCache = (PCache *)&pPager[1];
         2028  +  pPtr = ((u8 *)&pPager[1]) + pcacheSize;
         2029  +  pPager->vfsFlags = vfsFlags;
         2030  +  pPager->fd = (sqlite3_file*)&pPtr[pVfs->szOsFile*0];
         2031  +  pPager->sjfd = (sqlite3_file*)&pPtr[pVfs->szOsFile];
         2032  +  pPager->jfd = (sqlite3_file*)&pPtr[pVfs->szOsFile+journalFileSize];
         2033  +  pPager->zFilename = (char*)&pPtr[pVfs->szOsFile+2*journalFileSize];
         2034  +  pPager->zDirectory = &pPager->zFilename[nPathname+1];
         2035  +  pPager->zJournal = &pPager->zDirectory[nPathname+1];
         2036  +  pPager->pVfs = pVfs;
  2402   2037     if( zPathname ){
  2403         -    pPager->zJournal =   (char*)(pPtr += nPathname + 1);
  2404         -    memcpy(pPager->zFilename, zPathname, nPathname);
  2405         -    memcpy(pPager->zJournal, zPathname, nPathname);
  2406         -    memcpy(&pPager->zJournal[nPathname], "-journal", 8);
         2038  +    memcpy(pPager->zFilename, zPathname, nPathname+1);
  2407   2039       sqlite3_free(zPathname);
  2408   2040     }
  2409         -  pPager->pVfs = pVfs;
  2410         -  pPager->vfsFlags = vfsFlags;
  2411   2041   
  2412   2042     /* Open the pager file.
  2413   2043     */
  2414   2044     if( zFilename && zFilename[0] && !memDb ){
  2415         -    int fout = 0;                    /* VFS flags returned by xOpen() */
  2416         -    rc = sqlite3OsOpen(pVfs, pPager->zFilename, pPager->fd, vfsFlags, &fout);
  2417         -    readOnly = (fout&SQLITE_OPEN_READONLY);
         2045  +    if( nPathname>(pVfs->mxPathname - (int)sizeof("-journal")) ){
         2046  +      rc = SQLITE_CANTOPEN;
         2047  +    }else{
         2048  +      int fout = 0;
         2049  +      rc = sqlite3OsOpen(pVfs, pPager->zFilename, pPager->fd,
         2050  +                         pPager->vfsFlags, &fout);
         2051  +      readOnly = (fout&SQLITE_OPEN_READONLY);
  2418   2052   
  2419         -    /* If the file was successfully opened for read/write access,
  2420         -    ** choose a default page size in case we have to create the
  2421         -    ** database file. The default page size is the maximum of:
  2422         -    **
  2423         -    **    + SQLITE_DEFAULT_PAGE_SIZE,
  2424         -    **    + The value returned by sqlite3OsSectorSize()
  2425         -    **    + The largest page size that can be written atomically.
  2426         -    */
  2427         -    if( rc==SQLITE_OK && !readOnly ){
  2428         -      setSectorSize(pPager);
  2429         -      if( szPageDflt<pPager->sectorSize ){
  2430         -        szPageDflt = pPager->sectorSize;
  2431         -      }
         2053  +      /* If the file was successfully opened for read/write access,
         2054  +      ** choose a default page size in case we have to create the
         2055  +      ** database file. The default page size is the maximum of:
         2056  +      **
         2057  +      **    + SQLITE_DEFAULT_PAGE_SIZE,
         2058  +      **    + The value returned by sqlite3OsSectorSize()
         2059  +      **    + The largest page size that can be written atomically.
         2060  +      */
         2061  +      if( rc==SQLITE_OK && !readOnly ){
         2062  +        setSectorSize(pPager);
         2063  +        if( szPageDflt<pPager->sectorSize ){
         2064  +          szPageDflt = pPager->sectorSize;
         2065  +        }
  2432   2066   #ifdef SQLITE_ENABLE_ATOMIC_WRITE
  2433         -      {
  2434         -        int iDc = sqlite3OsDeviceCharacteristics(pPager->fd);
  2435         -        int ii;
  2436         -        assert(SQLITE_IOCAP_ATOMIC512==(512>>8));
  2437         -        assert(SQLITE_IOCAP_ATOMIC64K==(65536>>8));
  2438         -        assert(SQLITE_MAX_DEFAULT_PAGE_SIZE<=65536);
  2439         -        for(ii=szPageDflt; ii<=SQLITE_MAX_DEFAULT_PAGE_SIZE; ii=ii*2){
  2440         -          if( iDc&(SQLITE_IOCAP_ATOMIC|(ii>>8)) ){
  2441         -            szPageDflt = ii;
         2067  +        {
         2068  +          int iDc = sqlite3OsDeviceCharacteristics(pPager->fd);
         2069  +          int ii;
         2070  +          assert(SQLITE_IOCAP_ATOMIC512==(512>>8));
         2071  +          assert(SQLITE_IOCAP_ATOMIC64K==(65536>>8));
         2072  +          assert(SQLITE_MAX_DEFAULT_PAGE_SIZE<=65536);
         2073  +          for(ii=szPageDflt; ii<=SQLITE_MAX_DEFAULT_PAGE_SIZE; ii=ii*2){
         2074  +            if( iDc&(SQLITE_IOCAP_ATOMIC|(ii>>8)) ) szPageDflt = ii;
  2442   2075             }
  2443   2076           }
  2444         -      }
  2445   2077   #endif
  2446         -      if( szPageDflt>SQLITE_MAX_DEFAULT_PAGE_SIZE ){
  2447         -        szPageDflt = SQLITE_MAX_DEFAULT_PAGE_SIZE;
         2078  +        if( szPageDflt>SQLITE_MAX_DEFAULT_PAGE_SIZE ){
         2079  +          szPageDflt = SQLITE_MAX_DEFAULT_PAGE_SIZE;
         2080  +        }
  2448   2081         }
  2449   2082       }
  2450   2083     }else{
  2451   2084       /* If a temporary file is requested, it is not opened immediately.
  2452   2085       ** In this case we accept the default page size and delay actually
  2453   2086       ** opening the file until the first call to OsWrite().
  2454   2087       **
................................................................................
  2456   2089       ** database is the same as a temp-file that is never written out to
  2457   2090       ** disk and uses an in-memory rollback journal.
  2458   2091       */ 
  2459   2092       tempFile = 1;
  2460   2093       pPager->state = PAGER_EXCLUSIVE;
  2461   2094     }
  2462   2095   
  2463         -  /* The following call to PagerSetPagesize() serves to set the value of 
  2464         -  ** Pager.pageSize and to allocate the Pager.pTmpSpace buffer.
  2465         -  */
  2466         -  if( rc==SQLITE_OK ){
  2467         -    assert( pPager->memDb==0 );
  2468         -    rc = sqlite3PagerSetPagesize(pPager, &szPageDflt);
  2469         -    testcase( rc!=SQLITE_OK );
         2096  +  if( pPager && rc==SQLITE_OK ){
         2097  +    pPager->pTmpSpace = sqlite3PageMalloc(szPageDflt);
  2470   2098     }
  2471   2099   
  2472         -  /* If an error occurred in either of the blocks above, free the
  2473         -  ** Pager structure and close the file.
         2100  +  /* If an error occurred in either of the blocks above,
         2101  +  ** free the Pager structure and close the file.
         2102  +  ** Since the pager is not allocated there is no need to set 
         2103  +  ** any Pager.errMask variables.
  2474   2104     */
  2475         -  if( rc!=SQLITE_OK ){
  2476         -    assert( !pPager->pTmpSpace );
         2105  +  if( !pPager || !pPager->pTmpSpace ){
  2477   2106       sqlite3OsClose(pPager->fd);
  2478   2107       sqlite3_free(pPager);
  2479         -    return rc;
         2108  +    return ((rc==SQLITE_OK)?SQLITE_NOMEM:rc);
  2480   2109     }
  2481         -
  2482         -  /* Initialize the PCache object. */
  2483   2110     nExtra = FORCE_ALIGNMENT(nExtra);
  2484   2111     sqlite3PcacheOpen(szPageDflt, nExtra, !memDb,
  2485   2112                       !memDb?pagerStress:0, (void *)pPager, pPager->pPCache);
  2486   2113   
  2487   2114     PAGERTRACE(("OPEN %d %s\n", FILEHANDLEID(pPager->fd), pPager->zFilename));
  2488   2115     IOTRACE(("OPEN %p %s\n", pPager, pPager->zFilename))
  2489   2116   
         2117  +  /* Fill in Pager.zDirectory[] */
         2118  +  memcpy(pPager->zDirectory, pPager->zFilename, nPathname+1);
         2119  +  for(i=sqlite3Strlen30(pPager->zDirectory); 
         2120  +      i>0 && pPager->zDirectory[i-1]!='/'; i--){}
         2121  +  if( i>0 ) pPager->zDirectory[i-1] = 0;
         2122  +
         2123  +  /* Fill in Pager.zJournal[] */
         2124  +  if( zPathname ){
         2125  +    memcpy(pPager->zJournal, pPager->zFilename, nPathname);
         2126  +    memcpy(&pPager->zJournal[nPathname], "-journal", 9);
         2127  +  }else{
         2128  +    pPager->zJournal = 0;
         2129  +  }
         2130  +
         2131  +  /* pPager->journalOpen = 0; */
  2490   2132     pPager->useJournal = (u8)useJournal;
  2491   2133     pPager->noReadlock = (noReadlock && readOnly) ?1:0;
  2492   2134     /* pPager->stmtOpen = 0; */
  2493   2135     /* pPager->stmtInUse = 0; */
  2494   2136     /* pPager->nRef = 0; */
  2495   2137     pPager->dbSizeValid = (u8)memDb;
         2138  +  pPager->pageSize = szPageDflt;
  2496   2139     /* pPager->stmtSize = 0; */
  2497   2140     /* pPager->stmtJSize = 0; */
  2498   2141     /* pPager->nPage = 0; */
         2142  +  pPager->mxPage = 100;
  2499   2143     pPager->mxPgno = SQLITE_MAX_PAGE_COUNT;
  2500   2144     /* pPager->state = PAGER_UNLOCK; */
  2501   2145     assert( pPager->state == (tempFile ? PAGER_EXCLUSIVE : PAGER_UNLOCK) );
  2502   2146     /* pPager->errMask = 0; */
  2503   2147     pPager->tempFile = (u8)tempFile;
  2504   2148     assert( tempFile==PAGER_LOCKINGMODE_NORMAL 
  2505   2149             || tempFile==PAGER_LOCKINGMODE_EXCLUSIVE );
................................................................................
  2512   2156     pPager->fullSync = pPager->noSync ?0:1;
  2513   2157     pPager->sync_flags = SQLITE_SYNC_NORMAL;
  2514   2158     /* pPager->pFirst = 0; */
  2515   2159     /* pPager->pFirstSynced = 0; */
  2516   2160     /* pPager->pLast = 0; */
  2517   2161     pPager->nExtra = nExtra;
  2518   2162     pPager->journalSizeLimit = SQLITE_DEFAULT_JOURNAL_SIZE_LIMIT;
  2519         -  assert( isOpen(pPager->fd) || tempFile );
         2163  +  assert(pPager->fd->pMethods||tempFile);
  2520   2164     setSectorSize(pPager);
  2521   2165     if( memDb ){
  2522   2166       pPager->journalMode = PAGER_JOURNALMODE_MEMORY;
  2523   2167     }
  2524   2168     /* pPager->xBusyHandler = 0; */
  2525   2169     /* pPager->pBusyHandlerArg = 0; */
  2526   2170     /* memset(pPager->aHash, 0, sizeof(pPager->aHash)); */
  2527   2171     *ppPager = pPager;
  2528   2172     return SQLITE_OK;
  2529   2173   }
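
Both versions of sqlite3PagerOpen() shown above obtain one zeroed block and carve the Pager, the PCache, the file handles and the path strings out of it. The sketch below illustrates only that carving technique. `Demo`, `demo_open` and the size parameters are hypothetical stand-ins, not SQLite names, and the pieces are plain byte pointers so that alignment is not a concern here (real code would have to keep each piece suitably aligned).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the Pager object; not the SQLite structure. */
typedef struct Demo {
  unsigned char *pCache;   /* Cache object, placed directly after the struct */
  unsigned char *fd;       /* Main file handle */
  unsigned char *sjfd;     /* Sub-journal file handle */
  unsigned char *jfd;      /* Main journal file handle */
  char *zFilename;         /* Copy of the database path */
  char *zJournal;          /* Database path with "-journal" appended */
} Demo;

/* Allocate everything in one block and hand out pointers into it.
** Returns NULL if the allocation fails. Release with a single free(). */
Demo *demo_open(const char *zPath, size_t szCache, size_t szFile, size_t szJrnl){
  size_t nPath = strlen(zPath);
  size_t nByte = sizeof(Demo) + szCache + szFile + 2*szJrnl
               + (nPath + 1)          /* zFilename and its terminator */
               + (nPath + 8 + 1);     /* zJournal: path, "-journal", terminator */
  unsigned char *p = calloc(1, nByte);
  Demo *pDemo;
  if( p==0 ) return 0;
  pDemo = (Demo*)p;
  pDemo->pCache    = (p += sizeof(Demo));
  pDemo->fd        = (p += szCache);
  pDemo->sjfd      = (p += szFile);
  pDemo->jfd       = (p += szJrnl);
  pDemo->zFilename = (char*)(p += szJrnl);
  pDemo->zJournal  = (char*)(p += nPath + 1);
  /* calloc() zeroed the block, so both strings are already terminated. */
  memcpy(pDemo->zFilename, zPath, nPath);
  memcpy(pDemo->zJournal, zPath, nPath);
  memcpy(&pDemo->zJournal[nPath], "-journal", 8);
  return pDemo;
}
```

Because every piece lives in the same block, the error paths only ever need to free one pointer, which is the property the real function relies on.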
  2530   2174   
  2531   2175   /*
  2532   2176   ** Set the busy handler function.
  2533         -**
  2534         -** The pager invokes the busy-handler if sqlite3OsLock() returns 
  2535         -** SQLITE_BUSY when trying to upgrade from no-lock to a SHARED lock,
  2536         -** or when trying to upgrade from a RESERVED lock to an EXCLUSIVE 
  2537         -** lock. It does *not* invoke the busy handler when upgrading from
  2538         -** SHARED to RESERVED, or when upgrading from SHARED to EXCLUSIVE
  2539         -** (which occurs during hot-journal rollback). Summary:
  2540         -**
  2541         -**   Transition                        | Invokes xBusyHandler
  2542         -**   --------------------------------------------------------
  2543         -**   NO_LOCK       -> SHARED_LOCK      | Yes
  2544         -**   SHARED_LOCK   -> RESERVED_LOCK    | No
  2545         -**   SHARED_LOCK   -> EXCLUSIVE_LOCK   | No
  2546         -**   RESERVED_LOCK -> EXCLUSIVE_LOCK   | Yes
  2547         -**
  2548         -** If the busy-handler callback returns non-zero, the lock is 
  2549         -** retried. If it returns zero, then the SQLITE_BUSY error is
  2550         -** returned to the caller of the pager API function.
  2551   2177   */
  2552   2178   void sqlite3PagerSetBusyhandler(
  2553         -  Pager *pPager,                       /* Pager object */
  2554         -  int (*xBusyHandler)(void *),         /* Pointer to busy-handler function */
  2555         -  void *pBusyHandlerArg                /* Argument to pass to xBusyHandler */
         2179  +  Pager *pPager, 
         2180  +  int (*xBusyHandler)(void *),
         2181  +  void *pBusyHandlerArg
  2556   2182   ){  
  2557   2183     pPager->xBusyHandler = xBusyHandler;
  2558   2184     pPager->pBusyHandlerArg = pBusyHandlerArg;
  2559   2185   }
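
The busy-handler contract described in the comment above (retry while the callback returns non-zero, give up when it returns zero) can be sketched in isolation as follows. Everything prefixed `demo_` or `DEMO_` is hypothetical: the fake lock simply reports "busy" a fixed number of times, and the two result codes only mirror the roles of SQLITE_OK and SQLITE_BUSY.

```c
#include <assert.h>

#define DEMO_OK   0   /* plays the role of SQLITE_OK */
#define DEMO_BUSY 5   /* plays the role of SQLITE_BUSY */

/* Fake lock: reports busy nBusyLeft times, then succeeds. */
typedef struct DemoLock { int nBusyLeft; } DemoLock;

static int demo_try_lock(DemoLock *p){
  if( p->nBusyLeft>0 ){
    p->nBusyLeft--;
    return DEMO_BUSY;
  }
  return DEMO_OK;
}

/* Busy handler state: allow at most nMax retries. */
typedef struct DemoBusy { int nCall; int nMax; } DemoBusy;

/* Returns non-zero to request another attempt, zero to give up. */
int demo_busy_handler(void *pArg){
  DemoBusy *p = (DemoBusy*)pArg;
  return p->nCall++ < p->nMax;
}

/* Keep trying the lock until it succeeds or the handler gives up. */
int demo_wait_on_lock(DemoLock *pLock, int (*xBusy)(void*), void *pArg){
  int rc;
  do{
    rc = demo_try_lock(pLock);
  }while( rc==DEMO_BUSY && xBusy(pArg) );
  return rc;
}
```

The handler is consulted only after a busy result, so a lock that succeeds on the first attempt never invokes it.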
  2560   2186   
  2561   2187   /*
  2562         -** Set the reinitializer for this pager. If not NULL, the reinitializer
  2563         -** is called when the content of a page in cache is modified (restored)
  2564         -** as part of a transaction or savepoint rollback. The callback gives 
  2565         -** higher-level code an opportunity to restore the EXTRA section to 
  2566         -** agree with the restored page data.
         2188  +** Set the reinitializer for this pager.  If not NULL, the reinitializer
         2189  +** is called when the content of a page in cache is restored to its original
         2190  +** value as a result of a rollback.  The callback gives higher-level code
         2191  +** an opportunity to restore the EXTRA section to agree with the restored
         2192  +** page data.
  2567   2193   */
  2568   2194   void sqlite3PagerSetReiniter(Pager *pPager, void (*xReinit)(DbPage*)){
  2569   2195     pPager->xReiniter = xReinit;
  2570   2196   }
  2571   2197   
  2572   2198   /*
  2573         -** Change the page size used by the Pager object. The new page size 
  2574         -** is passed in *pPageSize.
  2575         -**
  2576         -** If the pager is in the error state when this function is called, it
  2577         -** is a no-op. The value returned is the error state error code (i.e. 
  2578         -** one of SQLITE_IOERR, SQLITE_CORRUPT or SQLITE_FULL).
  2579         -**
  2580         -** Otherwise, if all of the following are true:
  2581         -**
  2582         -**   * the new page size (value of *pPageSize) is valid (a power 
  2583         -**     of two between 512 and SQLITE_MAX_PAGE_SIZE, inclusive), and
  2584         -**
  2585         -**   * there are no outstanding page references, and
  2586         -**
  2587         -**   * the database is either not an in-memory database or it is
  2588         -**     an in-memory database that currently consists of zero pages.
  2589         -**
  2590         -** then the pager object page size is set to *pPageSize.
  2591         -**
  2592         -** If the page size is changed, then this function uses sqlite3PageMalloc()
  2593         -** to obtain a new Pager.pTmpSpace buffer. If this allocation attempt 
  2594         -** fails, SQLITE_NOMEM is returned and the page size remains unchanged. 
  2595         -** In all other cases, SQLITE_OK is returned.
  2596         -**
  2597         -** If the page size is not changed, either because one of the enumerated
  2598         -** conditions above is not true, the pager was in error state when this
  2599         -** function was called, or because the memory allocation attempt failed, 
  2600         -** then *pPageSize is set to the old, retained page size before returning.
         2199  +** Set the page size to *pPageSize. If the suggested new page size is
         2200  +** inappropriate, then *pPageSize is set to the page size actually in
         2201  +** use before returning.
  2601   2202   */
  2602   2203   int sqlite3PagerSetPagesize(Pager *pPager, u16 *pPageSize){
  2603   2204     int rc = pPager->errCode;
  2604   2205     if( rc==SQLITE_OK ){
  2605   2206       u16 pageSize = *pPageSize;
  2606   2207       assert( pageSize==0 || (pageSize>=512 && pageSize<=SQLITE_MAX_PAGE_SIZE) );
  2607   2208       if( pageSize && pageSize!=pPager->pageSize 
................................................................................
  2610   2211       ){
  2611   2212         char *pNew = (char *)sqlite3PageMalloc(pageSize);
  2612   2213         if( !pNew ){
  2613   2214           rc = SQLITE_NOMEM;
  2614   2215         }else{
  2615   2216           pager_reset(pPager);
  2616   2217           pPager->pageSize = pageSize;
         2218  +        if( !pPager->memDb ) setSectorSize(pPager);
  2617   2219           sqlite3PageFree(pPager->pTmpSpace);
  2618   2220           pPager->pTmpSpace = pNew;
  2619   2221           sqlite3PcacheSetPageSize(pPager->pPCache, pageSize);
  2620   2222         }
  2621   2223       }
  2622   2224       *pPageSize = (u16)pPager->pageSize;
  2623   2225     }
................................................................................
  2675   2277   # define enable_simulated_io_errors()
  2676   2278   #endif
  2677   2279   
  2678   2280   /*
  2679   2281   ** Read the first N bytes from the beginning of the file into memory
  2680   2282   ** that pDest points to. 
  2681   2283   **
  2682         -** If the pager was opened on a transient file (zFilename==""), or
  2683         -** opened on a file less than N bytes in size, the output buffer is
  2684         -** zeroed and SQLITE_OK returned. The rationale for this is that this 
  2685         -** function is used to read database headers, and a new transient or
  2686         -** zero sized database has a header that consists entirely of zeroes.
  2687         -**
  2688         -** If any IO error apart from SQLITE_IOERR_SHORT_READ is encountered,
  2689         -** the error code is returned to the caller and the contents of the
  2690         -** output buffer undefined.
         2284  +** No error checking is done. The rationale for this is that this function
         2285  +** may be called even if the file does not exist or contain a header. In 
         2286  +** these cases sqlite3OsRead() will return an error, to which the correct 
         2287  +** response is to zero the memory at pDest and continue.  A real IO error 
         2288  +** will presumably recur and be picked up later (Todo: Think about this).
  2691   2289   */
  2692   2290   int sqlite3PagerReadFileheader(Pager *pPager, int N, unsigned char *pDest){
  2693   2291     int rc = SQLITE_OK;
  2694   2292     memset(pDest, 0, N);
  2695         -  assert( isOpen(pPager->fd) || pPager->tempFile );
  2696         -  if( isOpen(pPager->fd) ){
         2293  +  assert(pPager->fd->pMethods||pPager->tempFile);
         2294  +  if( pPager->fd->pMethods ){
  2697   2295       IOTRACE(("DBHDR %p 0 %d\n", pPager, N))
  2698   2296       rc = sqlite3OsRead(pPager->fd, pDest, N, 0);
  2699   2297       if( rc==SQLITE_IOERR_SHORT_READ ){
  2700   2298         rc = SQLITE_OK;
  2701   2299       }
  2702   2300     }
  2703   2301     return rc;
  2704   2302   }
  2705   2303   
  2706   2304   /*
  2707         -** Return the total number of pages in the database file associated 
  2708         -** with pPager. Normally, this is calculated as (<db file size>/<page-size>).
  2709         -** However, if the file is between 1 and <page-size> bytes in size, then 
  2710         -** this is considered a 1 page file.
         2305  +** Return the total number of pages in the disk file associated with
         2306  +** pPager. 
  2711   2307   **
  2712         -** If the pager is in error state when this function is called, then the
  2713         -** error state error code is returned and *pnPage left unchanged. Or,
  2714         -** if the file system has to be queried for the size of the file and
  2715         -** the query attempt returns an IO error, the IO error code is returned
  2716         -** and *pnPage is left unchanged.
  2717         -**
  2718         -** Otherwise, if everything is successful, then SQLITE_OK is returned
  2719         -** and *pnPage is set to the number of pages in the database.
         2308  +** If the PENDING_BYTE lies on the page directly after the end of the
         2309  +** file, then consider this page part of the file too. For example, if
         2310  +** PENDING_BYTE is byte 4096 (the first byte of page 5) and the size of the
         2311  +** file is 4096 bytes, 5 is returned instead of 4.
  2720   2312   */
  2721   2313   int sqlite3PagerPagecount(Pager *pPager, int *pnPage){
  2722         -  Pgno nPage;               /* Value to return via *pnPage */
  2723         -
  2724         -  /* If the pager is already in the error state, return the error code. */
         2314  +  i64 n = 0;
         2315  +  int rc;
         2316  +  assert( pPager!=0 );
  2725   2317     if( pPager->errCode ){
  2726         -    return pPager->errCode;
         2318  +    rc = pPager->errCode;
         2319  +    return rc;
  2727   2320     }
  2728         -
  2729         -  /* Determine the number of pages in the file. Store this in nPage. */
  2730   2321     if( pPager->dbSizeValid ){
  2731         -    nPage = pPager->dbSize;
  2732         -  }else{
  2733         -    int rc;                 /* Error returned by OsFileSize() */
  2734         -    i64 n = 0;              /* File size in bytes returned by OsFileSize() */
  2735         -
  2736         -    assert( isOpen(pPager->fd) || pPager->tempFile );
  2737         -    if( isOpen(pPager->fd) && (rc = sqlite3OsFileSize(pPager->fd, &n)) ){
         2322  +    n = pPager->dbSize;
         2323  +  } else {
         2324  +    assert(pPager->fd->pMethods||pPager->tempFile);
         2325  +    if( (pPager->fd->pMethods)
         2326  +     && (rc = sqlite3OsFileSize(pPager->fd, &n))!=SQLITE_OK ){
  2738   2327         pager_error(pPager, rc);
  2739   2328         return rc;
  2740   2329       }
  2741   2330       if( n>0 && n<pPager->pageSize ){
  2742         -      nPage = 1;
         2331  +      n = 1;
  2743   2332       }else{
  2744         -      nPage = n / pPager->pageSize;
         2333  +      n /= pPager->pageSize;
  2745   2334       }
  2746   2335       if( pPager->state!=PAGER_UNLOCK ){
  2747         -      pPager->dbSize = (Pgno)nPage;
  2748         -      pPager->dbFileSize = (Pgno)nPage;
         2336  +      pPager->dbSize = (Pgno)n;
         2337  +      pPager->dbFileSize = (Pgno)n;
  2749   2338         pPager->dbSizeValid = 1;
  2750   2339       }
  2751   2340     }
  2752         -
  2753         -  /* If the current number of pages in the file is greater than the 
  2754         -  ** configured maximum page number, increase the allowed limit so
  2755         -  ** that the file can be read.
  2756         -  */
  2757         -  if( nPage>pPager->mxPgno ){
  2758         -    pPager->mxPgno = (Pgno)nPage;
         2341  +  if( n==(PENDING_BYTE/pPager->pageSize) ){
         2342  +    n++;
  2759   2343     }
  2760         -
  2761         -  /* Set the output variable and return SQLITE_OK */
         2344  +  if( n>pPager->mxPgno ){
         2345  +    pPager->mxPgno = (Pgno)n;
         2346  +  }
  2762   2347     if( pnPage ){
  2763         -    *pnPage = nPage;
         2348  +    *pnPage = (int)n;
  2764   2349     }
  2765   2350     return SQLITE_OK;
  2766   2351   }
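
The byte-to-page arithmetic in the restored sqlite3PagerPagecount() can be checked in isolation with the sketch below. `demo_page_count` is a hypothetical helper, and the pending-byte offset is passed as a parameter rather than taken from the PENDING_BYTE macro. Note that the example in the comment above (4096 bytes giving 5 instead of 4) only works out with 1024-byte pages, which is an inference from the arithmetic rather than something the comment states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a file size in bytes to a page count,
** following the rules visible in the diff:
**   - a file of 1..pageSize-1 bytes counts as one page;
**   - otherwise the count is nByte/pageSize (integer division);
**   - if the locking page comes directly after the last page,
**     it is counted as part of the file too. */
int64_t demo_page_count(int64_t nByte, int64_t pageSize, int64_t pendingByte){
  int64_t n;
  assert( pageSize>0 );
  if( nByte>0 && nByte<pageSize ){
    n = 1;
  }else{
    n = nByte / pageSize;
  }
  if( n==pendingByte/pageSize ){
    n++;
  }
  return n;
}
```

With 1024-byte pages, a 4096-byte file and a pending byte at offset 4096, the helper returns 5. With the pending byte far beyond the end of the file, the same file yields 4.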
  2767   2352   
  2768   2353   /*
  2769         -** Forward declaration.
         2354  +** Forward declaration
  2770   2355   */
  2771   2356   static int syncJournal(Pager*);
  2772   2357   
  2773   2358   /*
  2774         -** Try to obtain a lock of type locktype on the database file. If
  2775         -** a similar or greater lock is already held, this function is a no-op
  2776         -** (returning SQLITE_OK immediately).
  2777         -**
  2778         -** Otherwise, attempt to obtain the lock using sqlite3OsLock(). Invoke 
  2779         -** the busy callback if the lock is currently not available. Repeat 
  2780         -** until the busy callback returns false or until the attempt to 
  2781         -** obtain the lock succeeds.
         2359  +** Try to obtain a lock on a file.  Invoke the busy callback if the lock
         2360  +** is currently not available.  Repeat until the busy callback returns
         2361  +** false or until the lock succeeds.
  2782   2362   **
  2783   2363   ** Return SQLITE_OK on success and an error code if we cannot obtain
  2784         -** the lock. If the lock is obtained successfully, set the Pager.state 
  2785         -** variable to locktype before returning.
         2364  +** the lock.
  2786   2365   */
  2787   2366   static int pager_wait_on_lock(Pager *pPager, int locktype){
  2788         -  int rc;                              /* Return code */
         2367  +  int rc;
  2789   2368   
  2790   2369     /* The OS lock values must be the same as the Pager lock values */
  2791   2370     assert( PAGER_SHARED==SHARED_LOCK );
  2792   2371     assert( PAGER_RESERVED==RESERVED_LOCK );
  2793   2372     assert( PAGER_EXCLUSIVE==EXCLUSIVE_LOCK );
  2794   2373   
  2795   2374     /* If the file is currently unlocked then the size must be unknown */
  2796   2375     assert( pPager->state>=PAGER_SHARED || pPager->dbSizeValid==0 );
  2797   2376   
  2798         -  /* Check that this is either a no-op (because the requested lock is
  2799         -  ** already held), or one of the transitions that the busy-handler
  2800         -  ** may be invoked during, according to the comment above
  2801         -  ** sqlite3PagerSetBusyhandler().
  2802         -  */
  2803         -  assert( (pPager->state>=locktype)
  2804         -       || (pPager->state==PAGER_UNLOCK && locktype==PAGER_SHARED)
  2805         -       || (pPager->state==PAGER_RESERVED && locktype==PAGER_EXCLUSIVE)
  2806         -  );
  2807         -
  2808   2377     if( pPager->state>=locktype ){
  2809   2378       rc = SQLITE_OK;
  2810   2379     }else{
  2811   2380       do {
  2812   2381         rc = sqlite3OsLock(pPager->fd, locktype);
  2813   2382       }while( rc==SQLITE_BUSY && pPager->xBusyHandler(pPager->pBusyHandlerArg) );
  2814   2383       if( rc==SQLITE_OK ){
................................................................................
  2825   2394   ** function does not actually modify the database file on disk. It 
  2826   2395   ** just sets the internal state of the pager object so that the 
  2827   2396   ** truncation will be done when the current transaction is committed.
  2828   2397   */
  2829   2398   void sqlite3PagerTruncateImage(Pager *pPager, Pgno nPage){
  2830   2399     assert( pPager->dbSizeValid );
  2831   2400     assert( pPager->dbSize>=nPage );
  2832         -  assert( pPager->state>=PAGER_RESERVED );
  2833   2401     pPager->dbSize = nPage;
  2834   2402   }
         2403  +
         2404  +/*
         2405  +** Return the current size of the database file image in pages. This
         2406  +** function differs from sqlite3PagerPagecount() in two ways:
         2407  +**
         2408  +**  a) It may only be called when at least one reference to a database
         2409  +**     page is held. This guarantees that the database size is already
         2410  +**     known and a call to sqlite3OsFileSize() is not required.
         2411  +**
         2412  +**  b) The return value is not adjusted for the locking page.
         2413  +*/
         2414  +Pgno sqlite3PagerImageSize(Pager *pPager){
         2415  +  assert( pPager->dbSizeValid );
         2416  +  return pPager->dbSize;
         2417  +}
  2835   2418   #endif  /* ifndef SQLITE_OMIT_AUTOVACUUM */
  2836   2419   
  2837   2420   /*
  2838   2421   ** Shutdown the page cache.  Free all memory and close all files.
  2839   2422   **
  2840   2423   ** If a transaction was in progress when this routine is called, that
  2841   2424   ** transaction is rolled back.  All outstanding pages are invalidated
................................................................................
  2845   2428   **
  2846   2429   ** This function always succeeds. If a transaction is active an attempt
  2847   2430   ** is made to roll it back. If an error occurs during the rollback 
  2848   2431   ** a hot journal may be left in the filesystem but no error is returned
  2849   2432   ** to the caller.
  2850   2433   */
  2851   2434   int sqlite3PagerClose(Pager *pPager){
         2435  +
  2852   2436     disable_simulated_io_errors();
  2853   2437     sqlite3BeginBenignMalloc();
  2854   2438     pPager->errCode = 0;
  2855   2439     pPager->exclusiveMode = 0;
  2856   2440     pager_reset(pPager);
  2857         -  if( MEMDB ){
  2858         -    pager_unlock(pPager);
  2859         -  }else{
         2441  +  if( !MEMDB ){
  2860   2442       /* Set Pager.journalHdr to -1 for the benefit of the pager_playback() 
  2861   2443       ** call which may be made from within pagerUnlockAndRollback(). If it
  2862   2444       ** is not -1, then the unsynced portion of an open journal file may
  2863   2445       ** be played back into the database. If a power failure occurs while
  2864   2446       ** this is happening, the database may become corrupt.
  2865   2447       */
  2866   2448       pPager->journalHdr = -1;
  2867   2449       pagerUnlockAndRollback(pPager);
  2868   2450     }
  2869         -  sqlite3EndBenignMalloc();
  2870   2451     enable_simulated_io_errors();
         2452  +  sqlite3EndBenignMalloc();
  2871   2453     PAGERTRACE(("CLOSE %d\n", PAGERID(pPager)));
  2872   2454     IOTRACE(("CLOSE %p\n", pPager))
         2455  +  if( pPager->journalOpen ){
         2456  +    sqlite3OsClose(pPager->jfd);
         2457  +  }
         2458  +  sqlite3BitvecDestroy(pPager->pInJournal);
         2459  +  sqlite3BitvecDestroy(pPager->pAlwaysRollback);
         2460  +  releaseAllSavepoint(pPager);
  2873   2461     sqlite3OsClose(pPager->fd);
         2462  +  /* Temp files are automatically deleted by the OS
         2463  +  ** if( pPager->tempFile ){
         2464  +  **   sqlite3OsDelete(pPager->zFilename);
         2465  +  ** }
         2466  +  */
         2467  +
  2874   2468     sqlite3PageFree(pPager->pTmpSpace);
  2875   2469     sqlite3PcacheClose(pPager->pPCache);
  2876         -
  2877         -  assert( !pPager->aSavepoint && !pPager->pInJournal );
  2878         -  assert( !isOpen(pPager->jfd) && !isOpen(pPager->sjfd) );
  2879         -
  2880   2470     sqlite3_free(pPager);
  2881   2471     return SQLITE_OK;
  2882   2472   }
  2883   2473   
  2884   2474   #if !defined(NDEBUG) || defined(SQLITE_TEST)
  2885   2475   /*
  2886         -** Return the page number for page pPg.
         2476  +** Return the page number for the given page data.
  2887   2477   */
  2888         -Pgno sqlite3PagerPagenumber(DbPage *pPg){
  2889         -  return pPg->pgno;
         2478  +Pgno sqlite3PagerPagenumber(DbPage *p){
         2479  +  return p->pgno;
  2890   2480   }
  2891   2481   #endif
  2892   2482   
  2893   2483   /*
  2894         -** Increment the reference count for page pPg.
         2484  +** Increment the reference count for a page.  The input pointer is
         2485  +** a reference to the page data.
  2895   2486   */
  2896         -void sqlite3PagerRef(DbPage *pPg){
         2487  +int sqlite3PagerRef(DbPage *pPg){
  2897   2488     sqlite3PcacheRef(pPg);
         2489  +  return SQLITE_OK;
  2898   2490   }
  2899   2491   
  2900   2492   /*
  2901         -** Sync the journal. In other words, make sure all the pages that have
         2493  +** Sync the journal.  In other words, make sure all the pages that have
  2902   2494   ** been written to the journal have actually reached the surface of the
  2903         -** disk and can be restored in the event of a hot-journal rollback.
  2904         -**
  2905         -** If the Pager.needSync flag is not set, then this function is a
  2906         -** no-op. Otherwise, the actions required depend on the journal-mode
  2907         -** and the device characteristics of the file-system, as follows:
  2908         -**
  2909         -**   * If the journal file is an in-memory journal file, no action need
  2910         -**     be taken.
  2911         -**
  2912         -**   * Otherwise, if the device does not support the SAFE_APPEND property,
  2913         -**     then the nRec field of the most recently written journal header
  2914         -**     is updated to contain the number of journal records that have
  2915         -**     been written following it. If the pager is operating in full-sync
  2916         -**     mode, then the journal file is synced before this field is updated.
  2917         -**
  2918         -**   * If the device does not support the SEQUENTIAL property, then 
  2919         -**     the journal file is synced.
  2920         -**
  2921         -** Or, in pseudo-code:
  2922         -**
  2923         -**   if( NOT <in-memory journal> ){
  2924         -**     if( NOT SAFE_APPEND ){
  2925         -**       if( <full-sync mode> ) xSync(<journal file>);
  2926         -**       <update nRec field>
  2927         -**     } 
  2928         -**     if( NOT SEQUENTIAL ) xSync(<journal file>);
  2929         -**   }
  2930         -**
  2931         -** The Pager.needSync flag is never set for temporary files, or any
  2932         -** file operating in no-sync mode (Pager.noSync set to non-zero).
  2933         -**
  2934         -** If successful, this routine clears the PGHDR_NEED_SYNC flag of every 
  2935         -** page currently held in memory before returning SQLITE_OK. If an IO
  2936         -** error is encountered, then the IO error code is returned to the caller.
         2495  +** disk.  It is not safe to modify the original database file until after
         2496  +** the journal has been synced.  If the original database is modified before
         2497  +** the journal is synced and a power failure occurs, the unsynced journal
         2498  +** data would be lost and we would be unable to completely rollback the
         2499  +** database changes.  Database corruption would occur.
         2500  +** 
         2501  +** This routine also updates the nRec field in the header of the journal.
         2502  +** (See comments on the pager_playback() routine for additional information.)
         2503  +** If the sync mode is FULL, two syncs will occur.  First the whole journal
         2504  +** is synced, then the nRec field is updated, then a second sync occurs.
         2505  +**
         2506  +** For temporary databases, we do not care if we are able to rollback
         2507  +** after a power failure, so no sync occurs.
         2508  +**
         2509  +** If the IOCAP_SEQUENTIAL flag is set for the persistent media on which
         2510  +** the database is stored, then OsSync() is never called on the journal
         2511  +** file. In this case all that is required is to update the nRec field in
         2512  +** the journal header.
         2513  +**
         2514  +** This routine clears the needSync field of every page currently held in
         2515  +** memory.
  2937   2516   */
  2938   2517   static int syncJournal(Pager *pPager){
         2518  +  int rc = SQLITE_OK;
         2519  +
         2520  +  /* Sync the journal before modifying the main database
         2521  +  ** (assuming there is a journal and it needs to be synced.)
         2522  +  */
  2939   2523     if( pPager->needSync ){
  2940   2524       assert( !pPager->tempFile );
  2941   2525       if( pPager->journalMode!=PAGER_JOURNALMODE_MEMORY ){
  2942         -      int rc;                              /* Return code */
  2943         -      const int iDc = sqlite3OsDeviceCharacteristics(pPager->fd);
  2944         -      assert( isOpen(pPager->jfd) );
         2526  +      int iDc = sqlite3OsDeviceCharacteristics(pPager->fd);
         2527  +      assert( pPager->journalOpen );
  2945   2528   
  2946   2529         if( 0==(iDc&SQLITE_IOCAP_SAFE_APPEND) ){
  2947         -        /* Variable iNRecOffset is set to the offset in the journal file
  2948         -        ** of the nRec field of the most recently written journal header.
  2949         -        ** This field will be updated following the xSync() operation
  2950         -        ** on the journal file. */
  2951         -        i64 iNRecOffset = pPager->journalHdr + sizeof(aJournalMagic);
         2530  +        i64 jrnlOff = journalHdrOffset(pPager);
         2531  +        u8 zMagic[8];
  2952   2532   
  2953   2533           /* This block deals with an obscure problem. If the last connection
  2954   2534           ** that wrote to this database was operating in persistent-journal
  2955   2535           ** mode, then the journal file may at this point actually be larger
  2956   2536           ** than Pager.journalOff bytes. If the next thing in the journal
  2957   2537           ** file happens to be a journal-header (written as part of the
  2958   2538           ** previous connections transaction), and a crash or power-failure 
................................................................................
  2962   2542           ** hot-journal rollback following recovery. It may roll back all
  2963   2543           ** of this connections data, then proceed to rolling back the old,
  2964   2544           ** out-of-date data that follows it. Database corruption.
  2965   2545           **
  2966   2546           ** To work around this, if the journal file does appear to contain
  2967   2547           ** a valid header following Pager.journalOff, then write a 0x00
  2968   2548           ** byte to the start of it to prevent it from being recognized.
  2969         -        **
  2970         -        ** Variable iNextHdrOffset is set to the offset at which this
  2971         -        ** problematic header will occur, if it exists. aMagic is used 
  2972         -        ** as a temporary buffer to inspect the first couple of bytes of
  2973         -        ** the potential journal header.
  2974   2549           */
  2975         -        i64 iNextHdrOffset = journalHdrOffset(pPager);
  2976         -        u8 aMagic[8];
  2977         -        rc = sqlite3OsRead(pPager->jfd, aMagic, 8, iNextHdrOffset);
  2978         -        if( rc==SQLITE_OK && 0==memcmp(aMagic, aJournalMagic, 8) ){
         2550  +        rc = sqlite3OsRead(pPager->jfd, zMagic, 8, jrnlOff);
         2551  +        if( rc==SQLITE_OK && 0==memcmp(zMagic, aJournalMagic, 8) ){
  2979   2552             static const u8 zerobyte = 0;
  2980         -          rc = sqlite3OsWrite(pPager->jfd, &zerobyte, 1, iNextHdrOffset);
         2553  +          rc = sqlite3OsWrite(pPager->jfd, &zerobyte, 1, jrnlOff);
  2981   2554           }
  2982   2555           if( rc!=SQLITE_OK && rc!=SQLITE_IOERR_SHORT_READ ){
  2983   2556             return rc;
  2984   2557           }
  2985   2558   
  2986   2559           /* Write the nRec value into the journal file header. If in
  2987   2560           ** full-synchronous mode, sync the journal first. This ensures that
................................................................................
  2994   2567           ** is populated with 0xFFFFFFFF when the journal header is written
  2995   2568           ** and never needs to be updated.
  2996   2569           */
  2997   2570           if( pPager->fullSync && 0==(iDc&SQLITE_IOCAP_SEQUENTIAL) ){
  2998   2571             PAGERTRACE(("SYNC journal of %d\n", PAGERID(pPager)));
  2999   2572             IOTRACE(("JSYNC %p\n", pPager))
  3000   2573             rc = sqlite3OsSync(pPager->jfd, pPager->sync_flags);
  3001         -          if( rc!=SQLITE_OK ) return rc;
         2574  +          if( rc!=0 ) return rc;
  3002   2575           }
  3003         -        IOTRACE(("JHDR %p %lld %d\n", pPager, iNRecOffset, 4));
  3004         -        rc = write32bits(pPager->jfd, iNRecOffset, pPager->nRec);
  3005         -        if( rc!=SQLITE_OK ) return rc;
         2576  +
         2577  +        jrnlOff = pPager->journalHdr + sizeof(aJournalMagic);
         2578  +        IOTRACE(("JHDR %p %lld %d\n", pPager, jrnlOff, 4));
         2579  +        rc = write32bits(pPager->jfd, jrnlOff, pPager->nRec);
         2580  +        if( rc ) return rc;
  3006   2581         }
  3007   2582         if( 0==(iDc&SQLITE_IOCAP_SEQUENTIAL) ){
  3008   2583           PAGERTRACE(("SYNC journal of %d\n", PAGERID(pPager)));
  3009   2584           IOTRACE(("JSYNC %p\n", pPager))
  3010   2585           rc = sqlite3OsSync(pPager->jfd, pPager->sync_flags| 
  3011   2586             (pPager->sync_flags==SQLITE_SYNC_FULL?SQLITE_SYNC_DATAONLY:0)
  3012   2587           );
  3013         -        if( rc!=SQLITE_OK ) return rc;
         2588  +        if( rc!=0 ) return rc;
  3014   2589         }
         2590  +      pPager->journalStarted = 1;
  3015   2591       }
  3016         -
  3017         -    /* The journal file was just successfully synced. Set Pager.needSync 
  3018         -    ** to zero and clear the PGHDR_NEED_SYNC flag on all pages.
  3019         -    */
  3020   2592       pPager->needSync = 0;
  3021         -    pPager->journalStarted = 1;
         2593  +
         2594  +    /* Erase the needSync flag from every page.
         2595  +    */
  3022   2596       sqlite3PcacheClearSyncFlags(pPager->pPCache);
  3023   2597     }
  3024   2598   
  3025         -  return SQLITE_OK;
         2599  +  return rc;
  3026   2600   }
  3027   2601   
  3028   2602   /*
  3029         -** The argument is the first in a linked list of dirty pages connected
  3030         -** by the PgHdr.pDirty pointer. This function writes each one of the
  3031         -** in-memory pages in the list to the database file. The argument may
  3032         -** be NULL, representing an empty list. In this case this function is
  3033         -** a no-op.
  3034         -**
  3035         -** The pager must hold at least a RESERVED lock when this function
  3036         -** is called. Before writing anything to the database file, this lock
  3037         -** is upgraded to an EXCLUSIVE lock. If the lock cannot be obtained,
  3038         -** SQLITE_BUSY is returned and no data is written to the database file.
  3039         -** 
  3040         -** If the pager is a temp-file pager and the actual file-system file
  3041         -** is not yet open, it is created and opened before any data is 
  3042         -** written out.
  3043         -**
  3044         -** Once the lock has been upgraded and, if necessary, the file opened,
  3045         -** the pages are written out to the database file in list order. Writing
  3046         -** a page is skipped if it meets either of the following criteria:
  3047         -**
  3048         -**   * The page number is greater than Pager.dbSize, or
  3049         -**   * The PGHDR_DONT_WRITE flag is set on the page.
  3050         -**
  3051         -** If writing out a page causes the database file to grow, Pager.dbFileSize
  3052         -** is updated accordingly. If page 1 is written out, then the value cached
  3053         -** in Pager.dbFileVers[] is updated to match the new value stored in
  3054         -** the database file.
  3055         -**
  3056         -** If everything is successful, SQLITE_OK is returned. If an IO error 
  3057         -** occurs, an IO error code is returned. Or, if the EXCLUSIVE lock cannot
  3058         -** be obtained, SQLITE_BUSY is returned.
         2603  +** Given a list of pages (connected by the PgHdr.pDirty pointer) write
         2604  +** every one of those pages out to the database file. No calls are made
         2605  +** to the page-cache to mark the pages as clean. It is the responsibility
         2606  +** of the caller to use PcacheCleanAll() or PcacheMakeClean() to mark
         2607  +** the pages as clean.
  3059   2608   */
  3060   2609   static int pager_write_pagelist(PgHdr *pList){
  3061         -  Pager *pPager;                       /* Pager object */
  3062         -  int rc;                              /* Return code */
         2610  +  Pager *pPager;
         2611  +  int rc;
  3063   2612   
  3064   2613     if( pList==0 ) return SQLITE_OK;
  3065   2614     pPager = pList->pPager;
  3066   2615   
  3067   2616     /* At this point there may be either a RESERVED or EXCLUSIVE lock on the
  3068   2617     ** database file. If there is already an EXCLUSIVE lock, the following
  3069         -  ** call is a no-op.
         2618  +  ** calls to sqlite3OsLock() are no-ops.
  3070   2619     **
  3071   2620     ** Moving the lock from RESERVED to EXCLUSIVE actually involves going
  3072   2621     ** through an intermediate state PENDING.   A PENDING lock prevents new
  3073   2622     ** readers from attaching to the database but is insufficient for us to
  3074   2623     ** write.  The idea of a PENDING lock is to prevent new readers from
  3075   2624     ** coming in while we wait for existing readers to clear.
  3076   2625     **
  3077   2626     ** While the pager is in the RESERVED state, the original database file
  3078   2627     ** is unchanged and we can rollback without having to playback the
  3079   2628     ** journal into the original database file.  Once we transition to
  3080   2629     ** EXCLUSIVE, it means the database file has been changed and any rollback
  3081   2630     ** will require a journal playback.
  3082   2631     */
  3083         -  assert( pPager->state>=PAGER_RESERVED );
  3084   2632     rc = pager_wait_on_lock(pPager, EXCLUSIVE_LOCK);
  3085         -
  3086         -  /* If the file is a temp-file that has not yet been opened, open it now. It
  3087         -  ** is not possible for rc to be other than SQLITE_OK if this branch
  3088         -  ** is taken, as pager_wait_on_lock() is a no-op for temp-files.
  3089         -  */
  3090         -  if( !isOpen(pPager->fd) ){
  3091         -    assert( pPager->tempFile && rc==SQLITE_OK );
  3092         -    rc = pagerOpentemp(pPager, pPager->fd, pPager->vfsFlags);
         2633  +  if( rc!=SQLITE_OK ){
         2634  +    return rc;
  3093   2635     }
  3094   2636   
  3095         -  while( rc==SQLITE_OK && pList ){
  3096         -    Pgno pgno = pList->pgno;
         2637  +  while( pList ){
         2638  +
         2639  +    /* If the file has not yet been opened, open it now. */
         2640  +    if( !pPager->fd->pMethods ){
         2641  +      assert(pPager->tempFile);
         2642  +      rc = sqlite3PagerOpentemp(pPager, pPager->fd, pPager->vfsFlags);
         2643  +      if( rc ) return rc;
         2644  +    }
  3097   2645   
  3098   2646       /* If there are dirty pages in the page cache with page numbers greater
  3099   2647       ** than Pager.dbSize, this means sqlite3PagerTruncateImage() was called to
  3100   2648       ** make the file smaller (presumably by auto-vacuum code). Do not write
  3101   2649       ** any such pages to the file.
  3102         -    **
  3103         -    ** Also, do not write out any page that has the PGHDR_DONT_WRITE flag
  3104         -    ** set (set by sqlite3PagerDontWrite()).
  3105   2650       */
  3106         -    if( pgno<=pPager->dbSize && 0==(pList->flags&PGHDR_DONT_WRITE) ){
  3107         -      i64 offset = (pgno-1)*(i64)pPager->pageSize;         /* Offset to write */
  3108         -      char *pData = CODEC2(pPager, pList->pData, pgno, 6); /* Data to write */
         2651  +    if( pList->pgno<=pPager->dbSize && 0==(pList->flags&PGHDR_DONT_WRITE) ){
         2652  +      i64 offset = (pList->pgno-1)*(i64)pPager->pageSize;
         2653  +      char *pData = CODEC2(pPager, pList->pData, pList->pgno, 6);
  3109   2654   
  3110         -      /* Write out the page data. */
         2655  +      PAGERTRACE(("STORE %d page %d hash(%08x)\n",
         2656  +                   PAGERID(pPager), pList->pgno, pager_pagehash(pList)));
         2657  +      IOTRACE(("PGOUT %p %d\n", pPager, pList->pgno));
  3111   2658         rc = sqlite3OsWrite(pPager->fd, pData, pPager->pageSize, offset);
  3112         -
  3113         -      /* If page 1 was just written, update Pager.dbFileVers to match
  3114         -      ** the value now stored in the database file. If writing this 
  3115         -      ** page caused the database file to grow, update dbFileSize. 
  3116         -      */
  3117         -      if( pgno==1 ){
         2659  +      PAGER_INCR(sqlite3_pager_writedb_count);
         2660  +      PAGER_INCR(pPager->nWrite);
         2661  +      if( pList->pgno==1 ){
  3118   2662           memcpy(&pPager->dbFileVers, &pData[24], sizeof(pPager->dbFileVers));
  3119   2663         }
  3120         -      if( pgno>pPager->dbFileSize ){
  3121         -        pPager->dbFileSize = pgno;
         2664  +      if( pList->pgno>pPager->dbFileSize ){
         2665  +        pPager->dbFileSize = pList->pgno;
  3122   2666         }
  3123         -
  3124         -      PAGERTRACE(("STORE %d page %d hash(%08x)\n",
  3125         -                   PAGERID(pPager), pgno, pager_pagehash(pList)));
  3126         -      IOTRACE(("PGOUT %p %d\n", pPager, pgno));
  3127         -      PAGER_INCR(sqlite3_pager_writedb_count);
  3128         -      PAGER_INCR(pPager->nWrite);
  3129         -    }else{
  3130         -      PAGERTRACE(("NOSTORE %d page %d\n", PAGERID(pPager), pgno));
  3131   2667       }
         2668  +#ifndef NDEBUG
         2669  +    else{
         2670  +      PAGERTRACE(("NOSTORE %d page %d\n", PAGERID(pPager), pList->pgno));
         2671  +    }
         2672  +#endif
         2673  +    if( rc ) return rc;
  3132   2674   #ifdef SQLITE_CHECK_PAGES
  3133   2675       pList->pageHash = pager_pagehash(pList);
  3134   2676   #endif
  3135   2677       pList = pList->pDirty;
  3136   2678     }
  3137   2679   
  3138         -  return rc;
         2680  +  return SQLITE_OK;
  3139   2681   }
  3140   2682   
  3141   2683   /*
  3142         -** Append a record of the current state of page pPg to the sub-journal. 
  3143         -** It is the callers responsibility to use subjRequiresPage() to check 
  3144         -** that it is really required before calling this function.
  3145         -**
  3146         -** If successful, set the bit corresponding to pPg->pgno in the bitvecs
  3147         -** for all open savepoints before returning.
  3148         -**
  3149         -** This function returns SQLITE_OK if everything is successful, an IO
  3150         -** error code if the attempt to write to the sub-journal fails, or 
  3151         -** SQLITE_NOMEM if a malloc fails while setting a bit in a savepoint
  3152         -** bitvec.
         2684  +** Add the page to the sub-journal. It is the callers responsibility to
         2685  +** use subjRequiresPage() to check that it is really required before 
         2686  +** calling this function.
  3153   2687   */
  3154   2688   static int subjournalPage(PgHdr *pPg){
  3155   2689     int rc;
  3156   2690     void *pData = pPg->pData;
  3157   2691     Pager *pPager = pPg->pPager;
  3158         -  i64 offset = pPager->nSubRec*(4+pPager->pageSize);
         2692  +  i64 offset = pPager->stmtNRec*(4+pPager->pageSize);
  3159   2693     char *pData2 = CODEC2(pPager, pData, pPg->pgno, 7);
  3160   2694   
  3161   2695     PAGERTRACE(("STMT-JOURNAL %d page %d\n", PAGERID(pPager), pPg->pgno));
  3162   2696   
  3163   2697     assert( pageInJournal(pPg) || pPg->pgno>pPager->dbOrigSize );
  3164   2698     rc = write32bits(pPager->sjfd, offset, pPg->pgno);
  3165   2699     if( rc==SQLITE_OK ){
  3166   2700       rc = sqlite3OsWrite(pPager->sjfd, pData2, pPager->pageSize, offset+4);
  3167   2701     }
  3168   2702     if( rc==SQLITE_OK ){
  3169         -    pPager->nSubRec++;
         2703  +    pPager->stmtNRec++;
  3170   2704       assert( pPager->nSavepoint>0 );
  3171   2705       rc = addToSavepointBitvecs(pPager, pPg->pgno);
  3172         -    testcase( rc!=SQLITE_OK );
  3173   2706     }
  3174   2707     return rc;
  3175   2708   }
  3176   2709   
  3177   2710   
  3178   2711   /*
  3179   2712   ** This function is called by the pcache layer when it has reached some
  3180         -** soft memory limit. The first argument is a pointer to a Pager object
  3181         -** (cast as a void*). The pager is always 'purgeable' (not an in-memory
  3182         -** database). The second argument is a reference to a page that is 
  3183         -** currently dirty but has no outstanding references. The page
  3184         -** is always associated with the Pager object passed as the first 
  3185         -** argument.
  3186         -**
  3187         -** The job of this function is to make pPg clean by writing its contents
  3188         -** out to the database file, if possible. This may involve syncing the
  3189         -** journal file. 
  3190         -**
  3191         -** If successful, sqlite3PcacheMakeClean() is called on the page and
  3192         -** SQLITE_OK returned. If an IO error occurs while trying to make the
  3193         -** page clean, the IO error code is returned. If the page cannot be
  3194         -** made clean for some other reason, but no error occurs, then SQLITE_OK
  3195         -** is returned but sqlite3PcacheMakeClean() is not called.
         2713  +** soft memory limit. The argument is a pointer to a purgeable Pager 
         2714  +** object. This function attempts to make a single dirty page that has no
         2715  +** outstanding references (if one exists) clean so that it can be recycled 
         2716  +** by the pcache layer.
  3196   2717   */
  3197   2718   static int pagerStress(void *p, PgHdr *pPg){
  3198   2719     Pager *pPager = (Pager *)p;
  3199   2720     int rc = SQLITE_OK;
  3200   2721   
  3201         -  assert( pPg->pPager==pPager );
  3202         -  assert( pPg->flags&PGHDR_DIRTY );
  3203         -
  3204         -  /* The doNotSync flag is set by the sqlite3PagerWrite() function while it
  3205         -  ** is journalling a set of two or more database pages that are stored
  3206         -  ** on the same disk sector. Syncing the journal is not allowed while
  3207         -  ** this is happening as it is important that all members of such a
  3208         -  ** set of pages are synced to disk together. So, if the page this function
  3209         -  ** is trying to make clean will require a journal sync and the doNotSync
  3210         -  ** flag is set, return without doing anything. The pcache layer will
  3211         -  ** just have to go ahead and allocate a new page buffer instead of
  3212         -  ** reusing pPg.
  3213         -  **
  3214         -  ** Similarly, if the pager has already entered the error state, do not
  3215         -  ** try to write the contents of pPg to disk.
  3216         -  */
  3217         -  if( pPager->errCode || (pPager->doNotSync && pPg->flags&PGHDR_NEED_SYNC) ){
         2722  +  if( pPager->doNotSync ){
  3218   2723       return SQLITE_OK;
  3219   2724     }
  3220   2725   
  3221         -  /* Sync the journal file if required. */
  3222         -  if( pPg->flags&PGHDR_NEED_SYNC ){
  3223         -    rc = syncJournal(pPager);
  3224         -    if( rc==SQLITE_OK && pPager->fullSync && 
  3225         -      !(pPager->journalMode==PAGER_JOURNALMODE_MEMORY) &&
  3226         -      !(sqlite3OsDeviceCharacteristics(pPager->fd)&SQLITE_IOCAP_SAFE_APPEND)
  3227         -    ){
  3228         -      pPager->nRec = 0;
  3229         -      rc = writeJournalHdr(pPager);
  3230         -    }
  3231         -  }
  3232         -
  3233         -  /* If the page number of this page is larger than the current size of
  3234         -  ** the database image, it may need to be written to the sub-journal.
  3235         -  ** This is because the call to pager_write_pagelist() below will not
  3236         -  ** actually write data to the file in this case.
  3237         -  **
  3238         -  ** Consider the following sequence of events:
  3239         -  **
  3240         -  **   BEGIN;
  3241         -  **     <journal page X>
  3242         -  **     <modify page X>
  3243         -  **     SAVEPOINT sp;
  3244         -  **       <shrink database file to Y pages>
  3245         -  **       pagerStress(page X)
  3246         -  **     ROLLBACK TO sp;
  3247         -  **
  3248         -  ** If (X>Y), then when pagerStress is called page X will not be written
  3249         -  ** out to the database file, but will be dropped from the cache. Then,
  3250         -  ** following the "ROLLBACK TO sp" statement, reading page X will read
  3251         -  ** data from the database file. This will be the copy of page X as it
  3252         -  ** was when the transaction started, not as it was when "SAVEPOINT sp"
  3253         -  ** was executed.
  3254         -  **
  3255         -  ** The solution is to write the current data for page X into the 
  3256         -  ** sub-journal file now (if it is not already there), so that it will
  3257         -  ** be restored to its current value when the "ROLLBACK TO sp" is 
  3258         -  ** executed.
  3259         -  */
  3260         -  if( rc==SQLITE_OK && pPg->pgno>pPager->dbSize && subjRequiresPage(pPg) ){
  3261         -    rc = subjournalPage(pPg);
  3262         -  }
  3263         -
  3264         -  /* Write the contents of the page out to the database file. */
  3265         -  if( rc==SQLITE_OK ){
  3266         -    pPg->pDirty = 0;
  3267         -    rc = pager_write_pagelist(pPg);
  3268         -  }
  3269         -
  3270         -  /* Mark the page as clean. */
         2726  +  assert( pPg->flags&PGHDR_DIRTY );
         2727  +  if( pPager->errCode==SQLITE_OK ){
         2728  +    if( pPg->flags&PGHDR_NEED_SYNC ){
         2729  +      rc = syncJournal(pPager);
         2730  +      if( rc==SQLITE_OK && pPager->fullSync && 
         2731  +        !(pPager->journalMode==PAGER_JOURNALMODE_MEMORY) &&
         2732  +        !(sqlite3OsDeviceCharacteristics(pPager->fd)&SQLITE_IOCAP_SAFE_APPEND)
         2733  +      ){
         2734  +        pPager->nRec = 0;
         2735  +        rc = writeJournalHdr(pPager);
         2736  +      }
         2737  +    }
         2738  +    if( rc==SQLITE_OK ){
         2739  +      pPg->pDirty = 0;
         2740  +      if( pPg->pgno>pPager->dbSize && subjRequiresPage(pPg) ){
         2741  +        rc = subjournalPage(pPg);
         2742  +      }
         2743  +      if( rc==SQLITE_OK ){
         2744  +        rc = pager_write_pagelist(pPg);
         2745  +      }
         2746  +    }
         2747  +    if( rc!=SQLITE_OK ){
         2748  +      pager_error(pPager, rc);
         2749  +    }
         2750  +  }
         2751  +
  3271   2752     if( rc==SQLITE_OK ){
  3272   2753       PAGERTRACE(("STRESS %d page %d\n", PAGERID(pPager), pPg->pgno));
  3273   2754       sqlite3PcacheMakeClean(pPg);
  3274   2755     }
  3275         -
  3276         -  return pager_error(pPager, rc);
         2756  +  return rc;
  3277   2757   }
  3278   2758   
  3279   2759   
  3280   2760   /*
  3281         -** This function is called after transitioning from PAGER_UNLOCK to
  3282         -** PAGER_SHARED state. It tests if there is a hot journal present in
  3283         -** the file-system for the given pager. A hot journal is one that 
  3284         -** needs to be played back. According to this function, a hot-journal
  3285         -** file exists if the following three criteria are met:
  3286         -**
  3287         -**   * The journal file exists in the file system, and
  3288         -**   * No process holds a RESERVED or greater lock on the database file, and
  3289         -**   * The database file itself is greater than 0 bytes in size.
         2761  +** Return 1 if there is a hot journal on the given pager.
         2762  +** A hot journal is one that needs to be played back.
  3290   2763   **
  3291   2764   ** If the current size of the database file is 0 but a journal file
  3292   2765   ** exists, that is probably an old journal left over from a prior
  3293         -** database with the same name. In this case the journal file is
  3294         -** just deleted using OsDelete, *pExists is set to 0 and SQLITE_OK
  3295         -** is returned.
         2766  +** database with the same name.  Just delete the journal.
         2767  +**
         2768  +** Return negative if unable to determine the status of the journal.
  3296   2769   **
  3297   2770   ** This routine does not open the journal file to examine its
  3298   2771   ** content.  Hence, the journal might contain the name of a master
  3299   2772   ** journal file that has been deleted, and hence not be hot.  Or
  3300   2773   ** the header of the journal might be zeroed out.  This routine
  3301   2774   ** does not discover these cases of a non-hot journal - if the
  3302   2775   ** journal file exists and is not empty this routine assumes it
  3303   2776   ** is hot.  The pager_playback() routine will discover that the
  3304   2777   ** journal file is not really hot and will no-op.
  3305         -**
  3306         -** If a hot-journal file is found to exist, *pExists is set to 1 and 
  3307         -** SQLITE_OK returned. If no hot-journal file is present, *pExists is
  3308         -** set to 0 and SQLITE_OK returned. If an IO error occurs while trying
  3309         -** to determine whether or not a hot-journal file exists, the IO error
  3310         -** code is returned and the value of *pExists is undefined.
  3311   2778   */
  3312   2779   static int hasHotJournal(Pager *pPager, int *pExists){
  3313         -  sqlite3_vfs * const pVfs = pPager->pVfs;
  3314         -  int rc;                       /* Return code */
  3315         -  int exists = 0;               /* True if a journal file is present */
  3316         -  int locked = 0;               /* True if some process holds a RESERVED lock */
  3317         -
         2780  +  sqlite3_vfs *pVfs = pPager->pVfs;
         2781  +  int rc = SQLITE_OK;
         2782  +  int exists = 0;
         2783  +  int locked = 0;
  3318   2784     assert( pPager!=0 );
  3319   2785     assert( pPager->useJournal );
  3320         -  assert( isOpen(pPager->fd) );
  3321         -
         2786  +  assert( pPager->fd->pMethods );
  3322   2787     *pExists = 0;
  3323   2788     rc = sqlite3OsAccess(pVfs, pPager->zJournal, SQLITE_ACCESS_EXISTS, &exists);
  3324   2789     if( rc==SQLITE_OK && exists ){
  3325   2790       rc = sqlite3OsCheckReservedLock(pPager->fd, &locked);
  3326         -    if( rc==SQLITE_OK && !locked ){
  3327         -      int nPage;
  3328         -      rc = sqlite3PagerPagecount(pPager, &nPage);
  3329         -      if( rc==SQLITE_OK ){
  3330         -       if( nPage==0 ){
  3331         -          sqlite3OsDelete(pVfs, pPager->zJournal, 0);
  3332         -        }else{
  3333         -          *pExists = 1;
  3334         -        }
  3335         -      }
  3336         -    }
  3337         -  }
  3338         -  return rc;
  3339         -}
  3340         -
  3341         -/*
  3342         -** Read the content for page pPg out of the database file and into 
  3343         -** pPg->pData. A shared lock or greater must be held on the database
  3344         -** file before this function is called.
  3345         -**
  3346         -** If page 1 is read, then the value of Pager.dbFileVers[] is set to
  3347         -** the value read from the database file.
  3348         -**
  3349         -** If an IO error occurs, then the IO error is returned to the caller.
  3350         -** Otherwise, SQLITE_OK is returned.
  3351         -*/
  3352         -static int readDbPage(PgHdr *pPg){
  3353         -  Pager *pPager = pPg->pPager; /* Pager object associated with page pPg */
  3354         -  Pgno pgno = pPg->pgno;       /* Page number to read */
  3355         -  int rc;                      /* Return code */
  3356         -  i64 iOffset;                 /* Byte offset of file to read from */
  3357         -
  3358         -  assert( pPager->state>=PAGER_SHARED && !MEMDB );
  3359         -
  3360         -  if( !isOpen(pPager->fd) ){
  3361         -    assert( pPager->tempFile );
         2791  +  }
         2792  +  if( rc==SQLITE_OK && exists && !locked ){
         2793  +    int nPage;
         2794  +    rc = sqlite3PagerPagecount(pPager, &nPage);
         2795  +    if( rc==SQLITE_OK ){
         2796  +     if( nPage==0 ){
         2797  +        sqlite3OsDelete(pVfs, pPager->zJournal, 0);
         2798  +      }else{
         2799  +        *pExists = 1;
         2800  +      }
         2801  +    }
         2802  +  }
         2803  +  return rc;
         2804  +}
         2805  +
         2806  +/*
         2807  +** Read the content of page pPg out of the database file.
         2808  +*/
         2809  +static int readDbPage(Pager *pPager, PgHdr *pPg, Pgno pgno){
         2810  +  int rc;
         2811  +  i64 offset;
         2812  +  assert( MEMDB==0 );
         2813  +  assert(pPager->fd->pMethods||pPager->tempFile);
         2814  +  if( !pPager->fd->pMethods ){
  3362   2815       return SQLITE_IOERR_SHORT_READ;
  3363   2816     }
  3364         -  iOffset = (pgno-1)*(i64)pPager->pageSize;
  3365         -  rc = sqlite3OsRead(pPager->fd, pPg->pData, pPager->pageSize, iOffset);
  3366         -  if( pgno==1 ){
  3367         -    u8 *dbFileVers = &((u8*)pPg->pData)[24];
  3368         -    memcpy(&pPager->dbFileVers, dbFileVers, sizeof(pPager->dbFileVers));
  3369         -  }
  3370         -  CODEC1(pPager, pPg->pData, pgno, 3);
  3371         -
         2817  +  offset = (pgno-1)*(i64)pPager->pageSize;
         2818  +  rc = sqlite3OsRead(pPager->fd, pPg->pData, pPager->pageSize, offset);
  3372   2819     PAGER_INCR(sqlite3_pager_readdb_count);
  3373   2820     PAGER_INCR(pPager->nRead);
  3374   2821     IOTRACE(("PGIN %p %d\n", pPager, pgno));
         2822  +  if( pgno==1 ){
         2823  +    memcpy(&pPager->dbFileVers, &((u8*)pPg->pData)[24],
         2824  +                                              sizeof(pPager->dbFileVers));
         2825  +  }
         2826  +  CODEC1(pPager, pPg->pData, pPg->pgno, 3);
  3375   2827     PAGERTRACE(("FETCH %d page %d hash(%08x)\n",
  3376         -               PAGERID(pPager), pgno, pager_pagehash(pPg)));
  3377         -
         2828  +               PAGERID(pPager), pPg->pgno, pager_pagehash(pPg)));
  3378   2829     return rc;
  3379   2830   }
         2831  +
  3380   2832   
  3381   2833   /*
  3382   2834   ** This function is called to obtain the shared lock required before
  3383   2835   ** data may be read from the pager cache. If the shared lock has already
  3384   2836   ** been obtained, this function is a no-op.
  3385   2837   **
  3386   2838   ** Immediately after obtaining the shared lock (if required), this function
  3387   2839   ** checks for a hot-journal file. If one is found, an emergency rollback
  3388   2840   ** is performed immediately.
  3389   2841   */
  3390   2842   static int pagerSharedLock(Pager *pPager){
  3391         -  int rc = SQLITE_OK;                /* Return code */
  3392         -  int isErrorReset = 0;              /* True if recovering from error state */
         2843  +  int rc = SQLITE_OK;
         2844  +  int isErrorReset = 0;
  3393   2845   
  3394   2846     /* If this database is opened for exclusive access, has no outstanding 
  3395         -  ** page references and is in an error-state, this is a chance to clear
         2847  +  ** page references and is in an error-state, now is the chance to clear
  3396   2848     ** the error. Discard the contents of the pager-cache and treat any
  3397   2849     ** open journal file as a hot-journal.
  3398   2850     */
  3399   2851     if( !MEMDB && pPager->exclusiveMode 
  3400   2852      && sqlite3PcacheRefCount(pPager->pPCache)==0 && pPager->errCode 
  3401   2853     ){
  3402         -    if( isOpen(pPager->jfd) ){
         2854  +    if( pPager->journalOpen ){
  3403   2855         isErrorReset = 1;
  3404   2856       }
  3405   2857       pPager->errCode = SQLITE_OK;
  3406   2858       pager_reset(pPager);
  3407   2859     }
  3408   2860   
  3409   2861     /* If the pager is still in an error state, do not proceed. The error 
................................................................................
  3411   2863     ** references are dropped and the cache can be discarded.
  3412   2864     */
  3413   2865     if( pPager->errCode && pPager->errCode!=SQLITE_FULL ){
  3414   2866       return pPager->errCode;
  3415   2867     }
  3416   2868   
  3417   2869     if( pPager->state==PAGER_UNLOCK || isErrorReset ){
  3418         -    sqlite3_vfs * const pVfs = pPager->pVfs;
         2870  +    sqlite3_vfs *pVfs = pPager->pVfs;
  3419   2871       int isHotJournal = 0;
  3420   2872       assert( !MEMDB );
  3421   2873       assert( sqlite3PcacheRefCount(pPager->pPCache)==0 );
  3422   2874       if( !pPager->noReadlock ){
  3423   2875         rc = pager_wait_on_lock(pPager, SHARED_LOCK);
  3424   2876         if( rc!=SQLITE_OK ){
  3425   2877           assert( pPager->state==PAGER_UNLOCK );
................................................................................
  3440   2892         }
  3441   2893       }
  3442   2894       if( isErrorReset || isHotJournal ){
  3443   2895         /* Get an EXCLUSIVE lock on the database file. At this point it is
  3444   2896         ** important that a RESERVED lock is not obtained on the way to the
  3445   2897         ** EXCLUSIVE lock. If it were, another process might open the
  3446   2898         ** database file, detect the RESERVED lock, and conclude that the
  3447         -      ** database is safe to read while this process is still rolling the 
  3448         -      ** hot-journal back.
         2899  +      ** database is safe to read while this process is still rolling it 
         2900  +      ** back.
  3449   2901         ** 
  3450         -      ** Because the intermediate RESERVED lock is not requested, any
  3451         -      ** other process attempting to access the database file will get to 
  3452         -      ** this point in the code and fail to obtain its own EXCLUSIVE lock 
  3453         -      ** on the database file.
         2902  +      ** Because the intermediate RESERVED lock is not requested, the
         2903  +      ** second process will get to this point in the code and fail to
         2904  +      ** obtain its own EXCLUSIVE lock on the database file.
  3454   2905         */
  3455   2906         if( pPager->state<EXCLUSIVE_LOCK ){
  3456   2907           rc = sqlite3OsLock(pPager->fd, EXCLUSIVE_LOCK);
  3457   2908           if( rc!=SQLITE_OK ){
  3458   2909             rc = pager_error(pPager, rc);
  3459   2910             goto failed;
  3460   2911           }
................................................................................
  3463   2914    
  3464   2915         /* Open the journal for read/write access. This is because in 
  3465   2916         ** exclusive-access mode the file descriptor will be kept open and
  3466   2917         ** possibly used for a transaction later on. On some systems, the
  3467   2918         ** OsTruncate() call used in exclusive-access mode also requires
  3468   2919         ** a read/write file handle.
  3469   2920         */
  3470         -      if( !isOpen(pPager->jfd) ){
         2921  +      if( !isErrorReset && pPager->journalOpen==0 ){
  3471   2922           int res;
  3472   2923           rc = sqlite3OsAccess(pVfs,pPager->zJournal,SQLITE_ACCESS_EXISTS,&res);
  3473   2924           if( rc==SQLITE_OK ){
  3474   2925             if( res ){
  3475   2926               int fout = 0;
  3476   2927               int f = SQLITE_OPEN_READWRITE|SQLITE_OPEN_MAIN_JOURNAL;
  3477   2928               assert( !pPager->tempFile );
  3478   2929               rc = sqlite3OsOpen(pVfs, pPager->zJournal, pPager->jfd, f, &fout);
  3479         -            assert( rc!=SQLITE_OK || isOpen(pPager->jfd) );
         2930  +            assert( rc!=SQLITE_OK || pPager->jfd->pMethods );
  3480   2931               if( rc==SQLITE_OK && fout&SQLITE_OPEN_READONLY ){
  3481   2932                 rc = SQLITE_CANTOPEN;
  3482   2933                 sqlite3OsClose(pPager->jfd);
  3483   2934               }
  3484   2935             }else{
  3485   2936               /* If the journal does not exist, that means some other process
  3486   2937               ** has already rolled it back */
................................................................................
  3487   2938               rc = SQLITE_BUSY;
  3488   2939             }
  3489   2940           }
  3490   2941         }
  3491   2942         if( rc!=SQLITE_OK ){
  3492   2943           goto failed;
  3493   2944         }
  3494         -
  3495         -      /* TODO: Why are these cleared here? Is it necessary? */
         2945  +      pPager->journalOpen = 1;
  3496   2946         pPager->journalStarted = 0;
  3497   2947         pPager->journalOff = 0;
  3498   2948         pPager->setMaster = 0;
  3499   2949         pPager->journalHdr = 0;
  3500   2950    
  3501   2951         /* Playback and delete the journal.  Drop the database write
  3502   2952         ** lock and reacquire the read lock. Purge the cache before
................................................................................
  3505   2955         */
  3506   2956         sqlite3PcacheClear(pPager->pPCache);
  3507   2957         rc = pager_playback(pPager, 1);
  3508   2958         if( rc!=SQLITE_OK ){
  3509   2959           rc = pager_error(pPager, rc);
  3510   2960           goto failed;
  3511   2961         }
  3512         -      assert( (pPager->state==PAGER_SHARED)
  3513         -           || (pPager->exclusiveMode && pPager->state>PAGER_SHARED)
         2962  +      assert(pPager->state==PAGER_SHARED || 
         2963  +          (pPager->exclusiveMode && pPager->state>PAGER_SHARED)
  3514   2964         );
  3515   2965       }
  3516   2966   
  3517   2967       if( sqlite3PcachePagecount(pPager->pPCache)>0 ){
  3518   2968         /* The shared-lock has just been acquired on the database file
  3519   2969         ** and there are already pages in the cache (from a previous
  3520   2970         ** read or write transaction).  Check to see if the database
................................................................................
  3560   3010    failed:
  3561   3011     if( rc!=SQLITE_OK ){
  3562   3012       /* pager_unlock() is a no-op for exclusive mode and in-memory databases. */
  3563   3013       pager_unlock(pPager);
  3564   3014     }
  3565   3015     return rc;
  3566   3016   }
         3017  +
         3018  +/*
         3019  +** Make sure we have the content for a page.  If the page was
         3020  +** previously acquired with noContent==1, then the content was
         3021  +** just initialized to zeros instead of being read from disk.
         3022  +** But now we need the real data off of disk.  So make sure we
         3023  +** have it.  Read it in if we do not have it already.
         3024  +*/
         3025  +static int pager_get_content(PgHdr *pPg){
         3026  +  if( pPg->flags&PGHDR_NEED_READ ){
         3027  +    int rc = readDbPage(pPg->pPager, pPg, pPg->pgno);
         3028  +    if( rc==SQLITE_OK ){
         3029  +      pPg->flags &= ~PGHDR_NEED_READ;
         3030  +    }else{
         3031  +      return rc;
         3032  +    }
         3033  +  }
         3034  +  return SQLITE_OK;
         3035  +}
  3567   3036   
  3568   3037   /*
  3569   3038   ** If the reference count has reached zero, and the pager is not in the
  3570   3039   ** middle of a write transaction or opened in exclusive mode, unlock it.
  3571   3040   */ 
  3572   3041   static void pagerUnlockIfUnused(Pager *pPager){
  3573   3042     if( (sqlite3PcacheRefCount(pPager->pPCache)==0)
................................................................................
  3610   3079   ** just returns 0.  This routine acquires a read-lock the first time it
  3611   3080   ** has to go to disk, and could also playback an old journal if necessary.
  3612   3081   ** Since Lookup() never goes to disk, it never has to deal with locks
  3613   3082   ** or journal files.
  3614   3083   **
  3615   3084   ** If noContent is false, the page contents are actually read from disk.
  3616   3085   ** If noContent is true, it means that we do not care about the contents
  3617         -** of the page. This occurs in two separate scenarios:
  3618         -**
  3619         -**   a) When reading a free-list leaf page from the database, and
  3620         -**
  3621         -**   b) When a savepoint is being rolled back and we need to load
  3622         -**      a new page into the cache to populate with the data read
  3623         -**      from the savepoint journal.
  3624         -**
  3625         -** If noContent is true, then the data returned is zeroed instead of
  3626         -** being read from the database. Additionally, the bits corresponding
  3627         -** to pgno in Pager.pInJournal (bitvec of pages already written to the
  3628         -** journal file) and the PagerSavepoint.pInSavepoint bitvecs of any open
  3629         -** savepoints are set. This means if the page is made writable at any
  3630         -** point in the future, using a call to sqlite3PagerWrite(), its contents
  3631         -** will not be journaled. This saves IO.
         3086  +** of the page at this time, so do not do a disk read.  Just fill in the
         3087  +** page content with zeros.  But mark the fact that we have not read the
         3088  +** content by setting the PGHDR_NEED_READ flag.  Later on, if
         3089  +** sqlite3PagerWrite() is called on this page or if this routine is
         3090  +** called again with noContent==0, that means that the content is needed
         3091  +** and the disk read should occur at that point.
  3632   3092   */
  3633   3093   int sqlite3PagerAcquire(
  3634   3094     Pager *pPager,      /* The pager open on the database file */
  3635   3095     Pgno pgno,          /* Page number to fetch */
  3636   3096     DbPage **ppPage,    /* Write a pointer to the page here */
  3637   3097     int noContent       /* Do not bother reading content from disk if true */
  3638   3098   ){
................................................................................
  3688   3148       if( nMax<(int)pgno || MEMDB || noContent ){
  3689   3149         if( pgno>pPager->mxPgno ){
  3690   3150           sqlite3PagerUnref(pPg);
  3691   3151           return SQLITE_FULL;
  3692   3152         }
  3693   3153         memset(pPg->pData, 0, pPager->pageSize);
  3694   3154         if( noContent ){
  3695         -        /* Failure to set the bits in the InJournal bit-vectors is benign.
  3696         -        ** It merely means that we might do some extra work to journal a 
  3697         -        ** page that does not need to be journaled.  Nevertheless, be sure 
  3698         -        ** to test the case where a malloc error occurs while trying to set 
  3699         -        ** a bit in a bit vector.
  3700         -        */
  3701         -        sqlite3BeginBenignMalloc();
  3702         -        TESTONLY( rc = ) sqlite3BitvecSet(pPager->pInJournal, pPg->pgno);
  3703         -        testcase( rc==SQLITE_NOMEM );
  3704         -        TESTONLY( rc = ) addToSavepointBitvecs(pPager, pPg->pgno);
  3705         -        testcase( rc==SQLITE_NOMEM );
  3706         -        sqlite3EndBenignMalloc();
         3155  +        pPg->flags |= PGHDR_NEED_READ;
  3707   3156         }
  3708   3157         IOTRACE(("ZERO %p %d\n", pPager, pgno));
  3709   3158       }else{
  3710         -      assert( pPg->pPager==pPager && pPg->pgno==pgno );
  3711         -      rc = readDbPage(pPg);
         3159  +      rc = readDbPage(pPager, pPg, pgno);
  3712   3160         if( rc!=SQLITE_OK && rc!=SQLITE_IOERR_SHORT_READ ){
         3161  +        /* sqlite3PagerUnref(pPg); */
  3713   3162           pagerDropPage(pPg);
  3714   3163           return rc;
  3715   3164         }
  3716   3165       }
  3717   3166   #ifdef SQLITE_CHECK_PAGES
  3718   3167       pPg->pageHash = pager_pagehash(pPg);
  3719   3168   #endif
  3720   3169     }else{
  3721   3170       /* The requested page is in the page cache. */
         3171  +    assert(sqlite3PcacheRefCount(pPager->pPCache)>0 || pgno==1);
  3722   3172       PAGER_INCR(pPager->nHit);
         3173  +    if( !noContent ){
         3174  +      rc = pager_get_content(pPg);
         3175  +      if( rc ){
         3176  +        sqlite3PagerUnref(pPg);
         3177  +        return rc;
         3178  +      }
         3179  +    }
  3723   3180     }
  3724   3181   
  3725   3182     *ppPage = pPg;
  3726   3183     return SQLITE_OK;
  3727   3184   }
  3728   3185   
  3729   3186   /*
................................................................................
  3755   3212   ** Release a page.
  3756   3213   **
  3757   3214   ** If the number of references to the page drop to zero, then the
  3758   3215   ** page is added to the LRU list.  When all references to all pages
  3759   3216   ** are released, a rollback occurs and the lock on the database is
  3760   3217   ** removed.
  3761   3218   */
  3762         -void sqlite3PagerUnref(DbPage *pPg){
         3219  +int sqlite3PagerUnref(DbPage *pPg){
  3763   3220     if( pPg ){
  3764   3221       Pager *pPager = pPg->pPager;
  3765   3222       sqlite3PcacheRelease(pPg);
  3766   3223       pagerUnlockIfUnused(pPager);
  3767   3224     }
         3225  +  return SQLITE_OK;
  3768   3226   }
  3769   3227   
  3770   3228   /*
  3771   3229   ** If the main journal file has already been opened, ensure that the
  3772   3230   ** sub-journal file is open too. If the main journal is not open,
  3773   3231   ** this function is a no-op.
  3774   3232   **
  3775   3233   ** SQLITE_OK is returned if everything goes according to plan. An 
  3776   3234   ** SQLITE_IOERR_XXX error code is returned if the call to 
  3777   3235   ** sqlite3OsOpen() fails.
  3778   3236   */
  3779   3237   static int openSubJournal(Pager *pPager){
  3780   3238     int rc = SQLITE_OK;
  3781         -  if( isOpen(pPager->jfd) && !isOpen(pPager->sjfd) ){
         3239  +  if( pPager->journalOpen && !pPager->sjfd->pMethods ){
  3782   3240       if( pPager->journalMode==PAGER_JOURNALMODE_MEMORY ){
  3783   3241         sqlite3MemJournalOpen(pPager->sjfd);
  3784   3242       }else{
  3785         -      rc = pagerOpentemp(pPager, pPager->sjfd, SQLITE_OPEN_SUBJOURNAL);
         3243  +      rc = sqlite3PagerOpentemp(pPager, pPager->sjfd, SQLITE_OPEN_SUBJOURNAL);
  3786   3244       }
  3787   3245     }
  3788   3246     return rc;
  3789   3247   }
  3790   3248   
  3791   3249   /*
  3792   3250   ** Create a journal file for pPager.  There should already be a RESERVED
................................................................................
  3806   3264     sqlite3PagerPagecount(pPager, 0);
  3807   3265     pPager->pInJournal = sqlite3BitvecCreate(pPager->dbSize);
  3808   3266     if( pPager->pInJournal==0 ){
  3809   3267       rc = SQLITE_NOMEM;
  3810   3268       goto failed_to_open_journal;
  3811   3269     }
  3812   3270   
  3813         -  if( !isOpen(pPager->jfd) ){
         3271  +  if( pPager->journalOpen==0 ){
  3814   3272       if( pPager->tempFile ){
  3815   3273         flags |= (SQLITE_OPEN_DELETEONCLOSE|SQLITE_OPEN_TEMP_JOURNAL);
  3816   3274       }else{
  3817   3275         flags |= (SQLITE_OPEN_MAIN_JOURNAL);
  3818   3276       }
  3819   3277       if( pPager->journalMode==PAGER_JOURNALMODE_MEMORY ){
  3820   3278         sqlite3MemJournalOpen(pPager->jfd);
................................................................................
  3824   3282         rc = sqlite3JournalOpen(
  3825   3283             pVfs, pPager->zJournal, pPager->jfd, flags, jrnlBufferSize(pPager)
  3826   3284         );
  3827   3285   #else
  3828   3286         rc = sqlite3OsOpen(pVfs, pPager->zJournal, pPager->jfd, flags, 0);
  3829   3287   #endif
  3830   3288       }
  3831         -    assert( rc!=SQLITE_OK || isOpen(pPager->jfd) );
         3289  +    assert( rc!=SQLITE_OK || pPager->jfd->pMethods );
  3832   3290       pPager->journalOff = 0;
  3833   3291       pPager->setMaster = 0;
  3834   3292       pPager->journalHdr = 0;
  3835   3293       if( rc!=SQLITE_OK ){
  3836   3294         if( rc==SQLITE_NOMEM ){
  3837   3295           sqlite3OsDelete(pVfs, pPager->zJournal, 0);
  3838   3296         }
  3839   3297         goto failed_to_open_journal;
  3840   3298       }
  3841   3299     }
         3300  +  pPager->journalOpen = 1;
  3842   3301     pPager->journalStarted = 0;
  3843   3302     pPager->needSync = 0;
  3844   3303     pPager->nRec = 0;
  3845   3304     if( pPager->errCode ){
  3846   3305       rc = pPager->errCode;
  3847   3306       goto failed_to_open_journal;
  3848   3307     }
................................................................................
  3908   3367         if( exFlag ){
  3909   3368           rc = pager_wait_on_lock(pPager, EXCLUSIVE_LOCK);
  3910   3369         }
  3911   3370       }
  3912   3371       if( rc!=SQLITE_OK ){
  3913   3372         return rc;
  3914   3373       }
         3374  +    pPager->dirtyCache = 0;
  3915   3375       PAGERTRACE(("TRANSACTION %d\n", PAGERID(pPager)));
  3916   3376       if( pPager->useJournal && !pPager->tempFile
  3917   3377              && pPager->journalMode!=PAGER_JOURNALMODE_OFF ){
  3918   3378         rc = pager_open_journal(pPager);
  3919   3379       }
  3920         -  }else if( isOpen(pPager->jfd) && pPager->journalOff==0 ){
         3380  +  }else if( pPager->journalOpen && pPager->journalOff==0 ){
  3921   3381       /* This happens when the pager was in exclusive-access mode the last
  3922   3382       ** time a (read or write) transaction was successfully concluded
  3923   3383       ** by this connection. Instead of deleting the journal file it was 
  3924   3384       ** kept open and either was truncated to 0 bytes or its header was
  3925   3385       ** overwritten with zeros.
  3926   3386       */
  3927   3387       assert( pPager->nRec==0 );
................................................................................
  3932   3392       if( !pPager->pInJournal ){
  3933   3393         rc = SQLITE_NOMEM;
  3934   3394       }else{
  3935   3395         pPager->dbOrigSize = pPager->dbSize;
  3936   3396         rc = writeJournalHdr(pPager);
  3937   3397       }
  3938   3398     }
  3939         -  assert( !isOpen(pPager->jfd) || pPager->journalOff>0 || rc!=SQLITE_OK );
         3399  +  assert( !pPager->journalOpen || pPager->journalOff>0 || rc!=SQLITE_OK );
  3940   3400     return rc;
  3941   3401   }
  3942   3402   
  3943   3403   /*
  3944   3404   ** Mark a data page as writeable.  The page is written into the journal 
  3945   3405   ** if it is not there already.  This routine must be called before making
  3946   3406   ** changes to a page.
................................................................................
  3970   3430     if( pPager->readOnly ){
  3971   3431       return SQLITE_PERM;
  3972   3432     }
  3973   3433   
  3974   3434     assert( !pPager->setMaster );
  3975   3435   
  3976   3436     CHECK_PAGE(pPg);
         3437  +
         3438  +  /* If this page was previously acquired with noContent==1, that means
         3439  +  ** we didn't really read in the content of the page.  This can happen
         3440  +  ** (for example) when the page is being moved to the freelist.  But
         3441  +  ** now we are (perhaps) moving the page off of the freelist for
         3442  +  ** reuse and we need to know its original content so that content
         3443  +  ** can be stored in the rollback journal.  So do the read at this
         3444  +  ** time.
         3445  +  */
         3446  +  rc = pager_get_content(pPg);
         3447  +  if( rc ){
         3448  +    return rc;
         3449  +  }
  3977   3450   
  3978   3451     /* Mark the page as dirty.  If the page has already been written
  3979   3452     ** to the journal then we can return right away.
  3980   3453     */
  3981   3454     sqlite3PcacheMakeDirty(pPg);
  3982   3455     if( pageInJournal(pPg) && !subjRequiresPage(pPg) ){
         3456  +    pPager->dirtyCache = 1;
  3983   3457       pPager->dbModified = 1;
  3984   3458     }else{
  3985   3459   
  3986   3460       /* If we get this far, it means that the page needs to be
  3987   3461       ** written to the transaction journal or the checkpoint journal
  3988   3462       ** or both.
  3989   3463       **
................................................................................
  3992   3466       */
  3993   3467       assert( pPager->state!=PAGER_UNLOCK );
  3994   3468       rc = sqlite3PagerBegin(pPg, 0);
  3995   3469       if( rc!=SQLITE_OK ){
  3996   3470         return rc;
  3997   3471       }
  3998   3472       assert( pPager->state>=PAGER_RESERVED );
  3999         -    if( !isOpen(pPager->jfd) && pPager->useJournal
         3473  +    if( !pPager->journalOpen && pPager->useJournal
  4000   3474             && pPager->journalMode!=PAGER_JOURNALMODE_OFF ){
  4001   3475         rc = pager_open_journal(pPager);
  4002   3476         if( rc!=SQLITE_OK ) return rc;
  4003   3477       }
         3478  +    pPager->dirtyCache = 1;
  4004   3479       pPager->dbModified = 1;
  4005   3480     
  4006   3481       /* The transaction journal now exists and we have a RESERVED or an
  4007   3482       ** EXCLUSIVE lock on the main database file.  Write the current page to
  4008   3483       ** the transaction journal if it is not there already.
  4009   3484       */
  4010         -    if( !pageInJournal(pPg) && isOpen(pPager->jfd) ){
         3485  +    if( !pageInJournal(pPg) && pPager->journalOpen ){
  4011   3486         if( pPg->pgno<=pPager->dbOrigSize ){
  4012   3487           u32 cksum;
  4013   3488           char *pData2;
  4014   3489   
  4015   3490           /* We should never write to the journal file the page that
  4016   3491           ** contains the database locks.  The following assert verifies
  4017   3492           ** that we do not. */
................................................................................
  4216   3691   ** The overlying software layer calls this routine when all of the data
  4217   3692   ** on the given page is unused.  The pager marks the page as clean so
  4218   3693   ** that it does not get written to disk.
  4219   3694   **
  4220   3695   ** Tests show that this optimization, together with the
  4221   3696   ** sqlite3PagerDontRollback() below, more than double the speed
  4222   3697   ** of large INSERT operations and quadruple the speed of large DELETEs.
         3698  +**
         3699  +** When this routine is called, set the bit corresponding to pDbPage in
         3700  +** the Pager.pAlwaysRollback bitvec.  Subsequent calls to
         3701  +** sqlite3PagerDontRollback() for the same page will thereafter be ignored.
         3702  +** This is necessary to avoid a problem where a page with data is added to
         3703  +** the freelist during one part of a transaction then removed from the
         3704  +** freelist during a later part of the same transaction and reused for some
         3705  +** other purpose.  When it is first added to the freelist, this routine is
         3706  +** called.  When reused, the sqlite3PagerDontRollback() routine is called.
         3707  +** But because the page contains critical data, we still need to be sure it
         3708  +** gets rolled back in spite of the sqlite3PagerDontRollback() call.
  4223   3709   */
  4224         -void sqlite3PagerDontWrite(PgHdr *pPg){
         3710  +int sqlite3PagerDontWrite(DbPage *pDbPage){
         3711  +  PgHdr *pPg = pDbPage;
  4225   3712     Pager *pPager = pPg->pPager;
  4226         -  if( (pPg->flags&PGHDR_DIRTY) && pPager->nSavepoint==0 ){
  4227         -    PAGERTRACE(("DONT_WRITE page %d of %d\n", pPg->pgno, PAGERID(pPager)));
  4228         -    IOTRACE(("CLEAN %p %d\n", pPager, pPg->pgno))
  4229         -    pPg->flags |= PGHDR_DONT_WRITE;
         3713  +  int rc;
         3714  +
         3715  +  if( pPg->pgno>pPager->dbOrigSize ){
         3716  +    return SQLITE_OK;
         3717  +  }
         3718  +  if( pPager->pAlwaysRollback==0 ){
         3719  +    assert( pPager->pInJournal );
         3720  +    pPager->pAlwaysRollback = sqlite3BitvecCreate(pPager->dbOrigSize);
         3721  +    if( !pPager->pAlwaysRollback ){
         3722  +      return SQLITE_NOMEM;
         3723  +    }
         3724  +  }
         3725  +  rc = sqlite3BitvecSet(pPager->pAlwaysRollback, pPg->pgno);
         3726  +
         3727  +  if( rc==SQLITE_OK && (pPg->flags&PGHDR_DIRTY) && pPager->nSavepoint==0 ){
         3728  +    assert( pPager->state>=PAGER_SHARED );
         3729  +    if( pPager->dbSize==pPg->pgno && pPager->dbOrigSize<pPager->dbSize ){
         3730  +      /* If this page is the last page in the file and the file has grown
         3731  +      ** during the current transaction, then do NOT mark the page as clean.
         3732  +      ** When the database file grows, we must make sure that the last page
         3733  +      ** gets written at least once so that the disk file will be the correct
         3734  +      ** size. If you do not write this page and the size of the file
         3735  +      ** on the disk ends up being too small, that can lead to database
         3736  +      ** corruption during the next transaction.
         3737  +      */
         3738  +    }else{
         3739  +      PAGERTRACE(("DONT_WRITE page %d of %d\n", pPg->pgno, PAGERID(pPager)));
         3740  +      IOTRACE(("CLEAN %p %d\n", pPager, pPg->pgno))
         3741  +      pPg->flags |= PGHDR_DONT_WRITE;
  4230   3742   #ifdef SQLITE_CHECK_PAGES
  4231         -    pPg->pageHash = pager_pagehash(pPg);
         3743  +      pPg->pageHash = pager_pagehash(pPg);
         3744  +#endif
         3745  +    }
         3746  +  }
         3747  +  return rc;
         3748  +}
         3749  +
         3750  +/*
         3751  +** A call to this routine tells the pager that if a rollback occurs,
         3752  +** it is not necessary to restore the data on the given page.  This
         3753  +** means that the pager does not have to record the given page in the
         3754  +** rollback journal.
         3755  +**
         3756  +** If we have not yet actually read the content of this page (if
         3757  +** the PgHdr.needRead flag is set) then this routine acts as a promise
         3758  +** that we will never need to read the page content in the future,
         3759  +** so the needRead flag can be cleared at this point.
         3760  +*/
         3761  +void sqlite3PagerDontRollback(DbPage *pPg){
         3762  +  Pager *pPager = pPg->pPager;
         3763  +  TESTONLY( int rc; )  /* Return value from sqlite3BitvecSet() */
         3764  +
         3765  +  assert( pPager->state>=PAGER_RESERVED );
         3766  +
         3767  +  /* If the journal file is not open, or DontWrite() has been called on
         3768  +  ** this page (DontWrite() sets the Pager.pAlwaysRollback bit), then this
         3769  +  ** function is a no-op.
         3770  +  */
         3771  +  if( pPager->journalOpen==0 
         3772  +   || sqlite3BitvecTest(pPager->pAlwaysRollback, pPg->pgno)
         3773  +   || pPg->pgno>pPager->dbOrigSize
         3774  +  ){
         3775  +    return;
         3776  +  }
         3777  +
         3778  +#ifdef SQLITE_SECURE_DELETE
         3779  +  if( sqlite3BitvecTest(pPager->pInJournal, pPg->pgno)!=0
         3780  +   || pPg->pgno>pPager->dbOrigSize ){
         3781  +    return;
         3782  +  }
  4232   3783   #endif
  4233         -  }
         3784  +
         3785  +  /* If SECURE_DELETE is disabled, then there is no way that this
         3786  +  ** routine can be called on a page for which sqlite3PagerDontWrite()
         3787  +  ** has not been previously called during the same transaction.
         3788  +  ** And if DontWrite() has previously been called, the following
         3789  +  ** conditions must be met.
         3790  +  **
         3791  +  ** (Later:)  Not true.  If the database is corrupted by having duplicate
         3792  +  ** pages on the freelist (ex: corrupt9.test) then the following is not
         3793  +  ** necessarily true:
         3794  +  */
         3795  +  /* assert( !pPg->inJournal && (int)pPg->pgno <= pPager->dbOrigSize ); */
         3796  +
         3797  +  assert( pPager->pInJournal!=0 );
         3798  +  pPg->flags &= ~PGHDR_NEED_READ;
         3799  +
         3800  +  /* Failure to set the bits in the InJournal bit-vectors is benign.
         3801  +  ** It merely means that we might do some extra work to journal a page
         3802  +  ** that does not need to be journaled.  Nevertheless, be sure to test the
         3803  +  ** case where a malloc error occurs while trying to set a bit in a 
         3804  +  ** bit vector.
         3805  +  */
         3806  +  sqlite3BeginBenignMalloc();
         3807  +  TESTONLY( rc = ) sqlite3BitvecSet(pPager->pInJournal, pPg->pgno);
         3808  +  testcase( rc==SQLITE_NOMEM );
         3809  +  TESTONLY( rc = ) addToSavepointBitvecs(pPager, pPg->pgno);
         3810  +  testcase( rc==SQLITE_NOMEM );
         3811  +  sqlite3EndBenignMalloc();
         3812  +
         3813  +
         3814  +  PAGERTRACE(("DONT_ROLLBACK page %d of %d\n", pPg->pgno, PAGERID(pPager)));
         3815  +  IOTRACE(("GARBAGE %p %d\n", pPager, pPg->pgno))
  4234   3816   }
         3817  +
  4235   3818   
  4236   3819   /*
  4237   3820   ** This routine is called to increment the database file change-counter,
  4238   3821   ** stored at byte 24 of the pager file.
  4239   3822   */
  4240   3823   static int pager_incr_changecounter(Pager *pPager, int isDirect){
  4241   3824     PgHdr *pPgHdr;
................................................................................
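The header comment restored on sqlite3PagerDontWrite() in the hunk above describes how a bit in Pager.pAlwaysRollback makes later sqlite3PagerDontRollback() calls for the same page a no-op. The following is a minimal sketch of that interplay, not the pager's own code. It assumes a database of at most 64 pages so that a single uint64_t can stand in for each Bitvec; SketchPager, sketchBit(), sketchDontWrite() and sketchDontRollback() are hypothetical names introduced for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_PAGES 64   /* hypothetical bound so one word can act as a bitvec */

/* Hypothetical stand-in for the pager fields the two routines consult. */
typedef struct SketchPager {
  uint32_t dbOrigSize;        /* pages in the file at the start of the transaction */
  int journalOpen;            /* non-zero once the rollback journal is open */
  uint64_t alwaysRollback;    /* bit set: ignore DontRollback() for this page */
  uint64_t inJournal;         /* bit set: page is treated as already journalled */
} SketchPager;

/* Map a 1-based page number to its bit in the 64-bit stand-in bitvec. */
static uint64_t sketchBit(uint32_t pgno){
  assert( pgno>=1 && pgno<=SKETCH_MAX_PAGES );
  return (uint64_t)1 << (pgno-1);
}

/* Like the first half of sqlite3PagerDontWrite(): remember that this page
** must still be rolled back.  Pages past the original size are skipped. */
static void sketchDontWrite(SketchPager *p, uint32_t pgno){
  if( pgno>p->dbOrigSize ) return;
  p->alwaysRollback |= sketchBit(pgno);
}

/* Like the early-return test in sqlite3PagerDontRollback().
** Returns 1 if the hint was honoured, 0 if it was ignored. */
static int sketchDontRollback(SketchPager *p, uint32_t pgno){
  if( !p->journalOpen
   || pgno>p->dbOrigSize
   || (p->alwaysRollback & sketchBit(pgno))!=0 ){
    return 0;
  }
  /* Marking the page as "in the journal" is what lets the pager skip
  ** journalling it later. */
  p->inJournal |= sketchBit(pgno);
  return 1;
}
```

With a ten-page database and an open journal, calling sketchDontWrite() on page 3 and then sketchDontRollback() on pages 3 and 4 ignores the hint for page 3 and honours it for page 4, which is the freelist-reuse protection the comment describes.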
  4260   3843   
  4261   3844       /* Increment the value just read and write it back to byte 24. */
  4262   3845       change_counter = sqlite3Get4byte((u8*)pPager->dbFileVers);
  4263   3846       change_counter++;
  4264   3847       put32bits(((char*)pPgHdr->pData)+24, change_counter);
  4265   3848   
  4266   3849   #ifdef SQLITE_ENABLE_ATOMIC_WRITE
  4267         -    if( isDirect && isOpen(pPager->fd) ){
         3850  +    if( isDirect && pPager->fd->pMethods ){
  4268   3851         const void *zBuf = pPgHdr->pData;
  4269   3852         assert( pPager->dbFileSize>0 );
  4270   3853         rc = sqlite3OsWrite(pPager->fd, zBuf, pPager->pageSize, 0);
  4271   3854       }
  4272   3855   #endif
  4273   3856   
  4274   3857       /* Release the page reference. */
................................................................................
  4321   3904       return pPager->errCode;
  4322   3905     }
  4323   3906   
  4324   3907     /* If no changes have been made, we can leave the transaction early.
  4325   3908     */
  4326   3909     if( pPager->dbModified==0 &&
  4327   3910           (pPager->journalMode!=PAGER_JOURNALMODE_DELETE ||
  4328         -          pPager->exclusiveMode) ){
  4329         -    assert( pPager->dbModified==0 || !isOpen(pPager->jfd) );
         3911  +          pPager->exclusiveMode!=0) ){
         3912  +    assert( pPager->dirtyCache==0 || pPager->journalOpen==0 );
  4330   3913       return SQLITE_OK;
  4331   3914     }
  4332   3915   
  4333   3916     PAGERTRACE(("DATABASE SYNC: File=%s zMaster=%s nSize=%d\n", 
  4334   3917         pPager->zFilename, zMaster, pPager->dbSize));
  4335   3918   
  4336   3919     /* If this is an in-memory db, or no pages have been written to, or this
  4337   3920     ** function has already been called, it is a no-op.
  4338   3921     */
  4339         -  if( pPager->state!=PAGER_SYNCED && !MEMDB && pPager->dbModified ){
         3922  +  if( pPager->state!=PAGER_SYNCED && !MEMDB && pPager->dirtyCache ){
  4340   3923       PgHdr *pPg;
  4341   3924   
  4342   3925   #ifdef SQLITE_ENABLE_ATOMIC_WRITE
  4343   3926       /* The atomic-write optimization can be used if all of the
  4344   3927       ** following are true:
  4345   3928       **
  4346   3929       **    + The file-system supports the atomic-write property for
................................................................................
  4351   3934       ** If the optimization can be used, then the journal file will never
  4352   3935       ** be created for this transaction.
  4353   3936       */
  4354   3937       int useAtomicWrite;
  4355   3938       pPg = sqlite3PcacheDirtyList(pPager->pPCache);
  4356   3939       useAtomicWrite = (
  4357   3940           !zMaster && 
  4358         -        isOpen(pPager->jfd) &&
         3941  +        pPager->journalOpen &&
  4359   3942           pPager->journalOff==jrnlBufferSize(pPager) && 
  4360   3943           pPager->dbSize>=pPager->dbFileSize && 
  4361   3944           (pPg==0 || pPg->pDirty==0)
  4362   3945       );
  4363         -    assert( isOpen(pPager->jfd) || pPager->journalMode==PAGER_JOURNALMODE_OFF );
         3946  +    assert( pPager->journalOpen || pPager->journalMode==PAGER_JOURNALMODE_OFF );
  4364   3947       if( useAtomicWrite ){
  4365   3948         /* Update the nRec field in the journal file. */
  4366   3949         int offset = pPager->journalHdr + sizeof(aJournalMagic);
  4367   3950         assert(pPager->nRec==1);
  4368   3951         rc = write32bits(pPager->jfd, offset, pPager->nRec);
  4369   3952   
  4370   3953         /* Update the db file change counter. The following call will modify
................................................................................
  4434   4017         ** pager_get_all_dirty_pages() that verifies that no attempt
  4435   4018         ** is made to use an invalid dirty list.
  4436   4019         */
  4437   4020         goto sync_exit;
  4438   4021       }
  4439   4022       sqlite3PcacheCleanAll(pPager->pPCache);
  4440   4023   
  4441         -    if( pPager->dbSize!=pPager->dbFileSize ){
  4442         -      Pgno nNew = pPager->dbSize - (pPager->dbSize==PAGER_MJ_PGNO(pPager));
         4024  +    if( pPager->dbSize<pPager->dbFileSize ){
  4443   4025         assert( pPager->state>=PAGER_EXCLUSIVE );
  4444         -      rc = pager_truncate(pPager, nNew);
         4026  +      rc = pager_truncate(pPager, pPager->dbSize);
  4445   4027         if( rc!=SQLITE_OK ) goto sync_exit;
  4446   4028       }
  4447   4029   
  4448   4030       /* Sync the database file. */
  4449   4031       if( !pPager->noSync && !noSync ){
  4450   4032         rc = sqlite3OsSync(pPager->fd, pPager->sync_flags);
  4451   4033       }
................................................................................
  4482   4064     }
  4483   4065     if( pPager->state<PAGER_RESERVED ){
  4484   4066       return SQLITE_ERROR;
  4485   4067     }
  4486   4068     if( pPager->dbModified==0 &&
  4487   4069           (pPager->journalMode!=PAGER_JOURNALMODE_DELETE ||
  4488   4070             pPager->exclusiveMode!=0) ){
  4489         -    assert( pPager->dbModified==0 || isOpen(pPager->jfd)==0 );
         4071  +    assert( pPager->dirtyCache==0 || pPager->journalOpen==0 );
  4490   4072       return SQLITE_OK;
  4491   4073     }
  4492   4074     PAGERTRACE(("COMMIT %d\n", PAGERID(pPager)));
  4493         -  assert( pPager->state==PAGER_SYNCED || MEMDB || !pPager->dbModified );
         4075  +  assert( pPager->state==PAGER_SYNCED || MEMDB || !pPager->dirtyCache );
  4494   4076     rc = pager_end_transaction(pPager, pPager->setMaster);
  4495   4077     rc = pager_error(pPager, rc);
  4496   4078     return rc;
  4497   4079   }
  4498   4080   
  4499   4081   /*
  4500   4082   ** Rollback all changes.  The database falls back to PAGER_SHARED mode.
................................................................................
  4507   4089   ** unless a prior malloc() failed (SQLITE_NOMEM).  Appropriate error
  4508   4090   ** codes are returned for all these occasions.  Otherwise,
  4509   4091   ** SQLITE_OK is returned.
  4510   4092   */
  4511   4093   int sqlite3PagerRollback(Pager *pPager){
  4512   4094     int rc = SQLITE_OK;
  4513   4095     PAGERTRACE(("ROLLBACK %d\n", PAGERID(pPager)));
  4514         -  if( !pPager->dbModified || !isOpen(pPager->jfd) ){
         4096  +  if( !pPager->dirtyCache || !pPager->journalOpen ){
  4515   4097       rc = pager_end_transaction(pPager, pPager->setMaster);
  4516   4098     }else if( pPager->errCode && pPager->errCode!=SQLITE_FULL ){
  4517   4099       if( pPager->state>=PAGER_EXCLUSIVE ){
  4518   4100         pager_playback(pPager, 0);
  4519   4101       }
  4520   4102       rc = pPager->errCode;
  4521   4103     }else{
................................................................................
  4597   4179   
  4598   4180     if( nSavepoint>pPager->nSavepoint && pPager->useJournal ){
  4599   4181       int ii;
  4600   4182       PagerSavepoint *aNew;
  4601   4183   
  4602   4184       /* Either there is no active journal or the sub-journal is open or 
  4603   4185       ** the journal is always stored in memory */
  4604         -    assert( pPager->nSavepoint==0 || isOpen(pPager->sjfd) ||
         4186  +    assert( pPager->nSavepoint==0 || pPager->sjfd->pMethods ||
  4605   4187               pPager->journalMode==PAGER_JOURNALMODE_MEMORY );
  4606   4188   
  4607   4189       /* Grow the Pager.aSavepoint array using realloc(). Return SQLITE_NOMEM
  4608   4190       ** if the allocation fails. Otherwise, zero the new portion in case a 
  4609   4191       ** malloc failure occurs while populating it in the for(...) loop below.
  4610   4192       */
  4611   4193       aNew = (PagerSavepoint *)sqlite3Realloc(
................................................................................
  4621   4203       ii = pPager->nSavepoint;
  4622   4204       pPager->nSavepoint = nSavepoint;
  4623   4205   
  4624   4206       /* Populate the PagerSavepoint structures just allocated. */
  4625   4207       for(/* no-op */; ii<nSavepoint; ii++){
  4626   4208         assert( pPager->dbSizeValid );
  4627   4209         aNew[ii].nOrig = pPager->dbSize;
  4628         -      if( isOpen(pPager->jfd) && pPager->journalOff>0 ){
         4210  +      if( pPager->journalOpen && pPager->journalOff>0 ){
  4629   4211           aNew[ii].iOffset = pPager->journalOff;
  4630   4212         }else{
  4631   4213           aNew[ii].iOffset = JOURNAL_HDR_SZ(pPager);
  4632   4214         }
  4633         -      aNew[ii].iSubRec = pPager->nSubRec;
         4215  +      aNew[ii].iSubRec = pPager->stmtNRec;
  4634   4216         aNew[ii].pInSavepoint = sqlite3BitvecCreate(pPager->dbSize);
  4635   4217         if( !aNew[ii].pInSavepoint ){
  4636   4218           return SQLITE_NOMEM;
  4637   4219         }
  4638   4220       }
  4639   4221   
  4640   4222       /* Open the sub-journal, if it is not already opened. */
................................................................................
  4665   4247       int ii;
  4666   4248       int nNew = iSavepoint + (op==SAVEPOINT_ROLLBACK);
  4667   4249       for(ii=nNew; ii<pPager->nSavepoint; ii++){
  4668   4250         sqlite3BitvecDestroy(pPager->aSavepoint[ii].pInSavepoint);
  4669   4251       }
  4670   4252       pPager->nSavepoint = nNew;
  4671   4253   
  4672         -    if( op==SAVEPOINT_ROLLBACK && isOpen(pPager->jfd) ){
         4254  +    if( op==SAVEPOINT_ROLLBACK && pPager->jfd->pMethods ){
  4673   4255         PagerSavepoint *pSavepoint = (nNew==0)?0:&pPager->aSavepoint[nNew-1];
  4674   4256         rc = pagerPlaybackSavepoint(pPager, pSavepoint);
  4675   4257         assert(rc!=SQLITE_DONE);
  4676   4258       }
  4677   4259     
  4678   4260       /* If this is a release of the outermost savepoint, truncate 
  4679   4261       ** the sub-journal. */
  4680         -    if( nNew==0 && op==SAVEPOINT_RELEASE && isOpen(pPager->sjfd) ){
         4262  +    if( nNew==0 && op==SAVEPOINT_RELEASE && pPager->sjfd->pMethods ){
  4681   4263         assert( rc==SQLITE_OK );
  4682   4264         rc = sqlite3OsTruncate(pPager->sjfd, 0);
  4683         -      pPager->nSubRec = 0;
         4265  +      pPager->stmtNRec = 0;
  4684   4266       }
  4685   4267     }
  4686   4268     return rc;
  4687   4269   }
  4688   4270   
  4689   4271   /*
  4690   4272   ** Return the full pathname of the database file.
................................................................................
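Several hunks above replace isOpen(pPager->jfd) with the Pager.journalOpen flag, and isOpen(pPager->fd) or isOpen(pPager->sjfd) with a direct test of the handle's pMethods pointer. The sketch below shows the convention those replacements appear to rely on: a file handle counts as open while its method-table pointer is non-NULL. The definition of isOpen() is not part of this diff, so treating it as a plain non-NULL test is an assumption, and SketchFile, SketchIoMethods, sketchIsOpen() and sketchClose() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a file handle and its method table. */
typedef struct SketchIoMethods { int iVersion; } SketchIoMethods;
typedef struct SketchFile { const SketchIoMethods *pMethods; } SketchFile;

/* A handle is considered open while pMethods is non-NULL. */
static int sketchIsOpen(const SketchFile *pFd){
  return pFd->pMethods!=NULL;
}

/* Closing clears the pointer so that later checks see a closed handle. */
static void sketchClose(SketchFile *pFd){
  pFd->pMethods = NULL;
}
```

One consequence of this convention is that no separate boolean needs to be kept in sync with the handle, whereas a flag such as journalOpen has to be updated wherever the journal is opened or closed.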
  4704   4286   ** Return the file handle for the database file associated
  4705   4287   ** with the pager.  This might return NULL if the file has
  4706   4288   ** not yet been opened.
  4707   4289   */
  4708   4290   sqlite3_file *sqlite3PagerFile(Pager *pPager){
  4709   4291     return pPager->fd;
  4710   4292   }
         4293  +
         4294  +/*
         4295  +** Return the directory of the database file.
         4296  +*/
         4297  +const char *sqlite3PagerDirname(Pager *pPager){
         4298  +  return pPager->zDirectory;
         4299  +}
  4711   4300   
  4712   4301   /*
  4713   4302   ** Return the full pathname of the journal file.
  4714   4303   */
  4715   4304   const char *sqlite3PagerJournalname(Pager *pPager){
  4716   4305     return pPager->zJournal;
  4717   4306   }
................................................................................
  4788   4377     ){
  4789   4378       return rc;
  4790   4379     }
  4791   4380   
  4792   4381     PAGERTRACE(("MOVE %d page %d (needSync=%d) moves to %d\n", 
  4793   4382         PAGERID(pPager), pPg->pgno, (pPg->flags&PGHDR_NEED_SYNC)?1:0, pgno));
  4794   4383     IOTRACE(("MOVE %p %d %d\n", pPager, pPg->pgno, pgno))
         4384  +
         4385  +  pager_get_content(pPg);
  4795   4386   
  4796   4387     /* If the journal needs to be sync()ed before page pPg->pgno can
  4797   4388     ** be written to, store pPg->pgno in local variable needSyncPgno.
  4798   4389     **
  4799   4390     ** If the isCommit flag is set, there is no need to remember that
  4800   4391     ** the journal needs to be sync()ed before database page pPg->pgno 
  4801   4392     ** can be written to. The caller has already promised not to write to it.
................................................................................
  4821   4412   
  4822   4413     sqlite3PcacheMove(pPg, pgno);
  4823   4414     if( pPgOld ){
  4824   4415       sqlite3PcacheDrop(pPgOld);
  4825   4416     }
  4826   4417   
  4827   4418     sqlite3PcacheMakeDirty(pPg);
         4419  +  pPager->dirtyCache = 1;
  4828   4420     pPager->dbModified = 1;
  4829   4421   
  4830   4422     if( needSyncPgno ){
  4831   4423       /* If needSyncPgno is non-zero, then the journal file needs to be 
  4832   4424       ** sync()ed before any data is written to database file page needSyncPgno.
  4833   4425       ** Currently, no such page exists in the page-cache and the 
  4834   4426       ** "is journaled" bitvec flag has been set. This needs to be remedied by
................................................................................
  4912   4504   **    PAGER_JOURNALMODE_TRUNCATE
  4913   4505   **    PAGER_JOURNALMODE_PERSIST
  4914   4506   **    PAGER_JOURNALMODE_OFF
  4915   4507   **
  4916   4508   ** If the parameter is not _QUERY, then the journal-mode is set to the
  4917   4509   ** value specified.
  4918   4510   **
  4919         -** The return value indicates the current (possibly updated) journal-mode.
         4511  +** The return value indicates the current (possibly updated)
         4512  +** journal-mode.
  4920   4513   */
  4921   4514   int sqlite3PagerJournalMode(Pager *pPager, int eMode){
  4922   4515     if( !MEMDB ){
  4923   4516       assert( eMode==PAGER_JOURNALMODE_QUERY
  4924   4517                 || eMode==PAGER_JOURNALMODE_DELETE
  4925   4518                 || eMode==PAGER_JOURNALMODE_TRUNCATE
  4926   4519                 || eMode==PAGER_JOURNALMODE_PERSIST

Changes to src/pager.h.

     9      9   **    May you share freely, never taking more than you give.
    10     10   **
    11     11   *************************************************************************
    12     12   ** This header file defines the interface to the sqlite page cache
    13     13   ** subsystem.  The page cache subsystem reads and writes a file a page
    14     14   ** at a time and provides a journal for rollback.
    15     15   **
    16         -** @(#) $Id: pager.h,v 1.94 2009/01/16 15:21:06 danielk1977 Exp $
           16  +** @(#) $Id: pager.h,v 1.95 2009/01/16 16:23:38 danielk1977 Exp $
    17     17   */
    18     18   
    19     19   #ifndef _PAGER_H_
    20     20   #define _PAGER_H_
    21     21   
    22     22   /*
    23         -** Default maximum size for persistent journal files. A negative 
    24         -** value means no limit. This value may be overridden using the 
    25         -** sqlite3PagerJournalSizeLimit() API. See also "PRAGMA journal_size_limit".
           23  +** If defined as non-zero, auto-vacuum is enabled by default. Otherwise
           24  +** it must be turned on for each database using "PRAGMA auto_vacuum = 1".
    26     25   */
    27     26   #ifndef SQLITE_DEFAULT_JOURNAL_SIZE_LIMIT
    28     27     #define SQLITE_DEFAULT_JOURNAL_SIZE_LIMIT -1
    29     28   #endif
    30     29   
    31     30   /*
    32     31   ** The type used to represent a page number.  The first page in a file
................................................................................
    40     39   typedef struct Pager Pager;
    41     40   
    42     41   /*
    43     42   ** Handle type for pages.
    44     43   */
    45     44   typedef struct PgHdr DbPage;
    46     45   
    47         -/*
    48         -** Page number PAGER_MJ_PGNO is never used in an SQLite database (it is
    49         -** reserved for working around a windows/posix incompatibility). It is
    50         -** used in the journal to signify that the remainder of the journal file 
    51         -** is devoted to storing a master journal name - there are no more pages to
    52         -** roll back. See comments for function writeMasterJournal() in pager.c 
    53         -** for details.
    54         -*/
    55         -#define PAGER_MJ_PGNO(x) ((Pgno)((PENDING_BYTE/((x)->pageSize))+1))
    56         -
    57     46   /*
    58     47   ** Allowed values for the flags parameter to sqlite3PagerOpen().
    59     48   **
    60         -** NOTE: These values must match the corresponding BTREE_ values in btree.h.
           49  +** NOTE: This values must match the corresponding BTREE_ values in btree.h.
    61     50   */
    62     51   #define PAGER_OMIT_JOURNAL  0x0001    /* Do not use a rollback journal */
    63     52   #define PAGER_NO_READLOCK   0x0002    /* Omit readlocks on readonly files */
    64     53   
    65     54   /*
    66     55   ** Valid values for the second argument to sqlite3PagerLockingMode().
    67     56   */
................................................................................
    76     65   #define PAGER_JOURNALMODE_DELETE      0   /* Commit by deleting journal file */
    77     66   #define PAGER_JOURNALMODE_PERSIST     1   /* Commit by zeroing journal header */
    78     67   #define PAGER_JOURNALMODE_OFF         2   /* Journal omitted.  */
    79     68   #define PAGER_JOURNALMODE_TRUNCATE    3   /* Commit by truncating journal */
    80     69   #define PAGER_JOURNALMODE_MEMORY      4   /* In-memory journal file */
    81     70   
    82     71   /*
    83         -** The remainder of this file contains the declarations of the functions
    84         -** that make up the Pager sub-system API. See source code comments for 
    85         -** a detailed description of each routine.
           72  +** See source code comments for a detailed description of the following
           73  +** routines:
    86     74   */
    87         -
    88         -/* Open and close a Pager connection. */ 
    89     75   int sqlite3PagerOpen(sqlite3_vfs *, Pager **ppPager, const char*, int,int,int);
    90         -int sqlite3PagerClose(Pager *pPager);
    91         -
    92         -/* Functions used to configure a Pager object. */
    93     76   void sqlite3PagerSetBusyhandler(Pager*, int(*)(void *), void *);
    94     77   void sqlite3PagerSetReiniter(Pager*, void(*)(DbPage*));
    95     78   int sqlite3PagerSetPagesize(Pager*, u16*);
    96     79   int sqlite3PagerMaxPageCount(Pager*, int);
    97     80   int sqlite3PagerReadFileheader(Pager*, int, unsigned char*);
    98     81   void sqlite3PagerSetCachesize(Pager*, int);
    99         -void sqlite3PagerSetSafetyLevel(Pager*,int,int);
   100         -int sqlite3PagerLockingMode(Pager *, int);
   101         -int sqlite3PagerJournalMode(Pager *, int);
   102         -i64 sqlite3PagerJournalSizeLimit(Pager *, i64);
   103         -
   104         -/* Functions used to obtain and release page references. */ 
           82  +int sqlite3PagerClose(Pager *pPager);
   105     83   int sqlite3PagerAcquire(Pager *pPager, Pgno pgno, DbPage **ppPage, int clrFlag);
   106     84   #define sqlite3PagerGet(A,B,C) sqlite3PagerAcquire(A,B,C,0)
   107     85   DbPage *sqlite3PagerLookup(Pager *pPager, Pgno pgno);
   108         -void sqlite3PagerRef(DbPage*);
   109         -void sqlite3PagerUnref(DbPage*);
   110         -
   111         -/* Operations on page references. */
   112         -int sqlite3PagerWrite(DbPage*);
   113         -void sqlite3PagerDontWrite(DbPage*);
   114         -int sqlite3PagerMovepage(Pager*,DbPage*,Pgno,int);
   115     86   int sqlite3PagerPageRefcount(DbPage*);
   116         -void *sqlite3PagerGetData(DbPage *); 
   117         -void *sqlite3PagerGetExtra(DbPage *); 
   118         -
   119         -/* Functions used to manage pager transactions and savepoints. */
           87  +int sqlite3PagerRef(DbPage*);
           88  +int sqlite3PagerUnref(DbPage*);
           89  +int sqlite3PagerWrite(DbPage*);
   120     90   int sqlite3PagerPagecount(Pager*, int*);
   121     91   int sqlite3PagerBegin(DbPage*, int exFlag);
   122     92   int sqlite3PagerCommitPhaseOne(Pager*,const char *zMaster, int);
   123         -int sqlite3PagerSync(Pager *pPager);
   124     93   int sqlite3PagerCommitPhaseTwo(Pager*);
   125     94   int sqlite3PagerRollback(Pager*);
   126         -int sqlite3PagerOpenSavepoint(Pager *pPager, int n);
   127         -int sqlite3PagerSavepoint(Pager *pPager, int op, int iSavepoint);
   128         -
   129         -/* Functions used to query pager state and configuration. */
   130     95   u8 sqlite3PagerIsreadonly(Pager*);
           96  +void sqlite3PagerDontRollback(DbPage*);
           97  +int sqlite3PagerDontWrite(DbPage*);
   131     98   int sqlite3PagerRefcount(Pager*);
           99  +void sqlite3PagerSetSafetyLevel(Pager*,int,int);
   132    100   const char *sqlite3PagerFilename(Pager*);
   133    101   const sqlite3_vfs *sqlite3PagerVfs(Pager*);
   134    102   sqlite3_file *sqlite3PagerFile(Pager*);
   135    103   const char *sqlite3PagerDirname(Pager*);
   136    104   const char *sqlite3PagerJournalname(Pager*);
   137    105   int sqlite3PagerNosync(Pager*);
          106  +int sqlite3PagerMovepage(Pager*,DbPage*,Pgno,int);
          107  +void *sqlite3PagerGetData(DbPage *); 
          108  +void *sqlite3PagerGetExtra(DbPage *); 
          109  +int sqlite3PagerLockingMode(Pager *, int);
          110  +int sqlite3PagerJournalMode(Pager *, int);
          111  +i64 sqlite3PagerJournalSizeLimit(Pager *, i64);
   138    112   void *sqlite3PagerTempSpace(Pager*);
          113  +int sqlite3PagerSync(Pager *pPager);
   139    114   
   140         -/* Functions used in auto-vacuum mode to truncate the database file. */
          115  +int sqlite3PagerOpenSavepoint(Pager *pPager, int n);
          116  +int sqlite3PagerSavepoint(Pager *pPager, int op, int iSavepoint);
          117  +
   141    118   #ifndef SQLITE_OMIT_AUTOVACUUM
   142    119     void sqlite3PagerTruncateImage(Pager*,Pgno);
          120  +  Pgno sqlite3PagerImageSize(Pager *);
   143    121   #endif
   144    122   
   145         -/* Used by encryption extensions. */
   146    123   #ifdef SQLITE_HAS_CODEC
   147    124     void sqlite3PagerSetCodec(Pager*,void*(*)(void*,void*,Pgno,int),void*);
   148    125   #endif
   149    126   
   150         -/* Functions to support testing and debugging. */
   151    127   #if !defined(NDEBUG) || defined(SQLITE_TEST)
   152    128     Pgno sqlite3PagerPagenumber(DbPage*);
   153    129     int sqlite3PagerIswriteable(DbPage*);
   154    130   #endif
          131  +
   155    132   #ifdef SQLITE_TEST
   156    133     int *sqlite3PagerStats(Pager*);
   157    134     void sqlite3PagerRefdump(Pager*);
   158    135     int sqlite3PagerIsMemdb(Pager*);
   159         -  void disable_simulated_io_errors(void);
   160         -  void enable_simulated_io_errors(void);
          136  +#endif
          137  +
          138  +#ifdef SQLITE_TEST
          139  +void disable_simulated_io_errors(void);
          140  +void enable_simulated_io_errors(void);
   161    141   #else
   162    142   # define disable_simulated_io_errors()
   163    143   # define enable_simulated_io_errors()
   164    144   #endif
   165    145   
   166    146   #endif /* _PAGER_H_ */
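The PAGER_MJ_PGNO(x) macro that this revert removes from pager.h computes the page number of the page containing the PENDING_BYTE lock byte. A small worked example of the arithmetic follows. It assumes the lock byte sits at offset 0x40000000 (1 GiB), which is SQLite's default; sketchMjPgno() and SKETCH_PENDING_BYTE are hypothetical names, and the function takes the page size directly rather than a Pager pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the lock byte is at offset 0x40000000 (1 GiB). */
#define SKETCH_PENDING_BYTE 0x40000000u

/* Same arithmetic as PAGER_MJ_PGNO(x): the 1-based number of the page
** that contains the lock byte, for a given page size in bytes. */
static uint32_t sketchMjPgno(uint32_t pageSize){
  /* The sketch only accepts power-of-two page sizes of 512 bytes or more. */
  assert( pageSize>=512 && (pageSize & (pageSize-1))==0 );
  return SKETCH_PENDING_BYTE/pageSize + 1;
}
```

For a 1024-byte page size this gives 1073741824/1024 + 1 = 1048577, and for 4096-byte pages it gives 262145.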

Changes to src/pcache.c.

     7      7   **    May you do good and not evil.
     8      8   **    May you find forgiveness for yourself and forgive others.
     9      9   **    May you share freely, never taking more than you give.
    10     10   **
    11     11   *************************************************************************
    12     12   ** This file implements the page cache.
    13     13   **
    14         -** @(#) $Id: pcache.c,v 1.40 2009/01/16 15:21:06 danielk1977 Exp $
           14  +** @(#) $Id: pcache.c,v 1.41 2009/01/16 16:23:38 danielk1977 Exp $
    15     15   */
    16     16   #include "sqliteInt.h"
    17     17   
    18     18   /*
    19     19   ** A complete page cache is an instance of this structure.
    20     20   */
    21     21   struct PCache {
................................................................................
   427    427       sqlite3GlobalConfig.pcache.xDestroy(pCache->pCache);
   428    428     }
   429    429   }
   430    430   
   431    431   /* 
   432    432   ** Discard the contents of the cache.
   433    433   */
   434         -void sqlite3PcacheClear(PCache *pCache){
          434  +int sqlite3PcacheClear(PCache *pCache){
   435    435     sqlite3PcacheTruncate(pCache, 0);
          436  +  return SQLITE_OK;
   436    437   }
   437    438   
   438    439   /*
   439    440   ** Merge two lists of pages connected by pDirty and in pgno order.
   440    441   ** Do not bother fixing the pDirtyPrev pointers.
   441    442   */
   442    443   static PgHdr *pcacheMergeDirtyList(PgHdr *pA, PgHdr *pB){

Changes to src/pcache.h.

     8      8   **    May you find forgiveness for yourself and forgive others.
     9      9   **    May you share freely, never taking more than you give.
    10     10   **
    11     11   *************************************************************************
    12     12   ** This header file defines the interface to the sqlite page cache
    13     13   ** subsystem.
    14     14   **
    15         -** @(#) $Id: pcache.h,v 1.17 2009/01/16 15:21:06 danielk1977 Exp $
           15  +** @(#) $Id: pcache.h,v 1.18 2009/01/16 16:23:38 danielk1977 Exp $
    16     16   */
    17     17   
    18     18   #ifndef _PCACHE_H_
    19     19   
    20     20   typedef struct PgHdr PgHdr;
    21     21   typedef struct PCache PCache;
    22     22   
................................................................................
   107    107   /* Reset and close the cache object */
   108    108   void sqlite3PcacheClose(PCache*);
   109    109   
   110    110   /* Clear flags from pages of the page cache */
   111    111   void sqlite3PcacheClearSyncFlags(PCache *);
   112    112   
   113    113   /* Discard the contents of the cache */
   114         -void sqlite3PcacheClear(PCache*);
          114  +int sqlite3PcacheClear(PCache*);
   115    115   
   116    116   /* Return the total number of outstanding page references */
   117    117   int sqlite3PcacheRefCount(PCache*);
   118    118   
   119    119   /* Increment the reference count of an existing page */
   120    120   void sqlite3PcacheRef(PgHdr*);
   121    121   

Changes to src/sqliteInt.h.

     7      7   **    May you do good and not evil.
     8      8   **    May you find forgiveness for yourself and forgive others.
     9      9   **    May you share freely, never taking more than you give.
    10     10   **
    11     11   *************************************************************************
    12     12   ** Internal interface definitions for SQLite.
    13     13   **
    14         -** @(#) $Id: sqliteInt.h,v 1.826 2009/01/16 15:21:06 danielk1977 Exp $
           14  +** @(#) $Id: sqliteInt.h,v 1.827 2009/01/16 16:23:38 danielk1977 Exp $
    15     15   */
    16     16   #ifndef _SQLITEINT_H_
    17     17   #define _SQLITEINT_H_
    18     18   
    19     19   /*
    20     20   ** Include the configuration header output by 'configure' if we're using the
    21     21   ** autoconf-based build
................................................................................
  2251   2251   void sqlite3EndTable(Parse*,Token*,Token*,Select*);
  2252   2252   
  2253   2253   Bitvec *sqlite3BitvecCreate(u32);
  2254   2254   int sqlite3BitvecTest(Bitvec*, u32);
  2255   2255   int sqlite3BitvecSet(Bitvec*, u32);
  2256   2256   void sqlite3BitvecClear(Bitvec*, u32);
  2257   2257   void sqlite3BitvecDestroy(Bitvec*);
  2258         -u32 sqlite3BitvecSize(Bitvec*);
  2259   2258   int sqlite3BitvecBuiltinTest(int,int*);
  2260   2259   
  2261   2260   RowSet *sqlite3RowSetInit(sqlite3*, void*, unsigned int);
  2262   2261   void sqlite3RowSetClear(RowSet*);
  2263   2262   void sqlite3RowSetInsert(RowSet*, i64);
  2264   2263   int sqlite3RowSetNext(RowSet*, i64*);
  2265   2264   

Changes to src/test2.c.

     9      9   **    May you share freely, never taking more than you give.
    10     10   **
    11     11   *************************************************************************
    12     12   ** Code for testing the pager.c module in SQLite.  This code
    13     13   ** is not included in the SQLite library.  It is used for automated
    14     14   ** testing of the SQLite library.
    15     15   **
    16         -** $Id: test2.c,v 1.66 2009/01/16 15:21:06 danielk1977 Exp $
           16  +** $Id: test2.c,v 1.67 2009/01/16 16:23:38 danielk1977 Exp $
    17     17   */
    18     18   #include "sqliteInt.h"
    19     19   #include "tcl.h"
    20     20   #include <stdlib.h>
    21     21   #include <string.h>
    22     22   #include <ctype.h>
    23     23   
................................................................................
   414    414   static int page_unref(
   415    415     void *NotUsed,
   416    416     Tcl_Interp *interp,    /* The TCL interpreter that invoked this command */
   417    417     int argc,              /* Number of arguments */
   418    418     const char **argv      /* Text of each argument */
   419    419   ){
   420    420     DbPage *pPage;
          421  +  int rc;
   421    422     if( argc!=2 ){
   422    423       Tcl_AppendResult(interp, "wrong # args: should be \"", argv[0],
   423    424          " PAGE\"", 0);
   424    425       return TCL_ERROR;
   425    426     }
   426    427     pPage = (DbPage *)sqlite3TestTextToPtr(argv[1]);
   427         -  sqlite3PagerUnref(pPage);
          428  +  rc = sqlite3PagerUnref(pPage);
          429  +  if( rc!=SQLITE_OK ){
          430  +    Tcl_AppendResult(interp, errorName(rc), 0);
          431  +    return TCL_ERROR;
          432  +  }
   428    433     return TCL_OK;
   429    434   }
   430    435   
   431    436   /*
   432    437   ** Usage:   page_read PAGE
   433    438   **
   434    439   ** Return the content of a page

Changes to src/vdbeaux.c.

    10     10   **
    11     11   *************************************************************************
    12     12   ** This file contains code used for creating, destroying, and populating
    13     13   ** a VDBE (or an "sqlite3_stmt" as it is known to the outside world.)  Prior
    14     14   ** to version 2.8.7, all this code was combined into the vdbe.c source file.
    15     15   ** But that file was getting too big so these subroutines were split out.
    16     16   **
    17         -** $Id: vdbeaux.c,v 1.431 2009/01/16 15:21:06 danielk1977 Exp $
           17  +** $Id: vdbeaux.c,v 1.432 2009/01/16 16:23:38 danielk1977 Exp $
    18     18   */
    19     19   #include "sqliteInt.h"
    20     20   #include <ctype.h>
    21     21   #include "vdbeInt.h"
    22     22   
    23     23   
    24     24   
................................................................................
  1384   1384           }
  1385   1385         }
  1386   1386       }
  1387   1387   
  1388   1388       /* Sync the master journal file. If the IOCAP_SEQUENTIAL device
  1389   1389       ** flag is set this is not required.
  1390   1390       */
  1391         -    if( needSync 
  1392         -     && 0==(sqlite3OsDeviceCharacteristics(pMaster)&SQLITE_IOCAP_SEQUENTIAL)
  1393         -     && SQLITE_OK!=(rc = sqlite3OsSync(pMaster, SQLITE_SYNC_NORMAL))
  1394         -    ){
         1391  +    zMainFile = sqlite3BtreeGetDirname(db->aDb[0].pBt);
         1392  +    if( (needSync 
         1393  +     && (0==(sqlite3OsDeviceCharacteristics(pMaster)&SQLITE_IOCAP_SEQUENTIAL))
         1394  +     && (rc=sqlite3OsSync(pMaster, SQLITE_SYNC_NORMAL))!=SQLITE_OK) ){
  1395   1395         sqlite3OsCloseFree(pMaster);
  1396   1396         sqlite3OsDelete(pVfs, zMaster, 0);
  1397   1397         sqlite3DbFree(db, zMaster);
  1398   1398         return rc;
  1399   1399       }
  1400   1400   
  1401   1401       /* Sync all the db files involved in the transaction. The same call