SQLite4
Check-in [d6bd08ca0e]

Overview
Comment: Fix LSM single-process mode so that it holds an exclusive lock on the database file, preventing connections from external processes.
SHA1: d6bd08ca0eb731d9ac0a0e3b573947bf3dba3673
User & Date: dan 2013-02-02 16:45:05
Context
2013-02-05
09:52
Add test file lsm3.test, which should have been added a few days ago. check-in: 5dfd8651df user: dan tags: trunk
2013-02-04
19:04
Map and unmap parts of the database file on an LRU basis to limit the amount of address space consumed at any one time (for 32-bit address spaces). It looks like this might be slower than read() and write() anyway... check-in: d1b1a9e969 user: dan tags: mmap-on-demand
2013-02-02
16:45
Fix LSM single-process mode so that it holds an exclusive lock on the database file - preventing connections from within external processes. check-in: d6bd08ca0e user: dan tags: trunk
2013-02-01
19:49
Simplifications and clarifications to lsmusr.wiki. check-in: 33eca2e1f4 user: dan tags: trunk
Changes

Changes to src/kvlsm.c.

   449    449     if( pNew==0 ){
   450    450       rc = SQLITE4_NOMEM;
   451    451     }else{
   452    452       struct Config {
   453    453         const char *zParam;
   454    454         int eParam;
   455    455       } aConfig[] = {
   456         -      { "lsm_block_size", LSM_CONFIG_BLOCK_SIZE }
          456  +      { "lsm_block_size", LSM_CONFIG_BLOCK_SIZE },
          457  +      { "lsm_multiple_processes", LSM_CONFIG_MULTIPLE_PROCESSES }
   457    458       };
   458    459   
   459    460       memset(pNew, 0, sizeof(KVLsm));
   460    461       pNew->base.pStoreVfunc = &kvlsmMethods;
   461    462       pNew->base.pEnv = pEnv;
   462    463       rc = lsm_new(0, &pNew->pDb);
   463    464       if( rc==SQLITE4_OK ){

Changes to src/lsm_shared.c.

    30     30   
    31     31   /*
    32     32   ** Database structure. There is one such structure for each distinct 
    33     33   ** database accessed by this process. They are stored in the singly linked 
    34     34   ** list starting at global variable gShared.pDatabase. Database objects are 
    35     35   ** reference counted. Once the number of connections to the associated
    36     36   ** database drops to zero, they are removed from the linked list and deleted.
           37  +**
           38  +** pFile:
           39  +**   In multi-process mode, this file descriptor is used to obtain locks 
           40  +**   and to access shared-memory. In single process mode, its only job is
           41  +**   to hold the exclusive lock on the file.
           42  +**   
    37     43   */
    38     44   struct Database {
    39     45     /* Protected by the global mutex (enterGlobalMutex/leaveGlobalMutex): */
    40     46     char *zName;                    /* Canonical path to database file */
    41     47     int nName;                      /* strlen(zName) */
    42     48     int nDbRef;                     /* Number of associated lsm_db handles */
    43     49     Database *pDbNext;              /* Next Database structure in global list */
    44     50   
    45     51     /* Protected by the local mutex (pClientMutex) */
           52  +  int bMultiProc;                 /* True if running in multi-process mode */
    46     53     lsm_file *pFile;                /* Used for locks/shm in multi-proc mode */
    47     54     LsmFile *pLsmFile;              /* List of deferred closes */
    48     55     lsm_mutex *pClientMutex;        /* Protects the apShmChunk[] and pConn */
    49     56     int nShmChunk;                  /* Number of entries in apShmChunk[] array */
    50     57     void **apShmChunk;              /* Array of "shared" memory regions */
    51     58     lsm_db *pConn;                  /* List of connections to this db. */
    52     59   };
................................................................................
   265    272   
   266    273         /* If the checkpoint was written successfully, delete the log file
   267    274         ** and, if possible, truncate the database file.  */
   268    275         if( rc==LSM_OK ){
   269    276           Database *p = pDb->pDatabase;
   270    277           dbTruncateFile(pDb);
   271    278           lsmFsCloseAndDeleteLog(pDb->pFS);
   272         -        if( p->pFile ) lsmEnvShmUnmap(pDb->pEnv, p->pFile, 1);
          279  +        if( p->pFile && p->bMultiProc ) lsmEnvShmUnmap(pDb->pEnv, p->pFile, 1);
   273    280         }
   274    281       }
   275    282     }
   276    283   
   277    284     lsmShmLock(pDb, LSM_LOCK_DMS2, LSM_LOCK_UNLOCK, 0);
   278    285     lsmShmLock(pDb, LSM_LOCK_DMS1, LSM_LOCK_UNLOCK, 0);
   279    286     pDb->pShmhdr = 0;
................................................................................
   314    321       if( rc==LSM_OK ){
   315    322         rc = lsmLogRecover(pDb);
   316    323       }
   317    324     }else if( rc==LSM_BUSY ){
   318    325       rc = LSM_OK;
   319    326     }
   320    327   
   321         -  /* Take a shared lock on DMS2. This lock "cannot" fail, as connections 
   322         -  ** may only hold an exclusive lock on DMS2 if they first hold an exclusive
   323         -  ** lock on DMS1. And this connection is currently holding the exclusive
   324         -  ** lock on DSM1.  */
          328  +  /* Take a shared lock on DMS2. In multi-process mode this lock "cannot" 
          329  +  ** fail, as connections may only hold an exclusive lock on DMS2 if they 
          330  +  ** first hold an exclusive lock on DMS1. And this connection is currently 
          331  +  ** holding the exclusive lock on DSM1. 
          332  +  **
          333  +  ** However, if some other connection has the database open in single-process
          334  +  ** mode, this operation will fail. In this case, return the error to the
          335  +  ** caller - the attempt to connect to the db has failed.
          336  +  */
   325    337     if( rc==LSM_OK ){
   326    338       rc = lsmShmLock(pDb, LSM_LOCK_DMS2, LSM_LOCK_SHARED, 0);
   327         -    assert( rc!=LSM_BUSY );
   328    339     }
   329    340   
   330    341     /* If anything went wrong, unlock DMS2. Unlock DMS1 in any case. */
   331    342     if( rc!=LSM_OK ){
   332    343       lsmShmLock(pDb, LSM_LOCK_DMS2, LSM_LOCK_UNLOCK, 0);
   333    344       pDb->pShmhdr = 0;
   334    345     }
................................................................................
   359    370     int nName = lsmStrlen(zName);
   360    371   
   361    372     assert( pDb->pDatabase==0 );
   362    373     rc = enterGlobalMutex(pEnv);
   363    374     if( rc==LSM_OK ){
   364    375   
   365    376       /* Search the global list for an existing object. TODO: Need something
   366         -    ** better than the strcmp() below to figure out if a given Database
          377  +    ** better than the memcmp() below to figure out if a given Database
   367    378       ** object represents the requested file.  */
   368    379       for(p=gShared.pDatabase; p; p=p->pDbNext){
   369    380         if( nName==p->nName && 0==memcmp(zName, p->zName, nName) ) break;
   370    381       }
   371    382   
   372    383       /* If no suitable Database object was found, allocate a new one. */
   373    384       if( p==0 ){
   374    385         p = (Database *)lsmMallocZeroRc(pEnv, sizeof(Database)+nName+1, &rc);
   375    386   
   376    387         /* If the allocation was successful, fill in other fields and
   377    388         ** allocate the client mutex. */ 
   378    389         if( rc==LSM_OK ){
          390  +        p->bMultiProc = pDb->bMultiProc;
   379    391           p->zName = (char *)&p[1];
   380    392           p->nName = nName;
   381    393           memcpy((void *)p->zName, zName, nName+1);
   382    394           rc = lsmMutexNew(pEnv, &p->pClientMutex);
   383    395         }
   384    396   
   385         -      /* If running in multi-process mode and nothing has gone wrong so far,
   386         -      ** open the shared fd */
   387         -      if( rc==LSM_OK && pDb->bMultiProc ){
          397  +      /* If nothing has gone wrong so far, open the shared fd. And if that
          398  +      ** succeeds and this connection requested single-process mode, 
          399  +      ** attempt to take the exclusive lock on DMS2.  */
          400  +      if( rc==LSM_OK ){
   388    401           rc = lsmEnvOpen(pDb->pEnv, p->zName, &p->pFile);
   389    402         }
          403  +      if( rc==LSM_OK && p->bMultiProc==0 ){
          404  +        rc = lsmEnvLock(pDb->pEnv, p->pFile, LSM_LOCK_DMS2, LSM_LOCK_EXCL);
          405  +      }
   390    406   
   391    407         if( rc==LSM_OK ){
   392    408           p->pDbNext = gShared.pDatabase;
   393    409           gShared.pDatabase = p;
   394    410         }else{
   395    411           freeDatabase(pEnv, p);
   396    412           p = 0;
................................................................................
   456    472       if( pDb->pShmhdr ){
   457    473         doDbDisconnect(pDb);
   458    474       }
   459    475   
   460    476       lsmMutexEnter(pDb->pEnv, p->pClientMutex);
   461    477       for(ppDb=&p->pConn; *ppDb!=pDb; ppDb=&((*ppDb)->pNext));
   462    478       *ppDb = pDb->pNext;
   463         -    if( lsmDbMultiProc(pDb) ){
   464         -      dbDeferClose(pDb);
   465         -    }
          479  +    dbDeferClose(pDb);
   466    480       lsmMutexLeave(pDb->pEnv, p->pClientMutex);
   467    481   
   468    482       enterGlobalMutex(pDb->pEnv);
   469    483       p->nDbRef--;
   470    484       if( p->nDbRef==0 ){
          485  +      LsmFile *pIter;
          486  +      LsmFile *pNext;
   471    487         Database **pp;
   472    488   
   473    489         /* Remove the Database structure from the linked list. */
   474    490         for(pp=&gShared.pDatabase; *pp!=p; pp=&((*pp)->pDbNext));
   475    491         *pp = p->pDbNext;
   476    492   
   477         -      /* Free the Database object and shared memory buffers. */
   478         -      if( p->pFile==0 ){
          493  +      /* If they were allocated from the heap, free the shared memory chunks */
          494  +      if( p->bMultiProc==0 ){
   479    495           int i;
   480    496           for(i=0; i<p->nShmChunk; i++){
   481    497             lsmFree(pDb->pEnv, p->apShmChunk[i]);
   482    498           }
   483         -      }else{
   484         -        LsmFile *pIter;
   485         -        LsmFile *pNext;
   486         -        for(pIter=p->pLsmFile; pIter; pIter=pNext){
   487         -          pNext = pIter->pNext;
   488         -          lsmEnvClose(pDb->pEnv, pIter->pFile);
   489         -          lsmFree(pDb->pEnv, pIter);
   490         -        }
          499  +      }
          500  +
          501  +      /* Close any outstanding file descriptors */
          502  +      for(pIter=p->pLsmFile; pIter; pIter=pNext){
          503  +        pNext = pIter->pNext;
          504  +        lsmEnvClose(pDb->pEnv, pIter->pFile);
          505  +        lsmFree(pDb->pEnv, pIter);
   491    506         }
   492    507         freeDatabase(pDb->pEnv, p);
   493    508       }
   494    509       leaveGlobalMutex(pDb->pEnv);
   495    510     }
   496    511   }
   497    512   
................................................................................
  1291   1306   
  1292   1307   /*
  1293   1308   ** This function may only be called after a successful call to
  1294   1309   ** lsmDbDatabaseConnect(). It returns true if the connection is in
  1295   1310   ** multi-process mode, or false otherwise.
  1296   1311   */
  1297   1312   int lsmDbMultiProc(lsm_db *pDb){
  1298         -  return pDb->pDatabase && (pDb->pDatabase->pFile!=0);
         1313  +  return pDb->pDatabase && pDb->pDatabase->bMultiProc;
  1299   1314   }
  1300   1315   
  1301   1316   
  1302   1317   /*************************************************************************
  1303   1318   **************************************************************************
  1304   1319   **************************************************************************
  1305   1320   **************************************************************************
................................................................................
  1349   1364         }
  1350   1365         p->apShmChunk = apShm;
  1351   1366       }
  1352   1367   
  1353   1368       for(i=db->nShm; rc==LSM_OK && i<nChunk; i++){
  1354   1369         if( i>=p->nShmChunk ){
  1355   1370           void *pChunk = 0;
  1356         -        if( p->pFile==0 ){
         1371  +        if( p->bMultiProc==0 ){
  1357   1372             /* Single process mode */
  1358   1373             pChunk = lsmMallocZeroRc(pEnv, LSM_SHM_CHUNK_SIZE, &rc);
  1359   1374           }else{
  1360   1375             /* Multi-process mode */
  1361   1376             rc = lsmEnvShmMap(pEnv, p->pFile, i, LSM_SHM_CHUNK_SIZE, &pChunk);
  1362   1377           }
  1363   1378           if( rc==LSM_OK ){
................................................................................
  1373   1388   
  1374   1389       /* Release the client mutex */
  1375   1390       lsmMutexLeave(pEnv, p->pClientMutex);
  1376   1391     }
  1377   1392   
  1378   1393     return rc;
  1379   1394   }
         1395  +
         1396  +static int lockSharedFile(lsm_env *pEnv, Database *p, int iLock, int eOp){
         1397  +  int rc = LSM_OK;
         1398  +  if( p->bMultiProc ){
         1399  +    rc = lsmEnvLock(pEnv, p->pFile, iLock, eOp);
         1400  +  }
         1401  +  return rc;
         1402  +}
  1380   1403   
  1381   1404   /*
  1382   1405   ** Attempt to obtain the lock identified by the iLock and bExcl parameters.
  1383   1406   ** If successful, return LSM_OK. If the lock cannot be obtained because 
  1384   1407   ** there exists some other conflicting lock, return LSM_BUSY. If some other
  1385   1408   ** error occurs, return an LSM error code.
  1386   1409   **
................................................................................
  1427   1450       assert( nExcl==0 || nExcl==1 );
  1428   1451       assert( nExcl==0 || nShared==0 );
  1429   1452       assert( nExcl==0 || (db->mLock & (me|ms))==0 );
  1430   1453   
  1431   1454       switch( eOp ){
  1432   1455         case LSM_LOCK_UNLOCK:
  1433   1456           if( nShared==0 ){
  1434         -          lsmEnvLock(db->pEnv, p->pFile, iLock, LSM_LOCK_UNLOCK);
         1457  +          lockSharedFile(db->pEnv, p, iLock, LSM_LOCK_UNLOCK);
  1435   1458           }
  1436   1459           db->mLock &= ~(me|ms);
  1437   1460           break;
  1438   1461   
  1439   1462         case LSM_LOCK_SHARED:
  1440   1463           if( nExcl ){
  1441   1464             rc = LSM_BUSY;
  1442   1465           }else{
  1443   1466             if( nShared==0 ){
  1444         -            rc = lsmEnvLock(db->pEnv, p->pFile, iLock, LSM_LOCK_SHARED);
         1467  +            rc = lockSharedFile(db->pEnv, p, iLock, LSM_LOCK_SHARED);
  1445   1468             }
  1446   1469             db->mLock |= ms;
  1447   1470             db->mLock &= ~me;
  1448   1471           }
  1449   1472           break;
  1450   1473   
  1451   1474         default:
  1452   1475           assert( eOp==LSM_LOCK_EXCL );
  1453   1476           if( nExcl || nShared ){
  1454   1477             rc = LSM_BUSY;
  1455   1478           }else{
  1456         -          rc = lsmEnvLock(db->pEnv, p->pFile, iLock, LSM_LOCK_EXCL);
         1479  +          rc = lockSharedFile(db->pEnv, p, iLock, LSM_LOCK_EXCL);
  1457   1480             db->mLock |= (me|ms);
  1458   1481           }
  1459   1482           break;
  1460   1483       }
  1461   1484   
  1462   1485       lsmMutexLeave(db->pEnv, p->pClientMutex);
  1463   1486     }
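
Note on the lsm_shared.c changes above: in single-process mode the Database object now takes an exclusive lock on DMS2 when it is first opened, so a connection from another process fails when it requests the shared DMS2 lock. The sketch below illustrates that idea in isolation. It is not LSM code: it assumes a POSIX system and uses fcntl() byte-range locks directly, and the helper names demo_lock_byte() and demo_external_process_is_blocked() are hypothetical. Whether lsmEnvLock() maps onto fcntl() in a given environment is an assumption here.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: try to take (or release) a non-blocking lock on a
** single byte of the file. eType is F_RDLCK, F_WRLCK or F_UNLCK.
** Returns 0 on success, -1 if a conflicting lock exists or on error. */
static int demo_lock_byte(int fd, off_t iByte, short eType){
  struct flock lk;
  memset(&lk, 0, sizeof(lk));
  lk.l_type = eType;
  lk.l_whence = SEEK_SET;
  lk.l_start = iByte;
  lk.l_len = 1;
  return fcntl(fd, F_SETLK, &lk);
}

/* Hold an exclusive lock on byte 0 of zPath, then check whether a second
** process can obtain a shared lock on the same byte.
** Returns 1 if the other process was refused, 0 if it was granted the
** lock, or -1 if the demonstration could not be set up. */
static int demo_external_process_is_blocked(const char *zPath){
  int status = 0;
  pid_t pid;
  int fd = open(zPath, O_RDWR|O_CREAT, 0644);
  if( fd<0 ) return -1;

  /* Plays the role of the "single-process mode" owner. */
  if( demo_lock_byte(fd, 0, F_WRLCK)!=0 ){
    close(fd);
    return -1;
  }

  pid = fork();
  if( pid<0 ){
    close(fd);
    return -1;
  }
  if( pid==0 ){
    /* Child: a distinct process. fcntl() locks are not inherited across
    ** fork(), so this request conflicts with the parent's lock. */
    int fd2 = open(zPath, O_RDWR);
    if( fd2<0 ) _exit(2);
    _exit( demo_lock_byte(fd2, 0, F_RDLCK)==0 ? 0 : 1 );
  }

  if( waitpid(pid, &status, 0)<0 ){
    close(fd);
    return -1;
  }
  close(fd);                      /* Closing the fd releases the lock */
  if( !WIFEXITED(status) || WEXITSTATUS(status)==2 ) return -1;
  return WEXITSTATUS(status)==1;
}
```

Calling demo_external_process_is_blocked() with the path of a scratch file on a local filesystem should return 1. This is the analogue of a second process being refused the database while another process holds it in single-process mode.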

Changes to www/lsmusr.wiki.

   802    802   
   803    803     <li> <p>
   804    804          Once sufficient data has been accumulated in an in-memory tree (by
   805    805          default "sufficient data" means 1MB, including data structure 
   806    806          overhead), it is marked as "old" and a new "live" in-memory tree 
   807    807          created. An old in-memory tree is immutable - new data is always
   808    808          inserted into the live tree. There may be at most one old tree
   809         -       in memory at any time.
          809  +       in memory at a time.
   810    810   
   811    811     <li> <p>
   812    812          The contents of an old in-memory tree may be written into the 
   813    813          database file at any point. Once its contents have been written (or
   814    814          "flushed") to the database file, the in-memory tree may be discarded.
   815    815          Flushing an in-memory tree to the database file creates a new database
   816    816          "segment". A database segment is an immutable b-tree structure stored
   817    817          within the database file. A single database file may contain up to 64 
   818    818          segments.
   819    819   
   820    820     <li> <p>
   821         -       At any point, two or more existing segments within the database may
   822         -       be merged together into a single segment. Once their contents has
          821  +       At any point, two or more existing segments within the database file
          822  +       may be merged together into a single segment. Once their contents has
   823    823          been merged into the new segment, the original segments may be 
   824    824          discarded.
   825    825   
   826    826     <li> <p>
   827    827          After the set of segments in a database file has been modified (either
   828    828          by flushing an in-memory tree to disk or by merging existing segments
   829    829          together), the changes may be made persistent by "checkpointing" the 
   830         -       database. Checkpointing involves syncing the contents of the database
   831         -       file to disk and updating the database file header.
          830  +       database. Checkpointing involves updating the database file header and
          831  +       and (usually) syncing the contents of the database file to disk.
   832    832   </ol>
   833    833   
   834    834   <p>Steps 3 and 4 above are known as "working" on the database. Step 5 is
   835    835   refered to as "checkpointing". By default, database connections perform work
   836    836   and checkpoint operations periodically from within calls to API functions
   837    837   <code>lsm_insert</code>, <code>lsm_delete</code>, <code>lsm_delete_range</code>
   838    838   and <code>lsm_commit</code> (i.e. functions that write to the database).
................................................................................
   994    994       closing read and write transactions. 
   995    995   
   996    996       <p>This option can only be set before lsm_open() is called on the database
   997    997       connection.
   998    998   
   999    999       <p>If this option is set to false and there is already a connection to the
  1000   1000       database from another process when lsm_open() is called, the lsm_open()
  1001         -    call fails with error code LSM_BUSY. <span style=color:red>todo: It 
  1002         -    doesn't actually do this yet. But it should...</span>
         1001  +    call fails with error code LSM_BUSY.
  1003   1002   
  1004   1003     <dt> <a href=lsmapi.wiki#LSM_CONFIG_SAFETY>LSM_CONFIG_SAFETY</a>
  1005   1004     <dd> <p style=margin-top:0>
  1006   1005       The effect of this option on <a href=#data_durability>data durability</a>
  1007   1006       is described above.
  1008   1007   
  1009   1008       <p>From a performance point of view, this option determines how often the
................................................................................
  1299   1298   the nMerge argument set to 1 and the third parameter set to a negative value
  1300   1299   (interpreted as - keep working until there is no more work to do). For 
  1301   1300   example:
  1302   1301   
  1303   1302   <verbatim>
  1304   1303     rc = lsm_work(db, 1, -1, 0);
  1305   1304   </verbatim>
         1305  +
         1306  +<p><span style=color:red>todo: the -1 as the 3rd argument above is currently
         1307  +not supported</span>
  1306   1308   
  1307   1309   <p>When optimizing the database as above, either the LSM_CONFIG_AUTOCHECKPOINT
  1308   1310   parameter should be set to a non-zero value or lsm_checkpoint() should be
  1309   1311   called periodically. Otherwise, no checkpoints will be performed, preventing
  1310   1312   the library from reusing any space occupied by old segments even after their
  1311   1313   content has been merged into the new segment. The result - a database file that
  1312   1314   is optimized, except that it is up to twice as large as it otherwise would be.
  1313   1315   
  1314   1316   
  1315   1317   
  1316   1318