Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Minor changes to lsmusr.wiki. Add the lsm_csr_cmp() function.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 9d39c3a354741d8ee76d639803960b052c7b59bb
User & Date: dan 2012-11-09 20:14:03.146
Context
2012-11-12
19:41
Updates to lsmusr.wiki. check-in: f7ef6cec1f user: dan tags: trunk
2012-11-09
20:14
Minor changes to lsmusr.wiki. Add the lsm_csr_cmp() function. check-in: 9d39c3a354 user: dan tags: trunk
2012-11-08
21:30
Add lsmusr.wiki. User documentation for lsm. check-in: c50bcdc37d user: dan tags: trunk
Changes
Unified Diff Ignore Whitespace Patch
Changes to src/lsm.h.
19
20
21
22
23
24
25
26
27

28
29

30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
#ifdef __cplusplus
extern "C" {
#endif

/*
** Opaque handle types.
*/
typedef struct lsm_db lsm_db;               /* Database connection handle */
typedef struct lsm_cursor lsm_cursor;       /* Database cursor handle */

typedef struct lsm_mutex lsm_mutex;         /* Mutex handle */
typedef struct lsm_file lsm_file;           /* OS file handle */


/* 64-bit integer type used for file offsets. */
typedef long long int lsm_i64;              /* 64-bit signed integer type */

/* Forward reference */
typedef struct lsm_env lsm_env;             /* Runtime environment */
typedef struct lsm_compress lsm_compress;   /* Compression library functions */

/* Candidate values for the 3rd argument to lsm_env.xLock() */
#define LSM_LOCK_UNLOCK 0
#define LSM_LOCK_SHARED 1
#define LSM_LOCK_EXCL   2

/*
** Run-time environment used by LSM







|

>
|

>




<
<
<
<







19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35




36
37
38
39
40
41
42
#ifdef __cplusplus
extern "C" {
#endif

/*
** Opaque handle types.
*/
typedef struct lsm_compress lsm_compress;   /* Compression library functions */
typedef struct lsm_cursor lsm_cursor;       /* Database cursor handle */
typedef struct lsm_db lsm_db;               /* Database connection handle */
typedef struct lsm_env lsm_env;             /* Runtime environment */
typedef struct lsm_file lsm_file;           /* OS file handle */
typedef struct lsm_mutex lsm_mutex;         /* Mutex handle */

/* 64-bit integer type used for file offsets. */
typedef long long int lsm_i64;              /* 64-bit signed integer type */





/* Candidate values for the 3rd argument to lsm_env.xLock() */
#define LSM_LOCK_UNLOCK 0
#define LSM_LOCK_SHARED 1
#define LSM_LOCK_EXCL   2

/*
** Run-time environment used by LSM
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
#define LSM_INFO_FREELIST         8
#define LSM_INFO_ARRAY_PAGES      9


/* 
** Open and close transactions and nested transactions.
**
** lsm_begin():
**   Used to open transactions and sub-transactions. A successful call to 
**   lsm_begin() ensures that there are at least iLevel nested transactions 
**   open. To open a top-level transaction, pass iLevel==1. To open a 
**   sub-transaction within the top-level transaction, iLevel==2. Passing 
**   iLevel==0 is a no-op.
**
** lsm_commit():
**   Used to commit transactions and sub-transactions. A successful call 
**   to lsm_commit() ensures that there are at most iLevel nested 
**   transactions open. To commit a top-level transaction, pass iLevel==0. 
**   To commit all sub-transactions inside the main transaction, pass
**   iLevel==1.
**
** lsm_rollback():
**   Used to roll back transactions and sub-transactions. A successful call 
**   to lsm_rollback() restores the database to the state it was in when
**   the iLevel'th nested sub-transaction (if any) was first opened. And then
**   closes transactions to ensure that there are at most iLevel nested
**   transactions open.
**
**   Passing iLevel==0 rolls back and closes the top-level transaction.







|






|






|







375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
#define LSM_INFO_FREELIST         8
#define LSM_INFO_ARRAY_PAGES      9


/* 
** Open and close transactions and nested transactions.
**
** lsm_begin()
**   Used to open transactions and sub-transactions. A successful call to 
**   lsm_begin() ensures that there are at least iLevel nested transactions 
**   open. To open a top-level transaction, pass iLevel==1. To open a 
**   sub-transaction within the top-level transaction, iLevel==2. Passing 
**   iLevel==0 is a no-op.
**
** lsm_commit()
**   Used to commit transactions and sub-transactions. A successful call 
**   to lsm_commit() ensures that there are at most iLevel nested 
**   transactions open. To commit a top-level transaction, pass iLevel==0. 
**   To commit all sub-transactions inside the main transaction, pass
**   iLevel==1.
**
** lsm_rollback()
**   Used to roll back transactions and sub-transactions. A successful call 
**   to lsm_rollback() restores the database to the state it was in when
**   the iLevel'th nested sub-transaction (if any) was first opened. And then
**   closes transactions to ensure that there are at most iLevel nested
**   transactions open.
**
**   Passing iLevel==0 rolls back and closes the top-level transaction.
578
579
580
581
582
583
584













585
586
587
588
/* 
** Retrieve data from a database cursor.
*/
int lsm_csr_valid(lsm_cursor *pCsr);
int lsm_csr_key(lsm_cursor *pCsr, const void **ppKey, int *pnKey);
int lsm_csr_value(lsm_cursor *pCsr, const void **ppVal, int *pnVal);














#ifdef __cplusplus
}  /* End of the 'extern "C"' block */
#endif
#endif /* ifndef _LSM_H */







>
>
>
>
>
>
>
>
>
>
>
>
>




576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
/* 
** Retrieve data from a database cursor.
*/
int lsm_csr_valid(lsm_cursor *pCsr);
int lsm_csr_key(lsm_cursor *pCsr, const void **ppKey, int *pnKey);
int lsm_csr_value(lsm_cursor *pCsr, const void **ppVal, int *pnVal);

/*
** If no error occurs, this function compares the database key passed via
** the pKey/nKey arguments with the key that the cursor passed as the first
** argument currently points to. If the cursors key is less than, equal to
** or greater than pKey/nKey, *piRes is set to less than, equal to or greater
** than zero before returning. LSM_OK is returned in this case.
**
** Or, if an error occurs, an LSM error code is returned and the final 
** value of *piRes is undefined. If the cursor does not point to a valid
** key when this function is called, LSM_MISUSE is returned.
*/
int lsm_csr_cmp(lsm_cursor *pCsr, const void *pKey, int nKey, int *piRes);

#ifdef __cplusplus
}  /* End of the 'extern "C"' block */
#endif
#endif /* ifndef _LSM_H */
Changes to src/lsm_sorted.c.
2982
2983
2984
2985
2986
2987
2988
















2989
2990
2991
2992
2993
2994
2995
        *ppKey = pCsr->key.pData;
      }
      *pnKey = nKey; 
    }
  }
  return LSM_OK;
}

















int lsmMCursorValue(MultiCursor *pCsr, void **ppVal, int *pnVal){
  void *pVal;
  int nVal;
  int rc;
  if( (pCsr->flags & CURSOR_SEEK_EQ) || pCsr->aTree==0 ){
    rc = LSM_OK;







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







2982
2983
2984
2985
2986
2987
2988
2989
2990
2991
2992
2993
2994
2995
2996
2997
2998
2999
3000
3001
3002
3003
3004
3005
3006
3007
3008
3009
3010
3011
        *ppKey = pCsr->key.pData;
      }
      *pnKey = nKey; 
    }
  }
  return LSM_OK;
}

/*
** Compare the current key that cursor csr points to with pKey/nKey. Set
** *piRes to the result and return LSM_OK.
*/
int lsm_csr_cmp(lsm_cursor *csr, const void *pKey, int nKey, int *piRes){
  MultiCursor *pCsr = (MultiCursor *)csr;
  void *pCsrkey; int nCsrkey;
  int rc;
  rc = lsmMCursorKey(pCsr, &pCsrkey, &nCsrkey);
  if( rc==LSM_OK ){
    int (*xCmp)(void *, int, void *, int) = pCsr->pDb->xCmp;
    *piRes = sortedKeyCompare(xCmp, 0, pCsrkey, nCsrkey, 0, (void *)pKey, nKey);
  }
  return rc;
}

int lsmMCursorValue(MultiCursor *pCsr, void **ppVal, int *pnVal){
  void *pVal;
  int nVal;
  int rc;
  if( (pCsr->flags & CURSOR_SEEK_EQ) || pCsr->aTree==0 ){
    rc = LSM_OK;
Changes to www/lsmusr.wiki.
200
201
202
203
204
205
206



207
208
209
210
211
212
213

<verbatim>
  int iSafety = -1;
  lsm_config(db, LSM_CONFIG_SAFETY, &iSafety);
  /* At this point, variable iSafety is set to the currently configured value
  ** of the LSM_CONFIG_SAFETY parameter (either 0, 1 or 2).  */
</verbatim>




<h2>2.5 Database Transactions and MVCC </h2>

<p>LSM supports a single-writer/multiple-reader 
<a href=http://en.wikipedia.org/wiki/Multiversion_concurrency_control>MVCC</a>
based transactional concurrency model. This is the same model that SQLite
supports in <a href="http://www.sqlite.org/wal.html">WAL mode</a>.







>
>
>







200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216

<verbatim>
  int iSafety = -1;
  lsm_config(db, LSM_CONFIG_SAFETY, &iSafety);
  /* At this point, variable iSafety is set to the currently configured value
  ** of the LSM_CONFIG_SAFETY parameter (either 0, 1 or 2).  */
</verbatim>

<p>The lsm_config() function may also be used to configure other database
connection parameters.  

<h2>2.5 Database Transactions and MVCC </h2>

<p>LSM supports a single-writer/multiple-reader 
<a href=http://en.wikipedia.org/wiki/Multiversion_concurrency_control>MVCC</a>
based transactional concurrency model. This is the same model that SQLite
supports in <a href="http://www.sqlite.org/wal.html">WAL mode</a>.
433
434
435
436
437
438
439






440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462






463



464























465


















466
467
468
469
470
471
<a href=#robustness>LSM_CONFIG_SAFETY</a> parameter, it may be necessary to
ensure that all segments referenced by the snapshot have been synced to disk
(safely stored on the persistent media such that they will not be lost if a
power failure occurs) before doing so. It is not necessary for every version
of the in-memory snapshot to be checkpointed. The in-memory snapshot may be
modified multiple times between checkpoints.







<p> Regular database checkpoints are required to ensure that unused space
within the log file and database file can be reused in a timely fashion.
Specifically:

<ul>
  <li> <p>Space within the log file cannot be recycled until the corresponding
       data has been written into a database segment and a checkpoint 
       performed.

  <li> <p>When two or more existing segments are merged within the database
       file, database clients may start using the new, larger, segment
       immediately.  However the space occupied by the original segments may
       not be reused until after a snapshot that refers to the new segment, and
       not the old ones, has been checkpointed.
</ul>

In other words, without checkpoints the system will function, but both the
log and database files will grow indefinitely as the database is modified 
(even if the size of the dataset remains constant). Additionally, if a crash
or power failure occurs, the next client to open the database file has to
process all data written to the log file since the most recent checkpoint. If
checkpoints are performed infrequently, this can be a time consuming exercise.



































<h2>3.2 Work and Checkpoint Scheduling </h2>



















<h2>3.3 Other Parameters </h2>

<h1>4. Compressed and Encrypted Databases </h1>









>
>
>
>
>
>
















|






>
>
>
>
>
>

>
>
>

>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>

>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>






436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
<a href=#robustness>LSM_CONFIG_SAFETY</a> parameter, it may be necessary to
ensure that all segments referenced by the snapshot have been synced to disk
(safely stored on the persistent media such that they will not be lost if a
power failure occurs) before doing so. It is not necessary for every version
of the in-memory snapshot to be checkpointed. The in-memory snapshot may be
modified multiple times between checkpoints.

<p>
Because a checkpointer process is often required to sync the database file
before updating the database header, "checkpointing" often appears to be the
costliest part of transfering data to the database file, at least in terms of
wall-clock time.

<p> Regular database checkpoints are required to ensure that unused space
within the log file and database file can be reused in a timely fashion.
Specifically:

<ul>
  <li> <p>Space within the log file cannot be recycled until the corresponding
       data has been written into a database segment and a checkpoint 
       performed.

  <li> <p>When two or more existing segments are merged within the database
       file, database clients may start using the new, larger, segment
       immediately.  However the space occupied by the original segments may
       not be reused until after a snapshot that refers to the new segment, and
       not the old ones, has been checkpointed.
</ul>

<p>In other words, without checkpoints the system will function, but both the
log and database files will grow indefinitely as the database is modified 
(even if the size of the dataset remains constant). Additionally, if a crash
or power failure occurs, the next client to open the database file has to
process all data written to the log file since the most recent checkpoint. If
checkpoints are performed infrequently, this can be a time consuming exercise.

<p>In order to safely write data into the in-memory tree (by calling 
lsm_insert, lsm_delete or lsm_delete_range), the database client must hold
the database WRITER lock. At most one client may concurrently hold the WRITER 
lock. Holding the WRITER lock is synonymous with having an open write
transaction - the client obtains the WRITER lock when the transaction is 
opened and relinquishes it when the transaction is closed.

<p>As well as the WRITER lock, there are two other locks that may be held by
at most one client at any time - the WORKER and CHECKPOINTER locks. The roles
of the three locks are roughly as follows:

<table valign=top>
<tr><td valign=top>WRITER<td style="width:3ex"><td>
The WRITER lock is required to modify the contents of the in-memory tree.
Including marking an in-memory tree as "old" and starting a new live tree.
It is also required to write to the log file.

<tr><td valign=top>WORKER<td><td>
The WORKER lock is required to write to segments within the database file.
Either when merging two or more existing segments within the database, or
when flushing an in-memory tree to disk to create a new segment.
The WORKER lock is also required to update the database snapshot stored in
main memory (updated so that new clients will use the new segments the worker
creates).

<tr><td valign=top>CHECKPOINTER<td><td>
The CHECKPOINTER lock is required to update the snapshot stored in the 
database file header (to checkpoint the database). 
</table>

<p>The tasks associated with each of the locks above may be performed
concurrently by multiple database connections, located either in the same
application process or different processes.

<h2>3.2 Work and Checkpoint Scheduling </h2>

<p>The section above describes the three stages of transfering data written
to the database from the application to persistent storage. A "writer" 
client writes the data into the in-memory tree and log file. Later on a 
"worker" client flushes the data from the in-memory tree to a new segment
in the the database file. Additionally, a worker client must periodically
merge existing database segments together to prevent them from growing too
numerous.

<ul>
<li>lsm_config_work_hook() -> Change to lsm_config() option.
<li>lsm_tree_size()        -> Change to lsm_info() option.
<li>lsm_ckpt_size()        -> Change to lsm_info() option.

<li>lsm_work()
<li>lsm_checkpoint()
<li>lsm_flush()
</ul>

<h2>3.3 Other Parameters </h2>

<h1>4. Compressed and Encrypted Databases </h1>