Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Improvements to lsmusr.wiki.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: e47b5e3ae69139d5ec261348717d52e583846b7e
User & Date: dan 2012-11-14 18:23:17.784
Context
2012-11-14
20:09
Updates to lsmusr.wiki. check-in: 1ea9187820 user: dan tags: trunk
18:23
Improvements to lsmusr.wiki. check-in: e47b5e3ae6 user: dan tags: trunk
2012-11-13
20:16
Further documentation updates. check-in: 414ed6da4e user: dan tags: trunk
Changes
Unified Diff Ignore Whitespace Patch
Changes to src/lsm.h.
35
36
37
38
39
40
41


42
43
44
45
46
47
48

/* Candidate values for the 3rd argument to lsm_env.xLock() */
#define LSM_LOCK_UNLOCK 0
#define LSM_LOCK_SHARED 1
#define LSM_LOCK_EXCL   2

/*


** Run-time environment used by LSM
*/
struct lsm_env {
  int nByte;                 /* Size of this structure in bytes */
  int iVersion;              /* Version number of this structure (1) */
  /****** file i/o ***********************************************/
  void *pVfsCtx;







>
>







35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50

/* Candidate values for the 3rd argument to lsm_env.xLock() */
#define LSM_LOCK_UNLOCK 0
#define LSM_LOCK_SHARED 1
#define LSM_LOCK_EXCL   2

/*
** CAPI: Database Runtime Environment
**
** Run-time environment used by LSM
*/
struct lsm_env {
  int nByte;                 /* Size of this structure in bytes */
  int iVersion;              /* Version number of this structure (1) */
  /****** file i/o ***********************************************/
  void *pVfsCtx;
Changes to www/lsmapi.wiki.
13
14
15
16
17
18
19

20
21
22
23
24
25
26

<p>
This page contains the LSM API Reference Manual. It is intended to complement
the <a href=lsmusr.wiki>LSM User Manual</a>.

<h1>LSM API Topics</h1>
<ol>

<li><a href="#lsm" style=text-decoration:none>LSM Error Codes</a>
<li><a href="#creating" style=text-decoration:none>Creating and Destroying Database Connection Handles</a>
<li><a href="#connecting" style=text-decoration:none>Connecting to a Database</a>
<li><a href="#obtaining" style=text-decoration:none>Obtaining pointers to databases environments</a>
<li><a href="#configuring" style=text-decoration:none>Configuring a database connection.</a>
<li><a href="#compression" style=text-decoration:none>Compression and/or Encryption Hooks</a>
<li><a href="#allocating" style=text-decoration:none>Allocating and Freeing Memory</a>







>







13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

<p>
This page contains the LSM API Reference Manual. It is intended to complement
the <a href=lsmusr.wiki>LSM User Manual</a>.

<h1>LSM API Topics</h1>
<ol>
<li><a href="#database" style=text-decoration:none>Database Runtime Environment</a>
<li><a href="#lsm" style=text-decoration:none>LSM Error Codes</a>
<li><a href="#creating" style=text-decoration:none>Creating and Destroying Database Connection Handles</a>
<li><a href="#connecting" style=text-decoration:none>Connecting to a Database</a>
<li><a href="#obtaining" style=text-decoration:none>Obtaining pointers to databases environments</a>
<li><a href="#configuring" style=text-decoration:none>Configuring a database connection.</a>
<li><a href="#compression" style=text-decoration:none>Compression and/or Encryption Hooks</a>
<li><a href="#allocating" style=text-decoration:none>Allocating and Freeing Memory</a>
64
65
66
67
68
69
70

71
72
73
74
75
76
77
<span style=display:block;float:left;width:35ex><a href=#lsm_open>lsm_open</a></span>
<span style=display:block;float:left;width:35ex><a href=#lsm_rollback>lsm_rollback</a></span>
<span style=display:block;float:left;width:35ex><a href=#lsm_tree_size>lsm_tree_size</a></span>
<span style=display:block;float:left;width:35ex><a href=#lsm_work>lsm_work</a></span>
<br style=clear:both>
<h1 style=clear:both>All LSM API Types</h1>
<span style=display:block;float:left;width:35ex><a href=#lsm_compress>lsm_compress</a></span>

<br style=clear:both>
<h1>All LSM API Constants</h1>
<span style=display:block;float:left;width:35ex><a href=#LSM_BUSY>LSM_BUSY</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_CANTOPEN>LSM_CANTOPEN</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_CONFIG_AUTOCHECKPOINT>LSM_CONFIG_AUTOCHECKPOINT</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_CONFIG_AUTOWORK>LSM_CONFIG_AUTOWORK</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_CONFIG_BLOCK_SIZE>LSM_CONFIG_BLOCK_SIZE</a></span>







>







65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
<span style=display:block;float:left;width:35ex><a href=#lsm_open>lsm_open</a></span>
<span style=display:block;float:left;width:35ex><a href=#lsm_rollback>lsm_rollback</a></span>
<span style=display:block;float:left;width:35ex><a href=#lsm_tree_size>lsm_tree_size</a></span>
<span style=display:block;float:left;width:35ex><a href=#lsm_work>lsm_work</a></span>
<br style=clear:both>
<h1 style=clear:both>All LSM API Types</h1>
<span style=display:block;float:left;width:35ex><a href=#lsm_compress>lsm_compress</a></span>
<span style=display:block;float:left;width:35ex><a href=#lsm_env>lsm_env</a></span>
<br style=clear:both>
<h1>All LSM API Constants</h1>
<span style=display:block;float:left;width:35ex><a href=#LSM_BUSY>LSM_BUSY</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_CANTOPEN>LSM_CANTOPEN</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_CONFIG_AUTOCHECKPOINT>LSM_CONFIG_AUTOCHECKPOINT</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_CONFIG_AUTOWORK>LSM_CONFIG_AUTOWORK</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_CONFIG_BLOCK_SIZE>LSM_CONFIG_BLOCK_SIZE</a></span>
96
97
98
99
100
101
102


103
104
105
106
107
108
109
110
111
112
113















































114
115
116
117
118
119
120
<span style=display:block;float:left;width:35ex><a href=#LSM_INFO_LOG_STRUCTURE>LSM_INFO_LOG_STRUCTURE</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_INFO_NREAD>LSM_INFO_NREAD</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_INFO_NWRITE>LSM_INFO_NWRITE</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_INFO_PAGE_ASCII_DUMP>LSM_INFO_PAGE_ASCII_DUMP</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_INFO_PAGE_HEX_DUMP>LSM_INFO_PAGE_HEX_DUMP</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_IOERR>LSM_IOERR</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_MISUSE>LSM_MISUSE</a></span>


<span style=display:block;float:left;width:35ex><a href=#LSM_NOMEM>LSM_NOMEM</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_OK>LSM_OK</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_PROTOCOL>LSM_PROTOCOL</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_SAFETY_FULL>LSM_SAFETY_FULL</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_SAFETY_NORMAL>LSM_SAFETY_NORMAL</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_SAFETY_OFF>LSM_SAFETY_OFF</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_SEEK_EQ>LSM_SEEK_EQ</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_SEEK_GE>LSM_SEEK_GE</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_SEEK_LE>LSM_SEEK_LE</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_SEEK_LEFAST>LSM_SEEK_LEFAST</a></span>
<br style=clear:both>















































<h2 id=lsm>LSM Error Codes<a id=LSM_OK></a><a id=LSM_ERROR></a><a id=LSM_BUSY></a><a id=LSM_NOMEM></a><a id=LSM_IOERR></a><a id=LSM_CORRUPT></a><a id=LSM_FULL></a><a id=LSM_CANTOPEN></a><a id=LSM_PROTOCOL></a><a id=LSM_MISUSE></a></h2>
<verbatim>#define LSM_OK         0
#define LSM_ERROR      1
#define LSM_BUSY       5
#define LSM_NOMEM      7
#define LSM_IOERR     10
#define LSM_CORRUPT   11







>
>











>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
<span style=display:block;float:left;width:35ex><a href=#LSM_INFO_LOG_STRUCTURE>LSM_INFO_LOG_STRUCTURE</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_INFO_NREAD>LSM_INFO_NREAD</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_INFO_NWRITE>LSM_INFO_NWRITE</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_INFO_PAGE_ASCII_DUMP>LSM_INFO_PAGE_ASCII_DUMP</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_INFO_PAGE_HEX_DUMP>LSM_INFO_PAGE_HEX_DUMP</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_IOERR>LSM_IOERR</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_MISUSE>LSM_MISUSE</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_MUTEX_GLOBAL>LSM_MUTEX_GLOBAL</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_MUTEX_HEAP>LSM_MUTEX_HEAP</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_NOMEM>LSM_NOMEM</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_OK>LSM_OK</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_PROTOCOL>LSM_PROTOCOL</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_SAFETY_FULL>LSM_SAFETY_FULL</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_SAFETY_NORMAL>LSM_SAFETY_NORMAL</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_SAFETY_OFF>LSM_SAFETY_OFF</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_SEEK_EQ>LSM_SEEK_EQ</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_SEEK_GE>LSM_SEEK_GE</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_SEEK_LE>LSM_SEEK_LE</a></span>
<span style=display:block;float:left;width:35ex><a href=#LSM_SEEK_LEFAST>LSM_SEEK_LEFAST</a></span>
<br style=clear:both>
<h2 id=database>Database Runtime Environment<a id=lsm_env></a><a id=LSM_MUTEX_GLOBAL></a><a id=LSM_MUTEX_HEAP></a></h2>
<verbatim>struct lsm_env {
  int nByte;                 /* Size of this structure in bytes */
  int iVersion;              /* Version number of this structure (1) */
  /****** file i/o ***********************************************/
  void *pVfsCtx;
  int (*xFullpath)(lsm_env*, const char *, char *, int *);
  int (*xOpen)(lsm_env*, const char *, lsm_file **);
  int (*xRead)(lsm_file *, lsm_i64, void *, int);
  int (*xWrite)(lsm_file *, lsm_i64, void *, int);
  int (*xTruncate)(lsm_file *, lsm_i64);
  int (*xSync)(lsm_file *);
  int (*xSectorSize)(lsm_file *);
  int (*xRemap)(lsm_file *, lsm_i64, void **, lsm_i64*);
  int (*xFileid)(lsm_file *, void *pBuf, int *pnBuf);
  int (*xClose)(lsm_file *);
  int (*xUnlink)(lsm_env*, const char *);
  int (*xLock)(lsm_file*, int, int);
  int (*xShmMap)(lsm_file*, int, int, void **);
  void (*xShmBarrier)(void);
  int (*xShmUnmap)(lsm_file*, int);
  /****** memory allocation ****************************************/
  void *pMemCtx;
  void *(*xMalloc)(lsm_env*, int);            /* malloc(3) function */
  void *(*xRealloc)(lsm_env*, void *, int);   /* realloc(3) function */
  void (*xFree)(lsm_env*, void *);            /* free(3) function */
  sqlite4_size_t (*xSize)(lsm_env*, void *);  /* xSize function */
  /****** mutexes ****************************************************/
  void *pMutexCtx;
  int (*xMutexStatic)(lsm_env*,int,lsm_mutex**); /* Obtain a static mutex */
  int (*xMutexNew)(lsm_env*, lsm_mutex**);       /* Get a new dynamic mutex */
  void (*xMutexDel)(lsm_mutex *);           /* Delete an allocated mutex */
  void (*xMutexEnter)(lsm_mutex *);         /* Grab a mutex */
  int (*xMutexTry)(lsm_mutex *);            /* Attempt to obtain a mutex */
  void (*xMutexLeave)(lsm_mutex *);         /* Leave a mutex */
  int (*xMutexHeld)(lsm_mutex *);           /* Return true if mutex is held */
  int (*xMutexNotHeld)(lsm_mutex *);        /* Return true if mutex not held */
  /****** other ****************************************************/
  int (*xSleep)(lsm_env*, int microseconds);
  /* New fields may be added in future releases, in which case the
  ** iVersion value will increase. */
};
#define LSM_MUTEX_GLOBAL 1
#define LSM_MUTEX_HEAP   2
</verbatim>
<p>Run-time environment used by LSM
Values that may be passed as the second argument to xMutexStatic. 
<h2 id=lsm>LSM Error Codes<a id=LSM_OK></a><a id=LSM_ERROR></a><a id=LSM_BUSY></a><a id=LSM_NOMEM></a><a id=LSM_IOERR></a><a id=LSM_CORRUPT></a><a id=LSM_FULL></a><a id=LSM_CANTOPEN></a><a id=LSM_PROTOCOL></a><a id=LSM_MISUSE></a></h2>
<verbatim>#define LSM_OK         0
#define LSM_ERROR      1
#define LSM_BUSY       5
#define LSM_NOMEM      7
#define LSM_IOERR     10
#define LSM_CORRUPT   11
Changes to www/lsmusr.wiki.
1
2
3
4
5
6
7

8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34


















35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57



58
59
60
61
62
63
64
65


66
67
68
69
70
71
72
73
74

75

76
77

78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102

103
104
105
106
107
108
109
110
111
112
113

114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145







146






























147
148
149
150

151
152
153

154
155
156
157






158
159
160
















161
162
163
164
165
166
167

<title>LSM Users Guide</title>
<nowiki>

<h2>Table of Contents</h2>




<div id=start_of_toc></div>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#introduction_to_lsm style=text-decoration:none>1. Introduction to LSM</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#using_lsm_in_applications style=text-decoration:none>2. Using LSM in Applications </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#basic_usage style=text-decoration:none>3. Basic Usage</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#opening_and_closing_database_connections style=text-decoration:none>3.1. Opening and Closing Database Connections </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#writing_to_a_database style=text-decoration:none>3.2. Writing to a Database </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#reading_from_a_database style=text-decoration:none>3.3. Reading from a Database </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#database_transactions_and_mvcc style=text-decoration:none>3.4. Database Transactions and MVCC </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#database_robustness style=text-decoration:none>4. Database Robustness </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#compressed_and_encrypted_databases style=text-decoration:none>5. Compressed and Encrypted Databases </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#performance_tuning style=text-decoration:none>6. Performance Tuning</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#architectural_overview style=text-decoration:none>6.1. Architectural Overview </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#work_and_checkpoint_scheduling style=text-decoration:none>6.2. Work and Checkpoint Scheduling </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#automatic_work_and_checkpoint_scheduling style=text-decoration:none>6.2.1. Automatic Work and Checkpoint Scheduling</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#explicit_work_and_checkpoint_scheduling style=text-decoration:none>6.2.2. Explicit Work and Checkpoint Scheduling</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#compulsary_work_and_checkpoint_scheduling style=text-decoration:none>6.2.3. Compulsary Work and Checkpoint Scheduling</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#database_optimization style=text-decoration:none>6.3. Database Optimization</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#other_parameters style=text-decoration:none>6.4. Other Parameters </a><br>

<div id=end_of_toc></div>

<h2>Overview</h2>

<p>This page describes the LSM embedded database library and use thereof. 
It is intended to be part user-manual and part tutorial. It is intended to
to complement the <a href=lsmapi.wiki>LSM API reference manual</a>.



















<h1 id=introduction_to_lsm>1. Introduction to LSM</h1>

<p>LSM is an embedded database library for key-value data, roughly similar
in scope to
<a href="http://www.oracle.com/technetwork/products/berkeleydb/overview/index.html">Berkeley DB</a>, 
<a href="http://code.google.com/p/leveldb/">LevelDB</a> or
<a href="http://fallabs.com/kyotocabinet/">KyotoCabinet</a>.
Both keys and
values are specified and stored as byte arrays. Duplicate keys are not 
supported. Keys are always sorted in memcmp() order. It is not possible to
configure LSM to use a custom sort order. LSM supports the following 
operations for the manipulation and query of database data:

<ul>
  <li> Writing a specified key and value into the database.
  <li> Deleting a specified key from the database.
  <li> Deleting a range of keys from the database.
  <li> Querying the database for a specific key.
  <li> Iterating through a range of database keys (either forwards or
       backwards).
</ul>




<p>LSM supports a single-writer/multiple-reader MVCC based transactional 
concurrency model. SQL style nested sub-transactions are supported. 
Clients may concurrently access a single LSM database from within a single or
multiple application processes. 

<p>Usually, an entire LSM database is stored in a single file on disk. 
However, when a database client writes to the database, a log file is
created in the same directory as the database file. Normally, the log file


is deleted when the last database client closes the database. However,
if a crash occurs or database clients exit unexpectedly for some reason,
the log file is used by subsequent clients to recover the database. So
it is perhaps more accurate to say that an LSM database is stored on disk
in a single database file and a single (optional) log file.

<p>If required, it is possible to configure LSM to use external data
compression and/or encryption functions to transform data before it is
stored in the database file.



<p>Many database systems that support range queries, including <a
href=http://www.sqlite.org>SQLite 3</a>, Berkeley DB and Kyoto Cabinet, are

based on a <a href="http://en.wikipedia.org/wiki/B-tree">b-tree data
structure</a> or variant thereof. A b-tree structure minimizes the number of
disk sectors that must be read from disk when searching the database for a
specific key. However, b-tree implementations usually suffer from poor write
localization - updating the contents of a b-tree often involves modifying the
contents of nodes scattered throughout the database file. If the database is
stored on a spinning disk (HDD), then the disk heads must be moved before
writing non-contiguous sector, which is extremely slow. If the database is
stored on solid state storage (SDD) a similar phenomena is encountered due 
to the large erase-block sizes. In general, writing to a series of contiguous
disk sectors is orders of magnitude faster than updating to the same number
of disk sectors scattered randomly throughout a large file. Additionally,
b-tree structures are prone to fragmentation, reducing the speed of range
queries.

<p><i>Todo: Should have references for the claims above.</i>

<p><i>Also, fix the link in the next paragraph to point to something more
specific.</i>

<p>LSM uses a <a href=lsm.wiki>different data structure</a> that makes the
following performance tradeoffs relative to a b-tree:

<ul>
  <li> A very large percentage of the disk sectors modified are contiguous.

       Additionally, in many cases the total number of sectors written
       to disk is reduced. This makes writing to an LSM database much
       faster than the equivalent b-tree.

  <li> LSM databases do not suffer from fragmentation to the same degree
       as b-trees. This means that the performance of large range queries 
       does not degrade as the database is updated as it may with a b-tree.

  <li> It is accepted that under some circumstances searching an LSM 
       database for a given key will involve examining more disk sectors
       than it would with a b-tree. In this sense both b-trees and LSM

       provide O(log(N)) efficiency, but the base of the logarithm is 
       generally larger for a b-tree than for LSM.
</ul>

<p>In other words, all things considered equal, writing to an LSM database
should be very fast and scanning through large ranges of keys should also
perform well, but searching the database for specific keys may be slightly
slower than when using a b-tree based system. Additionally, avoiding random
writes in favour of largely contiguous updates can significantly reduce the
wear on SSD or flash memory devices.

<p>Although it has quite different features to LSM in other respects, 
LevelDB makes similar performance tradeoffs.

<p>Benchmark test results for LSM are available here. <i>Todo: Link to a page
with performance graphs here</i>


<h1 id=using_lsm_in_applications>2. Using LSM in Applications </h1>

<p>LSM is not currently built or distributed independently. Instead, it
is part of the SQLite4 library. To use LSM in an application, the application
links against libsqlite4 and includes the header file "lsm.h" in any files
that access the LSM API.

<p><i>Pointer to build instructions for sqlite4</i>

<h1 id=basic_usage>3. Basic Usage</h1>

<h2 id=opening_and_closing_database_connections>3.1. Opening and Closing Database Connections </h2>

<p>Opening a connection to a database is a two-step process.






































<verbatim>
  int rc;
  lsm_db *db;


  rc = lsm_new(0, &db);
  if( rc!=LSM_OK ) exit(1);


  rc = lsm_open(db, "test.db");
  if( rc!=LSM_OK ) exit(1);
</verbatim>







<verbatim>
  rc = lsm_close(db);
</verbatim>

















<h2 id=writing_to_a_database>3.2. Writing to a Database </h2>

<p>Three API functions are used to write to the database:

<ul>
  <li> <b>lsm_insert()</b>: insert a new key/value pair into the database,







>









|














|


>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>










|
<



|
|






>
>
>
|
|
|
|

|
|
<
>
>
|
|
<
<
<

|
|
|
>
|
>


>
|
|
|
|




|
|
|
|
|
<



|
|





|
>










|
>
|
|


|
|
|
|
|
|




|
|















|
>
>
>
>
>
>
>

>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>




>



>




>
>
>
>
>
>



>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64

65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85

86
87
88
89



90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112

113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251

<title>LSM Users Guide</title>
<nowiki>

<h2>Table of Contents</h2>




<div id=start_of_toc></div>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#introduction_to_lsm style=text-decoration:none>1. Introduction to LSM</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#using_lsm_in_applications style=text-decoration:none>2. Using LSM in Applications </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#basic_usage style=text-decoration:none>3. Basic Usage</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#opening_and_closing_database_connections style=text-decoration:none>3.1. Opening and Closing Database Connections </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#writing_to_a_database style=text-decoration:none>3.2. Writing to a Database </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#reading_from_a_database style=text-decoration:none>3.3. Reading from a Database </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#database_transactions_and_mvcc style=text-decoration:none>3.4. Database Transactions and MVCC </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#data_durability style=text-decoration:none>4. Data Durability </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#compressed_and_encrypted_databases style=text-decoration:none>5. Compressed and Encrypted Databases </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#performance_tuning style=text-decoration:none>6. Performance Tuning</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#architectural_overview style=text-decoration:none>6.1. Architectural Overview </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#work_and_checkpoint_scheduling style=text-decoration:none>6.2. Work and Checkpoint Scheduling </a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#automatic_work_and_checkpoint_scheduling style=text-decoration:none>6.2.1. Automatic Work and Checkpoint Scheduling</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#explicit_work_and_checkpoint_scheduling style=text-decoration:none>6.2.2. Explicit Work and Checkpoint Scheduling</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#compulsary_work_and_checkpoint_scheduling style=text-decoration:none>6.2.3. Compulsary Work and Checkpoint Scheduling</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#database_optimization style=text-decoration:none>6.3. Database Optimization</a><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#other_parameters style=text-decoration:none>6.4. Other Parameters </a><br>

<div id=end_of_toc></div>

<h2>Overview</h2>

<p>This document describes the LSM embedded database library and use thereof. 
It is intended to be part user-manual and part tutorial. It is intended to
to complement the <a href=lsmapi.wiki>LSM API reference manual</a>.

<p>The <a href=#introduction_to_lsm>first section</a> of this document contains
a description of the LSM library and its features. 
<a href=#using_lsm_in_applications>Section 2</a> describes how to use LSM from
within a C or C++ application (how to compile and link LSM, what to #include
etc.). The <a href=#basic_usage>third section</a> describes the essential APIs
that applications use to open and close database connections, and to read from
and write to databases. 

<p>The three sections described above contain all the information required to
create applications that use LSM. The remaining sections discuss more
specialized topics. <a href=#data_durability>Section 4</a> discusses the
configuration parameter that influences transaction durability (the guarantees
offered with respect to recently committed transactions if a power failure 
occurs). <a href=#compressed_and_encrypted_databases>Section 5</a> explains
the interface provided by LSM that allow external data compression and/or
encryption functions to be used to create compressed and/or encrypted
databases. <i>Todo: Clarify exactly what section 6 is and link it to here</i>.

<h1 id=introduction_to_lsm>1. Introduction to LSM</h1>

<p>LSM is an embedded database library for key-value data, roughly similar
in scope to
<a href="http://www.oracle.com/technetwork/products/berkeleydb/overview/index.html">Berkeley DB</a>, 
<a href="http://code.google.com/p/leveldb/">LevelDB</a> or
<a href="http://fallabs.com/kyotocabinet/">KyotoCabinet</a>.
Both keys and
values are specified and stored as byte arrays. Duplicate keys are not 
supported. Keys are always sorted in memcmp() order. LSM supports the following

operations for the manipulation and query of database data:

<ul>
  <li> Writing a new key and value into the database.
  <li> Deleting an existing key from the database.
  <li> Deleting a range of keys from the database.
  <li> Querying the database for a specific key.
  <li> Iterating through a range of database keys (either forwards or
       backwards).
</ul>

<p>Other salient features are:

<ul>
  <li><p>LSM supports a <b>single-writer/multiple-reader MVCC</b> based
      transactional concurrency model. SQL style nested sub-transactions are
      supported. Clients may concurrently access a single LSM database from
      within a single or multiple application processes. 

  <li><p>An entire LSM database is stored in a <b>single file on disk</b>. 


  <li><p>Data <b>durability in the face of application or power failure</b>.
      LSM may optionally use a write-ahead log file when writing to the
      database to ensure committed transactions are not lost if an application
      or power failure occurs.




  <li>LSM <b>may be configured to use external data compression and/or
      encryption routines</b> to create and access compressed and/or encrypted
      databases.
</ul>
    

<p>Many database systems that support range queries, including <a
href=http://www.sqlite.org>SQLite 3</a>, Berkeley DB and Kyoto Cabinet, are
based on one of many variants of the 
<a href="http://en.wikipedia.org/wiki/B-tree">b-tree data structure</a>.
B-trees are attractive because a b-tree structure minimizes the number of disk
sectors that must be read from disk when searching the database for a specific
key. However, b-tree implementations usually suffer from poor write
localization - updating the contents of a b-tree often involves modifying the
contents of nodes scattered throughout the database file. If the database is
stored on a spinning disk (HDD), then the disk heads must be moved before
writing non-contiguous sector, which is extremely slow. If the database is
stored on solid state storage (SDD) a similar phenomena is encountered due to
the large erase-block sizes. In general, writing to a series of contiguous disk
sectors is orders of magnitude faster than updating to the same number of disk
sectors scattered randomly throughout a large file. Additionally, b-tree
structures are prone to fragmentation, reducing the speed of range queries.


<p><i>Todo: Should have references for the claims above.</i>

<p><i>Also, fix the link in the next paragraph to point to a description
of the log-structured-merge tree within lsm.wiki (or its successor).</i>

<p>LSM uses a <a href=lsm.wiki>different data structure</a> that makes the
following performance tradeoffs relative to a b-tree:

<ul>
  <li> A very large percentage of the disk sectors modified when writing to
       the database are contiguous.
       Additionally, in many cases the total number of sectors written
       to disk is reduced. This makes writing to an LSM database much
       faster than the equivalent b-tree.

  <li> LSM databases do not suffer from fragmentation to the same degree
       as b-trees. This means that the performance of large range queries 
       does not degrade as the database is updated as it may with a b-tree.

  <li> It is accepted that under some circumstances searching an LSM 
       database for a given key will involve examining more disk sectors
       than it would with a b-tree. In terms of disk sectors accessed when
       searching a database of size N, both b-trees and LSM provide O(log(N))
       efficiency, but the base of the logarithm is generally larger for a
       b-tree than for LSM.
</ul>

<p>In other words, writing to an LSM database should be very fast and scanning
through large ranges of keys should also perform well, but searching the
database for specific keys may be slightly slower than when using a b-tree
based system. Additionally, avoiding random writes in favour of largely
contiguous updates (as LSM does) can significantly reduce the wear on SSD or
flash memory devices.

<p>Although it has quite different features to LSM in other respects, 
LevelDB makes similar performance tradeoffs.

<p>Benchmark test results for LSM are <a href=#>available here</a>. <i>Todo:
Fix this link to point to a page with performance graphs.</i>


<h1 id=using_lsm_in_applications>2. Using LSM in Applications </h1>

<p>LSM is not currently built or distributed independently. Instead, it
is part of the SQLite4 library. To use LSM in an application, the application
links against libsqlite4 and includes the header file "lsm.h" in any files
that access the LSM API.

<p><i>Pointer to build instructions for sqlite4</i>

<h1 id=basic_usage>3. Basic Usage</h1>

<h2 id=opening_and_closing_database_connections>3.1. Opening and Closing Database Connections </h2>

<p>Opening a connection to a database is a two-step process. The 
<a href=lsmapi.wiki#lsm_new>lsm_new()</a> function is used to create a new
database handle, and the <a href=lsmapi.wiki#lsm_open>lsm_open()</a> function 
is used to connect an existing database handle to a database on disk. This
is because some database connection properties may only be configured
before the database is opened. In that case, one or more calls to the 
<a href=lsmapi.wiki#lsm_config>lsm_config()</a> method are made between the
calls to lsm_new() and lsm_open().

<p>The functions are defined as follows:

<verbatim>
  int lsm_new(lsm_env *env, lsm_db **pDb);
  int lsm_open(lsm_db *db, const char *zFile);
</verbatim>

<p>Like most lsm_xxx() functions that return type int (the exception is
<a href=lsmapi.wiki#lsm_csr_valid>lsm_csr_valid()</a>), both of the above 
return LSM_OK (0) if successful, or an <a href=lsmapi.wiki#LSM_ERROR>LSM 
error code</a> otherwise. The first argument to lsm_new() may be passed either
a pointer to a <a href=lsmapi.wiki#lsm_env>database environment object</a>
or NULL. Almost all applications should pass NULL. A database environment
object allows the application to supply custom implementations of the various
operating system calls that LSM uses to read and write files, allocate heap 
memory, and coordinate between multiple application threads and processes.
This is normally only required if LSM is being used on a platform that is not
supported by default. Passing NULL instructs the library to use the default
implementations of all these things, which is usually the right thing to do.
The second argument to lsm_new() is an output variable. Assuming the call
is successful, *pDb is set to point to the new database handle before 
returning.

<p>The first argument passed to lsm_open() must be an existing database 
handle. The second is the name of the database file to connect to. Once
lsm_open() has been successfully called on a database handle, it can not be
called again on the same handle. Attempting to do so is an LSM_MISUSE error.

<p>For example, to create a new handle and connect it to database "test.db"
on disk:
<verbatim>
  int rc;
  lsm_db *db;

  /* Allocate a new database handle */
  rc = lsm_new(0, &db);
  if( rc!=LSM_OK ) exit(1);

  /* Connect the database handle to database "test.db" */
  rc = lsm_open(db, "test.db");
  if( rc!=LSM_OK ) exit(1);
</verbatim>

<p>A database connection can be closed using the lsm_close() function. Calling
lsm_close() disconnects from the database (assuming lsm_open() has been
successfully called) and deletes the handle itself. Attempting to use a
database handle after it has been passed to lsm_close() results in undefined
behaviour (likely a segfault).

<verbatim>
  rc = lsm_close(db);
</verbatim>

<p>It is important that lsm_close() is called to close all database handles
created with lsm_new(), particularly if the connection has written to the
database. If an application that has written to a database exits without
closing its database connection, then subsequent clients may have to run
"database recovery" when they open the database, making the lsm_open() call
less responsive. Additionally, not matching each successful lsm_new() call 
with a call to lsm_close() is a resource leak.

<p>Counter-intuitively, an lsm_close() call may fail. In this case the database
handle is not closed, so if the application exits it invites the "database
recovery" performance problem mentioned above. The usual reason for an
lsm_close() call failing is that the database handle has been used to create
<a href=lsmapi.wiki#lsm_csr_open>database cursors</a> that have not been 
closed. Unless all database cursors are closed before lsm_close() is called,
it fails with an LSM_BUSY error and the database handle is not closed.

<h2 id=writing_to_a_database>3.2. Writing to a Database </h2>

<p>Three API functions are used to write to the database:

<ul>
  <li> <b>lsm_insert()</b>: insert a new key/value pair into the database,
207
208
209
210
211
212
213




214
215


216





217
218




































219
220
221
222
223
224
225
226
227
228
229
230
231
232
233










234
235
236
237
238
239
240
241
242
243
244
245
246
247




248

249
250



















251



























252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270















271
272
273










274
275
276
277
278
279
280
  lsm_delete(db, "c", 1);
  lsm_delete_range(db, "c", 1, "f", 1);
  lsm_delete(db, "f", 1);
</verbatim>

<h2 id=reading_from_a_database>3.3. Reading from a Database </h2>





<verbatim>
  lsm_csr *csr;








  rc = lsm_csr_open(db, &csr);
</verbatim>





































<verbatim>
  rc = lsm_csr_seek(csr, "b", 1, LSM_SEEK_EQ);
  if( lsm_csr_valid(csr) ){
    const void *pVal; int nVal;

    rc = lsm_csr_value(csr, &pVal, &nVal);
    if( rc==LSM_OK ){
      /* pVal now points to a buffer nVal bytes in size containing the
      ** value associated with database key "b".  */
    }
  }
</verbatim>

<p> Iterate forwards through all keys in the database:










<verbatim>
  for(rc = lsm_csr_first(csr); lsm_csr_valid(csr); rc = lsm_csr_next(csr)){
    const void *pKey; int nKey;
    const void *pVal; int nVal;

    rc = lsm_csr_key(csr, &pKey, &nKey);
    if( rc==LSM_OK ) rc = lsm_csr_value(csr, &pVal, &nVal);
    if( rc!=LSM_OK ) break;

    /* At this point pKey points to the current key (size nKey bytes) and
    ** pVal points to the corresponding value (size nVal bytes).  */
  }
</verbatim>





<p> Iterate backwards through all keys from "ggg" to "cc", inclusive:


<verbatim>



















  rc = lsm_csr_seek(csr, "ggg", 3, LSM_SEEK_LE); 



























  for( ; lsm_csr_valid(csr); rc = lsm_csr_prev(csr)){
    const void *pKey; int nKey;
    const void *pVal; int nVal;
    int res;

    /* Compare the key that the cursor currently points to with "cc". If
    ** the cursor key is less than "cc", break out of the loop. */
    rc = lsm_csr_cmp(csr, "cc", 2, &res);
    if( rc!=LSM_OK || res<0 ) break;

    rc = lsm_csr_key(csr, &pKey, &nKey);
    if( rc==LSM_OK ) rc = lsm_csr_value(csr, &pVal, &nVal);
    if( rc!=LSM_OK ) break;

    /* At this point pKey points to the current key (size nKey bytes) and
    ** pVal points to the corresponding value (size nVal bytes).  */
  }
</verbatim>
















<verbatim>
  lsm_csr_close(csr);
</verbatim>











<h2 id=database_transactions_and_mvcc>3.4. Database Transactions and MVCC </h2>

<p>LSM supports a single-writer/multiple-reader 
<a href=http://en.wikipedia.org/wiki/Multiversion_concurrency_control>MVCC</a>
based transactional concurrency model. This is the same model that SQLite
supports in <a href="http://www.sqlite.org/wal.html">WAL mode</a>.







>
>
>
>


>
>

>
>
>
>
>
|

>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>














|
>
>
>
>
>
>
>
>
>
>














>
>
>
>
|
>


>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
|
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
|


















>
>
>
>
>
>
>
>
>
>
>
>
>
>
>

|

>
>
>
>
>
>
>
>
>
>







291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
  lsm_delete(db, "c", 1);
  lsm_delete_range(db, "c", 1, "f", 1);
  lsm_delete(db, "f", 1);
</verbatim>

<h2 id=reading_from_a_database>3.3. Reading from a Database </h2>

<p>All data read from an LSM database is read via a cursor handle. Cursor
handles are opened using the 
<a href=lsmapi.wiki#lsm_csr_open>lsm_csr_open()</a> API, as follows:

<verbatim>
  lsm_csr *csr;
  rc = lsm_csr_open(db, &csr);
</verbatim>

<p>Once an application has finished using a database cursor, it must be closed
using the <a href=lsmapi.wiki#lsm_csr_close>lsm_csr_close()</a> API. The
lsm_csr_close() function does not return any value. It cannot fail.

<verbatim>
  lsm_csr_close(csr);
</verbatim>

<p>Database cursors support the following functions for positioning the cursor:

<ul>
  <li> <b>lsm_csr_seek()</b> - move the cursor to point to a nominated 
       database key.
  <li> <b>lsm_csr_first()</b> - move the cursor to point to the first entry in
       the database (the one with the smallest key).
  <li> <b>lsm_csr_last()</b> - move the cursor to point to the last entry in
       the database (the one with the largest key).
  <li> <b>lsm_csr_next()</b> - move the cursor to point to the next entry in 
       the the database.
  <li> <b>lsm_csr_prev()</b> - move the cursor to point to the previous entry
       in the database.
</ul>

<p>Once a cursor has been positioned, it supports the following functions
for retrieving the details of the current entry:

<ul>
  <li> <b>lsm_csr_valid()</b> - determine whether or not the cursor currently
       points to a valid entry.
  <li> <b>lsm_csr_key()</b> - retrieve the key associated with the
       database entry the cursor points to.
  <li> <b>lsm_csr_value()</b> - retrieve the value associated with the
       database entry the cursor points to.
  <li> <b>lsm_csr_cmp()</b> - compare a key supplied by the application with
       the key associated with the entry the cursor points to. 
</ul>

<p>The following example demonstrates using the 
<a href=lsmapi.wiki#lsm_csr_seek>lsm_csr_seek()</a> function to search the
database for a specified key, 
<a href=lsmapi.wiki#lsm_csr_valid>lsm_csr_valid()</a> to check if the search
was successful, and <a href=lsmapi.wiki#lsm_csr_value>lsm_csr_value()</a> 
to retrieve the value associated with the key within the database.

<verbatim>
  rc = lsm_csr_seek(csr, "b", 1, LSM_SEEK_EQ);
  if( lsm_csr_valid(csr) ){
    const void *pVal; int nVal;

    rc = lsm_csr_value(csr, &pVal, &nVal);
    if( rc==LSM_OK ){
      /* pVal now points to a buffer nVal bytes in size containing the
      ** value associated with database key "b".  */
    }
  }
</verbatim>

<p> The example code below iterates forwards through all entries (in key 
order, from smallest to largest) in the database. Function 
<a href=lsmapi.wiki#lsm_csr_first>lsm_csr_first()</a> is used to position
the cursor to point to the first entry in the database, and 
<a href=lsmapi.wiki#lsm_csr_next>lsm_csr_next()</a> is used to advance to
the next entry. After lsm_csr_next() is called to advance past the final
entry in the database, the cursor is left pointing to no entry at all,
lsm_csr_valid() returns 0, and the loop is finished. API function 
<a href=lsmapi.wiki#lsm_csr_key>lsm_csr_key()</a> is used to retrieve the
key associated with each database entry visited.

<verbatim>
  for(rc = lsm_csr_first(csr); lsm_csr_valid(csr); rc = lsm_csr_next(csr)){
    const void *pKey; int nKey;
    const void *pVal; int nVal;

    rc = lsm_csr_key(csr, &pKey, &nKey);
    if( rc==LSM_OK ) rc = lsm_csr_value(csr, &pVal, &nVal);
    if( rc!=LSM_OK ) break;

    /* At this point pKey points to the current key (size nKey bytes) and
    ** pVal points to the corresponding value (size nVal bytes).  */
  }
</verbatim>

<p> The example code above could be modified to iterate backwards through
the entries in the database (again in key order, but this time from largest
to smallest) by replacing the call to lsm_csr_first() with 
<a href=lsmapi.wiki#lsm_csr_last>lsm_csr_last()</a> and the call to lsm_csr_next() with <a href=lsmapi.wiki#lsm_csr_prev>lsm_csr_prev()</a>.

<p>The signature of lsm_csr_seek() is:

<verbatim>
  int lsm_csr_seek(lsm_cursor *csr, const void *pKey, int nKey, int eSeek);
</verbatim>

<p>The second and third arguments passed to lsm_csr_seek() define the key
to search the database for (pKey must point to the buffer containing the
nKey byte key when this function is called). Assuming no error occurs, if 
there an entry with the requested key is present in the database, the cursor 
is left pointing to it. Otherwise, if no such entry is present, the final
position of the cursor depends on the value passed as the fourth parameter
to lsm_csr_seek(). Valid values for the fourth parameter to lsm_csr_seek()
are:

<dl>
  <dt> LSM_SEEK_EQ
  <dd> <p style=margin-top:0>
       In this case, if the specified key is not present in the database, the
       cursor is not left pointing to any database entry (i.e. calling
       lsm_csr_valid() returns 0).

  <dt> LSM_SEEK_LE
  <dd> <p style=margin-top:0>
       If the specified key is not present in the database and the fourth
       argument to lsm_csr_seek() is LSM_SEEK_LE (Less than or Equal), the
       cursor is left pointing to the database entry with the largest key
       that is less than the specified key. Or, if there are no entries in
       the database with keys smaller than the specified key, the cursor is
       left pointing to no entry at all.

  <dt> LSM_SEEK_GE
  <dd> <p style=margin-top:0>
       If the specified key is not present in the database and the fourth
       argument to lsm_csr_seek() is LSM_SEEK_GE (Greater than or Equal), the
       cursor is left pointing to the database entry with the smallest key
       that is greater than the specified key. Or, if there are no entries 
       in the database with keys larger than the specified key, the cursor is
       left pointing to no entry at all.
  <dd> <p style=margin-top:0>
</dl>

<p> Calls made to lsm_csr_seek() with LSM_SEEK_EQ as the final argument are
slightly more efficient than those made specifying LSM_SEEK_LE or LSM_SEEK_GE.
So to retrieve a specific entry from a database, LSM_SEEK_EQ should be
preferred. The other two values are primarily useful for implementing
range queries. For example, to iterate backwards through all keys from "ggg" 
to "cc", inclusive:

<verbatim>
  for(rc = lsm_csr_seek(csr, "ggg", 3, LSM_SEEK_LE); lsm_csr_valid(csr); rc = lsm_csr_prev(csr)){
    const void *pKey; int nKey;
    const void *pVal; int nVal;
    int res;

    /* Compare the key that the cursor currently points to with "cc". If
    ** the cursor key is less than "cc", break out of the loop. */
    rc = lsm_csr_cmp(csr, "cc", 2, &res);
    if( rc!=LSM_OK || res<0 ) break;

    rc = lsm_csr_key(csr, &pKey, &nKey);
    if( rc==LSM_OK ) rc = lsm_csr_value(csr, &pVal, &nVal);
    if( rc!=LSM_OK ) break;

    /* At this point pKey points to the current key (size nKey bytes) and
    ** pVal points to the corresponding value (size nVal bytes).  */
  }
</verbatim>

<p>In the example code above, the call to lsm_csr_seek() positions the cursor
to point to the entry with key "ggg", if it exists, or to the largest entry
in the database with a key smaller than "ggg" if such a key can be found,
or to EOF otherwise. The lsm_csr_prev() call advances the cursor to the 
next entry in the database file (in key order from largest to smallest), and
the lsm_csr_valid() call returns 0 to break out of the loop once the
cursor is advance past the entry with the smallest key in the database. So
on its own, the "for" statement serves to iterate the cursor in reverse order 
through all keys in the database less than or equal to "ggg".

<p>The call to <a href=lsmapi.wiki#lsm_csr_cmp>lsm_csr_cmp()</a> call in the
body of the loop is used to enforce the lower bound (keys >= "cc") on the 
range query by breaking out of the loop if an entry with a key smaller than 
"cc" is ever visited. lsm_csr_cmp() has the following signature:

<verbatim>
  int lsm_csr_cmp(lsm_cursor *csr, const void *pKey, int nKey, int *piRes);
</verbatim>

<p> When lsm_csr_cmp() is called, the key specified by the second and third
arguments (pKey and nKey) is compared to the database key that the cursor
currently points to. Assuming no error occurs, depending on whether or not the
cursors key is less than, equal to, or greater than the specified key, *piRes
is set to a value less than, equal to, or greater than zero before returning.
In other words:

<pre><i>  *piRes = (cursors key) - (specified key) </i></pre>


<h2 id=database_transactions_and_mvcc>3.4. Database Transactions and MVCC </h2>

<p>LSM supports a single-writer/multiple-reader 
<a href=http://en.wikipedia.org/wiki/Multiversion_concurrency_control>MVCC</a>
based transactional concurrency model. This is the same model that SQLite
supports in <a href="http://www.sqlite.org/wal.html">WAL mode</a>.
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
  lsm_delete(db, "k", 1);
  lsm_rollback(db, 2);
  lsm_delete(db, "m", 1);
  lsm_commit(db, 0);
  
</verbatim>

<h1 id=database_robustness>4. Database Robustness </h1>

<p>The value of the configuration parameter LSM_CONFIG_SAFETY determines
how often data is synced to disk by the LSM library. This is an important
tradeoff - syncing less often can lead to orders of magnitude better
performance, but also exposes the application to the risk of partial or total
data loss in the event of a power failure;








|







640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
  lsm_delete(db, "k", 1);
  lsm_rollback(db, 2);
  lsm_delete(db, "m", 1);
  lsm_commit(db, 0);
  
</verbatim>

<h1 id=data_durability>4. Data Durability </h1>

<p>The value of the configuration parameter LSM_CONFIG_SAFETY determines
how often data is synced to disk by the LSM library. This is an important
tradeoff - syncing less often can lead to orders of magnitude better
performance, but also exposes the application to the risk of partial or total
data loss in the event of a power failure;

616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
two or more existing segments, the in-memory snapshot is updated immediately. 
This is the snapshot that database clients use when querying or otherwise
operating on the database.

<p> At any point after the in-memory snapshot has been updated, the in-memory
snapshot may be written into the database file header. This is known as
"checkpointing" the database. Depending on the value of the 
<a href=#robustness>LSM_CONFIG_SAFETY</a> parameter, it may be necessary to
ensure that all segments referenced by the snapshot have been synced to disk
(safely stored on the persistent media such that they will not be lost if a
power failure occurs) before doing so. It is not necessary for every version
of the in-memory snapshot to be checkpointed. The in-memory snapshot may be
modified multiple times between checkpoints.

<p>
Because a checkpointer process is often required to sync the database file
before updating the database header, "checkpointing" often appears to be the
costliest part of transfering data to the database file, at least in terms of
wall-clock time.








|
|
|
|
|
|







833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
two or more existing segments, the in-memory snapshot is updated immediately. 
This is the snapshot that database clients use when querying or otherwise
operating on the database.

<p> At any point after the in-memory snapshot has been updated, the in-memory
snapshot may be written into the database file header. This is known as
"checkpointing" the database. Depending on the value of the 
<a href=lsmapi.wiki#LSM_CONFIG_SAFETY>LSM_CONFIG_SAFETY</a> parameter, it may
be necessary to ensure that all segments referenced by the snapshot have been
synced to disk (safely stored on the persistent media such that they will not
be lost if a power failure occurs) before doing so. It is not necessary for
every version of the in-memory snapshot to be checkpointed. The in-memory
snapshot may be modified multiple times between checkpoints.

<p>
Because a checkpointer process is often required to sync the database file
before updating the database header, "checkpointing" often appears to be the
costliest part of transfering data to the database file, at least in terms of
wall-clock time.

1008
1009
1010
1011
1012
1013
1014


1015
1016
1017
<ul>
  <li> LSM_CONFIG_MMAP
  <li> LSM_CONFIG_MULTIPLE_PROCESSES
  <li> LSM_CONFIG_USE_LOG
</ul>

</i>












>
>



1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
<ul>
  <li> LSM_CONFIG_MMAP
  <li> LSM_CONFIG_MULTIPLE_PROCESSES
  <li> LSM_CONFIG_USE_LOG
</ul>

</i>