
Overview
Comment: Fix small issues in lsmusr.wiki.
SHA1: 390479743520405b4a7d8dc81170d9886bc170e4
User & Date: dan 2012-11-12 20:19:35.166
Changes to www/lsmusr.wiki.
compression and/or encryption functions to transform data before it is
stored in the database file.

<p><i>Say something about the difference in performance characteristics 
between a b-tree and whatever it is LSM is. Link the performance graphs page.
</i>

<h1>2. Using the LSM Library </h1>

<p>LSM is not currently built or distributed independently. Instead, it
is part of the SQLite4 library. To use LSM, an application links against
libsqlite4 and includes the header file "lsm.h" in any files that access
the LSM API.

<p><i>Pointer to build instructions for sqlite4</i>

<h1>3. Basic Usage</h1>

<h2>3.1 Opening and Closing Database Connections </h2>

<p>Opening a connection to a database is a two-step process: lsm_new()
allocates a new database handle, and lsm_open() connects that handle to a
database file on disk.

<verbatim>
  int rc;
  lsm_db *db;

  rc = lsm_new(0, &db);
  if( rc!=LSM_OK ) exit(1);

  rc = lsm_open(db, "test.db");
  if( rc!=LSM_OK ) exit(1);
</verbatim>

<p>A connection is closed by passing the database handle to lsm_close():

<verbatim>
  rc = lsm_close(db);
</verbatim>

<h2>3.2 Writing to a Database </h2>

<p>Three API functions are used to write to the database:

<ul>
  <li> <b>lsm_insert()</b>: insert a new key/value pair into the database,
       overwriting any existing entry with the same key.
  <li> <b>lsm_delete()</b>: remove a specific key from the database.
  <li> <b>lsm_delete_range()</b>: remove an open-ended range (one that does not
       include its endpoints) of keys from the database. 
</ul>

<p>Each of these functions returns LSM_OK (0) if successful, or a non-zero
LSM error code otherwise.

<p>The following example code inserts a key/value pair into the database. The
key is a 1-byte blob consisting of the value 0x61, and the value is a 3-byte
blob consisting of 0x6F, 0x6E and 0x65, in that order. An application might
interpret these as UTF-8 or ASCII codepoints, but LSM treats them as opaque
blobs of data.
<verbatim>
  rc = lsm_insert(db, "a", 1, "one", 3);
</verbatim>

<p>Remove the entry with the key "a" (single 0x61 octet) from the database:

<verbatim>
  rc = lsm_delete(db, "a", 1);
</verbatim>

<p>Remove all entries with keys that are greater than "c" but less than "f".
In this context key blobs are compared using memcmp(), with a longer key
considered larger than any of its prefixes.

<verbatim>
  rc = lsm_delete_range(db, "c", 1, "f", 1);
</verbatim>
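<p>The ordering rule can be spelled out as a small standalone comparison
function. This is a sketch for illustration only: the name blob_cmp and its
signature are invented here, not part of the LSM API.

```c
#include <string.h>

/* Compare two key blobs as described above: memcmp() over the common
** prefix, with the longer key considered larger when one key is a
** prefix of the other. Returns a negative, zero or positive value. */
int blob_cmp(const void *p1, int n1, const void *p2, int n2){
  int nCmp = (n1 < n2) ? n1 : n2;   /* Bytes in the common prefix */
  int res = memcmp(p1, p2, nCmp);
  if( res==0 ) res = n1 - n2;       /* A prefix sorts before its extensions */
  return res;
}
```

<p>Under this ordering, the lsm_delete_range() call above removes keys such
as "cc", "d" or "elephant", but not "c" or "f" themselves, and not keys like
"fa" that sort after "f".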

<p>The example above removes all keys between "c" and "f", but does not
remove the endpoints "c" and "f" themselves. Removing the closed range
requires three separate calls: one to remove the open-ended range of keys
and two to remove the two endpoints, as follows:

<verbatim>
  /* Should be checking return codes! */
  lsm_delete(db, "c", 1);
  lsm_delete_range(db, "c", 1, "f", 1);
  lsm_delete(db, "f", 1);
</verbatim>

<h2>3.3 Reading from a Database </h2>

<p>Database content is read using a cursor. A cursor is opened by passing
the database handle to lsm_csr_open():

<verbatim>
  lsm_csr *csr;

  rc = lsm_csr_open(db, &csr);
</verbatim>

<p>When the application has finished with a cursor, it must be closed using
lsm_csr_close():

<verbatim>
  lsm_csr_close(csr);
</verbatim>

<h2 id=robustness>3.4 Database Robustness </h2>

<p>The value of the configuration parameter LSM_CONFIG_SAFETY determines
how often data is synced to disk by the LSM library. This is an important
tradeoff - syncing less often can lead to orders of magnitude better
performance, but also exposes the application to the risk of partial or
total data loss in the event of a power failure.

<verbatim>
  int iSafety = -1;
  rc = lsm_config(db, LSM_CONFIG_SAFETY, &iSafety);

  /* At this point, variable iSafety is set to the currently configured value
  ** of the LSM_CONFIG_SAFETY parameter (either 0, 1 or 2).  */
</verbatim>

<p>The lsm_config() function may also be used to configure other database
connection parameters.  

<h2>3.5 Database Transactions and MVCC </h2>

<p>LSM supports a single-writer/multiple-reader 
<a href=http://en.wikipedia.org/wiki/Multiversion_concurrency_control>MVCC</a>
based transactional concurrency model. This is the same model that SQLite
supports in <a href="http://www.sqlite.org/wal.html">WAL mode</a>.

<p>A read-transaction must be opened in order to read from the database. 
<verbatim>
  /* ... */
  lsm_delete(db, "k", 1);
  lsm_rollback(db, 2);
  lsm_delete(db, "m", 1);
  lsm_commit(db, 0);
</verbatim>

<h1>4. Compressed and Encrypted Databases </h1>

<p>LSM does not provide built-in methods for creating encrypted or compressed
databases. Instead, it allows the user to provide hooks to call external
functions to compress and/or encrypt data before it is written to the database
file, and to decrypt and/or uncompress data as it is read from the database
file.

<p>A database connection is configured to call compression functions using a
call to lsm_config() with the second argument set to
LSM_CONFIG_SET_COMPRESSION. The third argument should point to an instance
of type lsm_compress, which is defined as follows:

<verbatim>
  typedef struct lsm_compress lsm_compress;
  struct lsm_compress {
    u32 iId;
    void *pCtx;
    int (*xBound)(void *pCtx, int nIn);
    int (*xCompress)(void *pCtx, void *pOut, int *pnOut, const void *pIn, int nIn);
    int (*xUncompress)(void *pCtx, void *pOut, int *pnOut, const void *pIn, int nIn);
    void (*xFree)(void *pCtx);
  };
</verbatim>
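<p>As a concrete illustration of the shape of these hooks, here is a minimal
standalone sketch using a naive run-length encoding in place of real
compression. It is not part of LSM, the function names are invented, and a
real deployment would use something like zlib; wiring these functions into an
lsm_compress instance and registering it via lsm_config(db,
LSM_CONFIG_SET_COMPRESSION, ...) is assumed rather than shown.

```c
#include <string.h>

typedef unsigned char u8;

/* Worst case for this encoding: every input byte becomes its own
** (count, byte) pair, so the output is at most twice the input. */
int rleBound(void *pCtx, int nIn){
  (void)pCtx;
  return nIn*2;
}

/* Encode pIn/nIn as a sequence of (count, byte) pairs in pOut. On entry
** *pnOut is the size of the output buffer; on exit it is the number of
** bytes written. Returns 0 on success, non-zero on error. */
int rleCompress(void *pCtx, void *pOut, int *pnOut, const void *pIn, int nIn){
  const u8 *aIn = (const u8*)pIn;
  u8 *aOut = (u8*)pOut;
  int i = 0, nOut = 0;
  (void)pCtx;
  while( i<nIn ){
    int n = 1;
    while( i+n<nIn && aIn[i+n]==aIn[i] && n<255 ) n++;
    if( nOut+2 > *pnOut ) return 1;
    aOut[nOut++] = (u8)n;
    aOut[nOut++] = aIn[i];
    i += n;
  }
  *pnOut = nOut;
  return 0;
}

/* Decode a sequence of (count, byte) pairs back into pOut. */
int rleUncompress(void *pCtx, void *pOut, int *pnOut, const void *pIn, int nIn){
  const u8 *aIn = (const u8*)pIn;
  u8 *aOut = (u8*)pOut;
  int i, nOut = 0;
  (void)pCtx;
  for(i=0; i+1<nIn; i+=2){
    int n = aIn[i];
    if( nOut+n > *pnOut ) return 1;
    memset(&aOut[nOut], aIn[i+1], n);
    nOut += n;
  }
  *pnOut = nOut;
  return 0;
}
```

<p>The xBound() hook must report the largest possible output size for an
nIn byte input so that buffers can be sized in advance; for this encoding
that is 2*nIn.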

<p><i> Explain how the hooks work here (same as zipvfs) </i>

<p><i> Example code? Using zlib? Or something simple like an RLE
implementation?</i>

<p>The database file header of any LSM database contains a 32-bit unsigned
"compression id" field. If the database is not a compressed database, this
field is set to 1. Otherwise, it is set to an application-supplied value
identifying the compression and/or encryption scheme in use. Application
compression scheme ids must be greater than or equal to 10000. Values smaller
than 10000 are reserved for internal use.
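<p>The reserved-range rule can be captured in a one-line check (a
hypothetical helper for illustration; it is not part of the LSM API):

```c
/* Matches the "u32" type used by the lsm_compress structure above. */
typedef unsigned int u32;

/* Return non-zero if iId may be used as an application compression
** scheme id. Values below 10000 are reserved for internal use. */
int compression_id_usable(u32 iId){
  return iId >= 10000;
}
```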

<p>The lsm_compression_id() API may be used to read the compression id from
a database connection. Because the compression id is stored in the database
header, it may be read before any required compression or encryption hooks
are configured.

<verbatim>
  #define LSM_COMPRESSION_EMPTY    0
  #define LSM_COMPRESSION_NONE     1
  int lsm_compression_id(lsm_db *db, u32 *piId);
</verbatim>

<p>When a database is opened for the first time, before it is first written,
the compression id field is set to LSM_COMPRESSION_EMPTY (0). The first time
a transaction is committed, the database compression id is set to a copy of 
the lsm_compress.iId field of the compression hooks for the database handle
committing the transaction, or to LSM_COMPRESSION_NONE (1) if no compression
hooks are configured.

<p>Once the compression id is set to something other than
LSM_COMPRESSION_EMPTY, when a database handle opens a read or write
transaction on the database, the compression id is compared against the
lsm_compress.iId field of the configured compression hooks, or against
LSM_COMPRESSION_NONE if no compression hooks are configured. If the
compression id does not match, then an LSM_MISMATCH error is returned and
the operation fails (no transaction or database cursor is opened).

<p><i>Maybe there should be a way to register a mismatch-handler callback.
Otherwise, applications have to handle LSM_MISMATCH everywhere...
</i>


<h1>5. Performance Tuning</h1>

<h2>5.1 Architectural Overview </h2>

<p> The LSM library implements two separate data structures that are used 
together to store user data. When the database is queried, the library 
actually runs parallel queries on both of these data stores and merges the
results together to return to the user. The data structures are:

<ul>
database file header (to checkpoint the database).

<p>The tasks associated with each of the locks above may be performed
concurrently by multiple database connections, located either in the same
application process or different processes.

<h2>5.2 Work and Checkpoint Scheduling </h2>

<p>The section above describes the three stages of transferring data written
to the database from the application to persistent storage. A "writer"
client writes the data into the in-memory tree and log file. Later on, a
"worker" client flushes the data from the in-memory tree to a new segment
in the database file. Additionally, a worker client must periodically
merge existing database segments together to prevent them from growing too
numerous.

<h3>5.2.1 Automatic Work and Checkpoint Scheduling</h3>

<p>By default, database "work" (the flushing and merging of segments, performed
by clients holding the WORKER lock) and checkpointing are scheduled and
performed automatically from within calls to "write" API functions. The 
"write" functions are:

<ul>
<p>If the LSM_CONFIG_AUTOCHECKPOINT parameter is set to a value greater
than zero, after performing database work, the library automatically checks
how many bytes of raw data have been written to the database file since the
last checkpoint (by any client, not just by the current client). If this
value is greater than the value of the LSM_CONFIG_AUTOCHECKPOINT parameter,
a checkpoint is attempted. It is not an error if the attempt fails because the
CHECKPOINTER lock cannot be obtained.

<h3>5.2.2 Explicit Work and Checkpoint Scheduling</h3>

<p>The alternative to automatic scheduling of work and checkpoint operations
is to explicitly schedule them. Possibly in a background thread or dedicated
application process. In order to disable automatic work, a client must set
the LSM_CONFIG_AUTOWORK parameter to zero. This parameter is a property of
a database connection, not of a database itself, so it must be cleared
separately by all processes that may write to the database. Otherwise, they
may each continue to perform database work automatically.
<verbatim>
  rc = lsm_info(db, LSM_INFO_TREE_SIZE, &nOld, &nLive);
</verbatim>

<verbatim>
  int lsm_flush(lsm_db *db);
</verbatim>

<h3>5.2.3 Compulsory Work and Checkpoint Scheduling</h3>

<p>Apart from the scenarios described above, there are two scenarios in which
database work or checkpointing may be performed automatically, regardless
of the value of the LSM_CONFIG_AUTOWORK parameter.

<ul>
  <li> When closing a database connection, and 
</ul>

<p>Finally, regardless of age, a database is limited to a maximum of 64
segments in total. If an attempt is made to flush an in-memory tree to disk
when the database already contains 64 segments, two or more existing segments
must be merged together before the new segment can be created.

<h2>5.3 Database Optimization</h2>

<p>Database optimization transforms the contents of the database file so that
the following are true:

<ul>
  <li> All database content is stored in a single segment.
  <li> The database file contains no (or as little as possible) free space.
parameter should be set to a non-zero value, or otherwise lsm_checkpoint()
should be called periodically. Otherwise, no checkpoints will be performed,
preventing the library from reusing any space occupied by old segments even
after their content has been merged into the new segment. The result - a
database file that is optimized, except that it is up to twice as large as
it otherwise would be.

<h2>5.4 Other Parameters </h2>

<i>
<p>Mention other configuration options that can be used to tune performance
here.

<ul>
  <li> LSM_CONFIG_MMAP
  <li> LSM_CONFIG_MULTIPLE_PROCESSES
  <li> LSM_CONFIG_USE_LOG
</ul>

</i>