SQLite4
Check-in [1ea9187820]
Not logged in

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Updates to lsmusr.wiki.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 1ea91878201f0998b35a6843da7c9e4c51bceb3d
User & Date: dan 2012-11-14 20:09:24
Context
2012-11-15
14:19
Add words to lsmusr.wiki. check-in: 2077c9d152 user: dan tags: trunk
2012-11-14
20:09
Updates to lsmusr.wiki. check-in: 1ea9187820 user: dan tags: trunk
18:23
Improvements to lsmusr.wiki. check-in: e47b5e3ae6 user: dan tags: trunk
Changes
Hide Diffs Side-by-Side Diffs Ignore Whitespace Patch

Changes to www/lsmusr.wiki.

     2      2   <title>LSM Users Guide</title>
     3      3   <nowiki>
     4      4   
     5      5   <h2>Table of Contents</h2>
     6      6   
     7      7   
     8      8   
            9  +
     9     10   
    10     11   <div id=start_of_toc></div>
    11     12   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#introduction_to_lsm style=text-decoration:none>1. Introduction to LSM</a><br>
    12     13   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#using_lsm_in_applications style=text-decoration:none>2. Using LSM in Applications </a><br>
    13     14   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#basic_usage style=text-decoration:none>3. Basic Usage</a><br>
    14     15   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#opening_and_closing_database_connections style=text-decoration:none>3.1. Opening and Closing Database Connections </a><br>
    15     16   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#writing_to_a_database style=text-decoration:none>3.2. Writing to a Database </a><br>
    16     17   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#reading_from_a_database style=text-decoration:none>3.3. Reading from a Database </a><br>
    17     18   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#database_transactions_and_mvcc style=text-decoration:none>3.4. Database Transactions and MVCC </a><br>
    18     19   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#data_durability style=text-decoration:none>4. Data Durability </a><br>
    19     20   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#compressed_and_encrypted_databases style=text-decoration:none>5. Compressed and Encrypted Databases </a><br>
    20     21   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#performance_tuning style=text-decoration:none>6. Performance Tuning</a><br>
    21         -&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#architectural_overview style=text-decoration:none>6.1. Architectural Overview </a><br>
    22         -&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#work_and_checkpoint_scheduling style=text-decoration:none>6.2. Work and Checkpoint Scheduling </a><br>
    23         -&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#automatic_work_and_checkpoint_scheduling style=text-decoration:none>6.2.1. Automatic Work and Checkpoint Scheduling</a><br>
    24         -&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#explicit_work_and_checkpoint_scheduling style=text-decoration:none>6.2.2. Explicit Work and Checkpoint Scheduling</a><br>
    25         -&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#compulsary_work_and_checkpoint_scheduling style=text-decoration:none>6.2.3. Compulsary Work and Checkpoint Scheduling</a><br>
    26         -&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#database_optimization style=text-decoration:none>6.3. Database Optimization</a><br>
    27         -&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#other_parameters style=text-decoration:none>6.4. Other Parameters </a><br>
           22  +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#performance_related_configuration_options style=text-decoration:none>6.1. Performance Related Configuration Options </a><br>
           23  +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#using_worker_threads_or_processes style=text-decoration:none>6.2. Using Worker Threads or Processes </a><br>
           24  +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#architectural_overview style=text-decoration:none>6.2.1. Architectural Overview </a><br>
           25  +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#automatic_work_and_checkpoint_scheduling style=text-decoration:none>6.2.2. Automatic Work and Checkpoint Scheduling</a><br>
           26  +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#explicit_work_and_checkpoint_scheduling style=text-decoration:none>6.2.3. Explicit Work and Checkpoint Scheduling</a><br>
           27  +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#compulsary_work_and_checkpoint_scheduling style=text-decoration:none>6.2.4. Compulsary Work and Checkpoint Scheduling</a><br>
           28  +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href=#database_file_optimization style=text-decoration:none>6.3. Database File Optimization</a><br>
    28     29   
    29     30   <div id=end_of_toc></div>
    30     31   
    31     32   <h2>Overview</h2>
    32     33   
    33     34   <p>This document describes the LSM embedded database library and use thereof. 
    34     35   It is intended to be part user-manual and part tutorial. It is intended to
    35         -to complement the <a href=lsmapi.wiki>LSM API reference manual</a>.
           36  +complement the <a href=lsmapi.wiki>LSM API reference manual</a>.
    36     37   
    37     38   <p>The <a href=#introduction_to_lsm>first section</a> of this document contains
    38     39   a description of the LSM library and its features. 
    39     40   <a href=#using_lsm_in_applications>Section 2</a> describes how to use LSM from
    40     41   within a C or C++ application (how to compile and link LSM, what to #include
    41     42   etc.). The <a href=#basic_usage>third section</a> describes the essential APIs
    42     43   that applications use to open and close database connections, and to read from
................................................................................
   766    767   <p><i>Maybe there should be a way to register a mismatch-handler callback.
   767    768   Otherwise, applications have to handle LSM_MISMATCH everywhere...
   768    769   </i>
   769    770   
   770    771   
   771    772   <h1 id=performance_tuning>6. Performance Tuning</h1>
   772    773   
   773         -<h2 id=architectural_overview>6.1. Architectural Overview </h2>
          774  +<p> This section describes the various measures that can be taken in order to
          775  +fine-tune LSM in order to improve performance in specific circumstances.
          776  +Sub-section 6.1 identifies the 
          777  +<a href=#performance_related_configuration_options> configuration
          778  +parameters</a> that can be used to influence database performance. 
          779  +Sub-section 6.2 discusses methods for shifting the time-consuming processes of
          780  +actually writing and syncing the database file to 
          781  +<a href=#using_worker_threads_or_processes>background threads or processes</a> 
          782  +in order to make writing to the database more responsive. Finally, 6.
          783  +3 introduces "<a href=#database_file_optimization>database optimization</a>"
          784  +- the process of reorganizing a database file internally so that it is as small
          785  +as possible and optimized for search queries.
          786  +
          787  +<h2 id=performance_related_configuration_options>6.1. Performance Related Configuration Options </h2>
          788  +
          789  +<p>The options in this section all take integer values. They may be both
          790  +set and queried using the <a href=lsmapi.wiki#lsm_config>lsm_config()</a>
          791  +function. To set an option to a value, lsm_config() is used as follows:
          792  +
          793  +<verbatim>
          794  +  /* Set the LSM_CONFIG_AUTOFLUSH option to 1MB */
          795  +  int iVal = 1 * 1024 * 1024;
          796  +  rc = lsm_config(db, LSM_CONFIG_AUTOFLUSH, &iVal);
          797  +</verbatim>
          798  +
          799  +<p>In order to query the current value of an option, the initial value of
          800  +the parameter (iVal in the example code above) should be set to a negative
          801  +value. Or any other value that happens to be out of range for the parameter -
          802  +negative values just happen to be out of range for all integer lsm_config()
          803  +parameters.
          804  +
          805  +<verbatim>
          806  +  /* Set iVal to the current value of LSM_CONFIG_AUTOFLUSH */
          807  +  int iVal = -1;
          808  +  rc = lsm_config(db, LSM_CONFIG_AUTOFLUSH, &iVal);
          809  +</verbatim>
          810  +
          811  +<dl>
          812  +  <dt> <a href=lsmapi.wiki#LSM_CONFIG_MMAP>LSM_CONFIG_MMAP</a>
          813  +  <dd> <p style=margin-top:0>
          814  +    This option may be set to either 1 (true) or 0 (false). If it is set to
          815  +    true and LSM is running on a system with a 64-bit address space, the
          816  +    entire database file is memory mapped. Or, if it is false or LSM is 
          817  +    running in a 32-bit address space, data is accessed using ordinary
          818  +    OS file read and write primitives. Memory mapping the database file
          819  +    can significantly improve the performance of read operations, as database 
          820  +    pages do not have to be copied from operating system buffers into user 
          821  +    space buffers before they can be examined. 
          822  +
          823  +    <p>This option can only be set before lsm_open() is called on the database
          824  +    connection.
          825  +
          826  +    <p>The default value is 1 (true).
          827  +
          828  +  <dt> <a href=lsmapi.wiki#LSM_CONFIG_MULTIPLE_PROCESSES>LSM_CONFIG_MULTIPLE_PROCESSES</a>
          829  +  <dd> <p style=margin-top:0>
          830  +    This option may also be set to either 1 (true) or 0 (false). If it is
          831  +    set to 0, then the library assumes that all database clients are located 
          832  +    within the same process (have access to the same memory space). Assuming
          833  +    this means the library can avoid using OS file locking primitives to lock
          834  +    the database file, which speeds up opening and closing read and write
          835  +    transactions. 
          836  +
          837  +    <p>This option can only be set before lsm_open() is called on the database
          838  +    connection.
          839  +
          840  +    <p>The default value is 1 (true).
          841  +
          842  +  <dt> <a href=lsmapi.wiki#LSM_CONFIG_USE_LOG>LSM_CONFIG_USE_LOG</a>
          843  +  <dd> <p style=margin-top:0>
          844  +    This is another option may also be set to either 1 (true) or 0 (false). 
          845  +    If it is set to false, then the library does not write data into the
          846  +    database log file. This makes writing faster, but also means that if
          847  +    an application crash or power failure occurs, it is very likely that
          848  +    any recently committed transactions will be lost.
          849  +
          850  +    <p>If this option is set to true, then an application crash cannot cause
          851  +    data loss. Whether or not data loss may occur in the event of a power
          852  +    failure depends on the value of the <a href=#data_durability>
          853  +    LSM_CONFIG_SAFETY</a> parameter.
          854  +
          855  +    <p>This option can only be set if the connection does not currently have
          856  +    an open write transaction.
          857  +
          858  +    <p>The default value is 1 (true).
          859  +
          860  +  <dt> <a href=lsmapi.wiki#LSM_CONFIG_AUTOFLUSH>LSM_CONFIG_AUTOFLUSH</a>
          861  +  <dd> <p style=margin-top:0>
          862  +
          863  +  <dt> <a href=lsmapi.wiki#LSM_CONFIG_AUTOCHECKPOINT>LSM_CONFIG_AUTOCHECKPOINT</a>
          864  +  <dd> <p style=margin-top:0>
          865  +
          866  +</dl>
          867  +
          868  +<h2 id=using_worker_threads_or_processes>6.2. Using Worker Threads or Processes </h2>
          869  +
          870  +<p><i>Todo: Fix the following </p>
          871  +
          872  +<p>The section above describes the three stages of transfering data written
          873  +to the database from the application to persistent storage. A "writer" 
          874  +client writes the data into the in-memory tree and log file. Later on a 
          875  +"worker" client flushes the data from the in-memory tree to a new segment
          876  +in the the database file. Additionally, a worker client must periodically
          877  +merge existing database segments together to prevent them from growing too
          878  +numerous.
          879  +
          880  +<h3 id=architectural_overview>6.2.1. Architectural Overview </h3>
   774    881   
   775    882   <p> The LSM library implements two separate data structures that are used 
   776    883   together to store user data. When the database is queried, the library 
   777    884   actually runs parallel queries on both of these data stores and merges the
   778    885   results together to return to the user. The data structures are:
   779    886   
   780    887   <ul>
................................................................................
   903   1010   database file header (to checkpoint the database). 
   904   1011   </table>
   905   1012   
   906   1013   <p>The tasks associated with each of the locks above may be performed
   907   1014   concurrently by multiple database connections, located either in the same
   908   1015   application process or different processes.
   909   1016   
   910         -<h2 id=work_and_checkpoint_scheduling>6.2. Work and Checkpoint Scheduling </h2>
   911         -
   912         -<p>The section above describes the three stages of transfering data written
   913         -to the database from the application to persistent storage. A "writer" 
   914         -client writes the data into the in-memory tree and log file. Later on a 
   915         -"worker" client flushes the data from the in-memory tree to a new segment
   916         -in the the database file. Additionally, a worker client must periodically
   917         -merge existing database segments together to prevent them from growing too
   918         -numerous.
   919         -
   920         -<h3 id=automatic_work_and_checkpoint_scheduling>6.2.1. Automatic Work and Checkpoint Scheduling</h3>
         1017  +<h3 id=automatic_work_and_checkpoint_scheduling>6.2.2. Automatic Work and Checkpoint Scheduling</h3>
   921   1018   
   922   1019   <p>By default, database "work" (the flushing and merging of segments, performed
   923   1020   by clients holding the WORKER lock) and checkpointing are scheduled and
   924   1021   performed automatically from within calls to "write" API functions. The 
   925   1022   "write" functions are:
   926   1023   
   927   1024   <ul>
................................................................................
  1014   1111   than zero, after performing database work, the library automatically checks
  1015   1112   how many bytes of raw data have been written to the database file since the
  1016   1113   last checkpoint (by any client, not just by the current client). If this
  1017   1114   value is greater than the value of the LSM_CONFIG_AUTOCHECKPOINT parameter,
  1018   1115   a checkpoint is attempted. It is not an error if the attempt fails because the
  1019   1116   CHECKPOINTER lock cannot be obtained.
  1020   1117   
  1021         -<h3 id=explicit_work_and_checkpoint_scheduling>6.2.2. Explicit Work and Checkpoint Scheduling</h3>
         1118  +<h3 id=explicit_work_and_checkpoint_scheduling>6.2.3. Explicit Work and Checkpoint Scheduling</h3>
  1022   1119   
  1023   1120   <p>The alternative to automatic scheduling of work and checkpoint operations
  1024   1121   is to explicitly schedule them. Possibly in a background thread or dedicated
  1025   1122   application process. In order to disable automatic work, a client must set
  1026   1123   the LSM_CONFIG_AUTOWORK parameter to zero. This parameter is a property of
  1027   1124   a database connection, not of a database itself, so it must be cleared
  1028   1125   separately by all processes that may write to the database. Otherwise, they
................................................................................
  1135   1232     rc = lsm_info(db, LSM_INFO_TREE_SIZE, &nOld, &nLive);
  1136   1233   </verbatim>
  1137   1234   
  1138   1235   <verbatim>
  1139   1236     int lsm_flush(lsm_db *db);
  1140   1237   </verbatim>
  1141   1238   
  1142         -<h3 id=compulsary_work_and_checkpoint_scheduling>6.2.3. Compulsary Work and Checkpoint Scheduling</h3>
         1239  +<h3 id=compulsary_work_and_checkpoint_scheduling>6.2.4. Compulsary Work and Checkpoint Scheduling</h3>
  1143   1240   
  1144   1241   <p>Apart from the scenarios described above, there are two there are two 
  1145   1242   scenarios where database work or checkpointing may be performed automatically,
  1146   1243   regardless of the value of the LSM_CONFIG_AUTOWORK parameter.
  1147   1244   
  1148   1245   <ul>
  1149   1246     <li> When closing a database connection, and 
................................................................................
  1180   1277   </ul>
  1181   1278   
  1182   1279   <p>Finally, regardless of age, a database is limited to a maximum of 64
  1183   1280   segments in total. If an attempt is made to flush an in-memory tree to disk
  1184   1281   when the database already contains 64 segments, two or more existing segments
  1185   1282   must be merged together before the new segment can be created.
  1186   1283   
  1187         -<h2 id=database_optimization>6.3. Database Optimization</h2>
         1284  +<h2 id=database_file_optimization>6.3. Database File Optimization</h2>
  1188   1285   
  1189   1286   <p>Database optimization transforms the contents of database file so that
  1190   1287   the following are true:
  1191   1288   
  1192   1289   <ul>
  1193         -  <li> All database content is stored in a single segment.
  1194         -  <li> The database file contains no (or as little as possible) free space.
         1290  +  <li> <p>All database content is stored in a single 
         1291  +       <a href=#architectural_overview>segment</a>. This makes the
         1292  +       database effectively equivalent to an optimally packed b-tree stucture
         1293  +       for search operations - minimizing the number of disk sectors that need
         1294  +       to be visted when searching the database.
         1295  +
         1296  +  <li> <p>The database file contains no (or as little as possible) free space.
  1195   1297          In other words, it is no larger than required to contain the single
  1196   1298          segment.
  1197   1299   </ul>
         1300  +
         1301  +<p><i> Should we add a convenience function lsm_optimize() that does not 
         1302  +return until the database is completely optimized? One that more or less does
         1303  +the same as the example code below and deals with the AUTOCHECKPOINT issue?
         1304  +This would help with this user manual if nothing else, as it means a method
         1305  +for database optimization can be presented without depending on the previous
         1306  +section.
         1307  +
         1308  +</i>
  1198   1309   
  1199   1310   <p>In order to optimize the database, lsm_work() should be called repeatedly
  1200   1311   with the nMerge argument set to 1 until it returns without writing any data
  1201   1312   to the database file. For example:
  1202   1313   
  1203   1314   <verbatim>
  1204   1315     int nWrite;
  1205   1316     int rc;
  1206   1317     do {
  1207   1318       rc = lsm_work(db, 1, 2*1024*1024, &nWrite);
  1208   1319     }while( rc==LSM_OK && nWrite>0 );
  1209   1320   </verbatim>
  1210   1321   
  1211         -<p>When optimizing the database as above, the LSM_CONFIG_AUTOCHECKPOINT
  1212         -parameter should be set to a non-zero value, or otherwise lsm_checkpoint()
  1213         -should be called periodically. Otherwise, no checkpoints will be performed,
  1214         -preventing the library from reusing any space occupied by old segments even
  1215         -after their content has been merged into the new segment. The result - a
  1216         -database file that is optimized, except that it is up to twice as large as
  1217         -it otherwise would be.
  1218         -
  1219         -<h2 id=other_parameters>6.4. Other Parameters </h2>
  1220         -
  1221         -<i>
  1222         -<p>Mention other configuration options that can be used to tune performance
  1223         -here.
  1224         -
  1225         -<ul>
  1226         -  <li> LSM_CONFIG_MMAP
  1227         -  <li> LSM_CONFIG_MULTIPLE_PROCESSES
  1228         -  <li> LSM_CONFIG_USE_LOG
  1229         -</ul>
  1230         -
  1231         -</i>
  1232         -
  1233         -
  1234         -
         1322  +<p>When optimizing the database as above, either the LSM_CONFIG_AUTOCHECKPOINT
         1323  +parameter should be set to a non-zero value or lsm_checkpoint() should be
         1324  +called periodically. Otherwise, no checkpoints will be performed, preventing
         1325  +the library from reusing any space occupied by old segments even after their
         1326  +content has been merged into the new segment. The result - a database file that
         1327  +is optimized, except that it is up to twice as large as it otherwise would be.
  1235   1328   
  1236   1329