Index: www/lsmusr.wiki ================================================================== --- www/lsmusr.wiki +++ www/lsmusr.wiki @@ -4,10 +4,11 @@

Table of Contents

+
      1. Introduction to LSM
      2. Using LSM in Applications
      3. Basic Usage
@@ -16,25 +17,25 @@             3.3. Reading from a Database
            3.4. Database Transactions and MVCC
      4. Data Durability
      5. Compressed and Encrypted Databases
      6. Performance Tuning
-            6.1. Architectural Overview
-            6.2. Work and Checkpoint Scheduling
-                  6.2.1. Automatic Work and Checkpoint Scheduling
-                  6.2.2. Explicit Work and Checkpoint Scheduling
-                  6.2.3. Compulsary Work and Checkpoint Scheduling
-            6.3. Database Optimization
-            6.4. Other Parameters
+            6.1. Performance Related Configuration Options
+            6.2. Using Worker Threads or Processes
+                  6.2.1. Architectural Overview
+                  6.2.2. Automatic Work and Checkpoint Scheduling
+                  6.2.3. Explicit Work and Checkpoint Scheduling
+                  6.2.4. Compulsory Work and Checkpoint Scheduling
+            6.3. Database File Optimization

Overview

This document describes the LSM embedded database library and use thereof. It is intended to be part user-manual and part tutorial. It is intended to -to complement the LSM API reference manual. +complement the LSM API reference manual.

The first section of this document contains a description of the LSM library and its features. Section 2 describes how to use LSM from within a C or C++ application (how to compile and link LSM, what to #include @@ -768,11 +769,117 @@

6. Performance Tuning

-

6.1. Architectural Overview

+

This section describes the various measures that can be taken in order to +fine-tune LSM and improve performance in specific circumstances. +Sub-section 6.1 identifies the + configuration +parameters that can be used to influence database performance. +Sub-section 6.2 discusses methods for shifting the time-consuming processes of +actually writing and syncing the database file to +background threads or processes +in order to make writing to the database more responsive. Finally, +6.3 introduces "database optimization" +- the process of reorganizing a database file internally so that it is as small +as possible and optimized for search queries. + +

+ +

The options in this section all take integer values. They may be both +set and queried using the lsm_config() +function. To set an option to a value, lsm_config() is used as follows: + + + /* Set the LSM_CONFIG_AUTOFLUSH option to 1MB */ + int iVal = 1 * 1024 * 1024; + rc = lsm_config(db, LSM_CONFIG_AUTOFLUSH, &iVal); + + +

In order to query the current value of an option, the initial value of +the parameter (iVal in the example code above) should be set to a negative +value, or to any other value that is out of range for the parameter. +Negative values are simply out of range for all integer lsm_config() +parameters. + + + /* Set iVal to the current value of LSM_CONFIG_AUTOFLUSH */ + int iVal = -1; + rc = lsm_config(db, LSM_CONFIG_AUTOFLUSH, &iVal); + + +

+
LSM_CONFIG_MMAP +

+ This option may be set to either 1 (true) or 0 (false). If it is set to + true and LSM is running on a system with a 64-bit address space, the + entire database file is memory mapped. Otherwise (if it is false, or LSM is + running in a 32-bit address space), data is accessed using ordinary + OS file read and write primitives. Memory mapping the database file + can significantly improve the performance of read operations, as database + pages do not have to be copied from operating system buffers into user + space buffers before they can be examined. + +

This option can only be set before lsm_open() is called on the database + connection. + +

The default value is 1 (true). + +

LSM_CONFIG_MULTIPLE_PROCESSES +

+ This option may also be set to either 1 (true) or 0 (false). If it is + set to 0, then the library assumes that all database clients are located + within the same process (have access to the same memory space). Assuming + this means the library can avoid using OS file locking primitives to lock + the database file, which speeds up opening and closing read and write + transactions. + +

This option can only be set before lsm_open() is called on the database + connection. + +

The default value is 1 (true). + +

LSM_CONFIG_USE_LOG +

+ This is another option that may be set to either 1 (true) or 0 (false). + If it is set to false, then the library does not write data into the + database log file. This makes writing faster, but also means that if + an application crash or power failure occurs, it is very likely that + any recently committed transactions will be lost. + +

If this option is set to true, then an application crash cannot cause + data loss. Whether or not data loss may occur in the event of a power + failure depends on the value of the + LSM_CONFIG_SAFETY parameter. + +

This option can only be set if the connection does not currently have + an open write transaction. + +

The default value is 1 (true). + +

LSM_CONFIG_AUTOFLUSH +

+ +

LSM_CONFIG_AUTOCHECKPOINT +

+ +

+ +

6.2. Using Worker Threads or Processes

+ +

Todo: Fix the following

+ +

The section above describes the three stages of transferring data written +to the database from the application to persistent storage. A "writer" +client writes the data into the in-memory tree and log file. Later on a +"worker" client flushes the data from the in-memory tree to a new segment +in the database file. Additionally, a worker client must periodically +merge existing database segments together to prevent them from growing too +numerous. + +

6.2.1. Architectural Overview

The LSM library implements two separate data structures that are used together to store user data. When the database is queried, the library actually runs parallel queries on both of these data stores and merges the results together to return to the user. The data structures are: @@ -905,21 +1012,11 @@

The tasks associated with each of the locks above may be performed concurrently by multiple database connections, located either in the same application process or different processes. -

6.2. Work and Checkpoint Scheduling

- -

The section above describes the three stages of transfering data written -to the database from the application to persistent storage. A "writer" -client writes the data into the in-memory tree and log file. Later on a -"worker" client flushes the data from the in-memory tree to a new segment -in the the database file. Additionally, a worker client must periodically -merge existing database segments together to prevent them from growing too -numerous. - -

6.2.1. Automatic Work and Checkpoint Scheduling

+

6.2.2. Automatic Work and Checkpoint Scheduling

By default, database "work" (the flushing and merging of segments, performed by clients holding the WORKER lock) and checkpointing are scheduled and performed automatically from within calls to "write" API functions. The "write" functions are: @@ -1016,11 +1113,11 @@ last checkpoint (by any client, not just by the current client). If this value is greater than the value of the LSM_CONFIG_AUTOCHECKPOINT parameter, a checkpoint is attempted. It is not an error if the attempt fails because the CHECKPOINTER lock cannot be obtained. -

6.2.2. Explicit Work and Checkpoint Scheduling

+

6.2.3. Explicit Work and Checkpoint Scheduling

The alternative to automatic scheduling of work and checkpoint operations is to explicitly schedule them. Possibly in a background thread or dedicated application process. In order to disable automatic work, a client must set the LSM_CONFIG_AUTOWORK parameter to zero. This parameter is a property of @@ -1137,11 +1234,11 @@ int lsm_flush(lsm_db *db); -

6.2.3. Compulsary Work and Checkpoint Scheduling

+

6.2.4. Compulsory Work and Checkpoint Scheduling

Apart from the scenarios described above, there are two scenarios where database work or checkpointing may be performed automatically, regardless of the value of the LSM_CONFIG_AUTOWORK parameter. @@ -1182,21 +1279,35 @@

Finally, regardless of age, a database is limited to a maximum of 64 segments in total. If an attempt is made to flush an in-memory tree to disk when the database already contains 64 segments, two or more existing segments must be merged together before the new segment can be created. -

6.3. Database Optimization

+

6.3. Database File Optimization

Database optimization transforms the contents of database file so that the following are true:

+ +

Should we add a convenience function lsm_optimize() that does not +return until the database is completely optimized? One that more or less does +the same as the example code below and deals with the AUTOCHECKPOINT issue? +This would help with this user manual if nothing else, as it means a method +for database optimization can be presented without depending on the previous +section. + +

In order to optimize the database, lsm_work() should be called repeatedly with the nMerge argument set to 1 until it returns without writing any data to the database file. For example: @@ -1206,31 +1317,13 @@ do { rc = lsm_work(db, 1, 2*1024*1024, &nWrite); }while( rc==LSM_OK && nWrite>0 ); -

When optimizing the database as above, the LSM_CONFIG_AUTOCHECKPOINT -parameter should be set to a non-zero value, or otherwise lsm_checkpoint() -should be called periodically. Otherwise, no checkpoints will be performed, -preventing the library from reusing any space occupied by old segments even -after their content has been merged into the new segment. The result - a -database file that is optimized, except that it is up to twice as large as -it otherwise would be. - -

6.4. Other Parameters

- - -

Mention other configuration options that can be used to tune performance -here. - -

- -
- - - +

When optimizing the database as above, either the LSM_CONFIG_AUTOCHECKPOINT +parameter should be set to a non-zero value or lsm_checkpoint() should be +called periodically. Otherwise, no checkpoints will be performed, preventing +the library from reusing any space occupied by old segments even after their +content has been merged into the new segment. The result is a database file that +is optimized, except that it is up to twice as large as it otherwise would be.
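
The interaction between the optimization loop and periodic checkpointing can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: DemoDb, demo_work(), demo_checkpoint() and demo_optimize() are hypothetical stand-ins invented for this illustration and are not part of the LSM API. demo_work() plays the role of lsm_work() and demo_checkpoint() the role of lsm_checkpoint(); the units of the limits are arbitrary here.

```c
#include <assert.h>

#define DEMO_OK 0

/* Hypothetical model of a database handle. NOT the real lsm_db. */
typedef struct DemoDb {
  int nUnmerged;    /* Amount of data still waiting to be merged */
  int nSinceCkpt;   /* Amount written since the last checkpoint */
  int nCheckpoint;  /* Number of checkpoints performed so far */
} DemoDb;

/* Stand-in for lsm_work(): merge at most nLimit units of data and
** report the amount actually written via *pnWrite. */
static int demo_work(DemoDb *p, int nMerge, int nLimit, int *pnWrite){
  int n;
  assert( p!=0 && pnWrite!=0 );
  (void)nMerge;                       /* Unused by this model */
  n = (p->nUnmerged<nLimit) ? p->nUnmerged : nLimit;
  p->nUnmerged -= n;
  p->nSinceCkpt += n;
  *pnWrite = n;
  return DEMO_OK;
}

/* Stand-in for lsm_checkpoint(): after this, space occupied by old
** segments could be reused. */
static int demo_checkpoint(DemoDb *p){
  p->nSinceCkpt = 0;
  p->nCheckpoint++;
  return DEMO_OK;
}

/* Call demo_work() until it writes nothing, checkpointing whenever
** at least nCkpt units have been written since the last checkpoint. */
static int demo_optimize(DemoDb *p, int nChunk, int nCkpt){
  int rc;
  int nWrite = 0;
  do {
    rc = demo_work(p, 1, nChunk, &nWrite);
    if( rc==DEMO_OK && p->nSinceCkpt>=nCkpt ){
      rc = demo_checkpoint(p);
    }
  }while( rc==DEMO_OK && nWrite>0 );

  /* Final checkpoint so that no merged data is left un-checkpointed. */
  if( rc==DEMO_OK && p->nSinceCkpt>0 ){
    rc = demo_checkpoint(p);
  }
  return rc;
}
```

In a real application the same shape would presumably be obtained by calling lsm_checkpoint() from inside the lsm_work() loop, or by leaving LSM_CONFIG_AUTOCHECKPOINT set to a non-zero value so that no explicit checkpoint calls are needed.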