Overview
Comment: | Updates to lsmusr.wiki. |
---|---|
SHA1: |
1ea91878201f0998b35a6843da7c9e4c |
User & Date: | dan 2012-11-14 20:09:24.942 |
Context
2012-11-15
| ||
14:19 | Add words to lsmusr.wiki. check-in: 2077c9d152 user: dan tags: trunk | |
2012-11-14
| ||
20:09 | Updates to lsmusr.wiki. check-in: 1ea9187820 user: dan tags: trunk | |
18:23 | Improvements to lsmusr.wiki. check-in: e47b5e3ae6 user: dan tags: trunk | |
Changes
Changes to www/lsmusr.wiki.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | <title>LSM Users Guide</title> <nowiki> <h2>Table of Contents</h2> <div id=start_of_toc></div> <a href=#introduction_to_lsm style=text-decoration:none>1. Introduction to LSM</a><br> <a href=#using_lsm_in_applications style=text-decoration:none>2. Using LSM in Applications </a><br> <a href=#basic_usage style=text-decoration:none>3. Basic Usage</a><br> <a href=#opening_and_closing_database_connections style=text-decoration:none>3.1. Opening and Closing Database Connections </a><br> <a href=#writing_to_a_database style=text-decoration:none>3.2. Writing to a Database </a><br> <a href=#reading_from_a_database style=text-decoration:none>3.3. Reading from a Database </a><br> <a href=#database_transactions_and_mvcc style=text-decoration:none>3.4. Database Transactions and MVCC </a><br> <a href=#data_durability style=text-decoration:none>4. Data Durability </a><br> <a href=#compressed_and_encrypted_databases style=text-decoration:none>5. Compressed and Encrypted Databases </a><br> <a href=#performance_tuning style=text-decoration:none>6. Performance Tuning</a><br> | > | | > | | | | < | | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 | <title>LSM Users Guide</title> <nowiki> <h2>Table of Contents</h2> <div id=start_of_toc></div> <a href=#introduction_to_lsm style=text-decoration:none>1. Introduction to LSM</a><br> <a href=#using_lsm_in_applications style=text-decoration:none>2. Using LSM in Applications </a><br> <a href=#basic_usage style=text-decoration:none>3. Basic Usage</a><br> <a href=#opening_and_closing_database_connections style=text-decoration:none>3.1. Opening and Closing Database Connections </a><br> <a href=#writing_to_a_database style=text-decoration:none>3.2. Writing to a Database </a><br> <a href=#reading_from_a_database style=text-decoration:none>3.3. 
Reading from a Database </a><br> <a href=#database_transactions_and_mvcc style=text-decoration:none>3.4. Database Transactions and MVCC </a><br> <a href=#data_durability style=text-decoration:none>4. Data Durability </a><br> <a href=#compressed_and_encrypted_databases style=text-decoration:none>5. Compressed and Encrypted Databases </a><br> <a href=#performance_tuning style=text-decoration:none>6. Performance Tuning</a><br> <a href=#performance_related_configuration_options style=text-decoration:none>6.1. Performance Related Configuration Options </a><br> <a href=#using_worker_threads_or_processes style=text-decoration:none>6.2. Using Worker Threads or Processes </a><br> <a href=#architectural_overview style=text-decoration:none>6.2.1. Architectural Overview </a><br> <a href=#automatic_work_and_checkpoint_scheduling style=text-decoration:none>6.2.2. Automatic Work and Checkpoint Scheduling</a><br> <a href=#explicit_work_and_checkpoint_scheduling style=text-decoration:none>6.2.3. Explicit Work and Checkpoint Scheduling</a><br> <a href=#compulsary_work_and_checkpoint_scheduling style=text-decoration:none>6.2.4. Compulsory Work and Checkpoint Scheduling</a><br> <a href=#database_file_optimization style=text-decoration:none>6.3. Database File Optimization</a><br> <div id=end_of_toc></div> <h2>Overview</h2> <p>This document describes the LSM embedded database library and how to use it. It is intended to be part user manual and part tutorial, and to complement the <a href=lsmapi.wiki>LSM API reference manual</a>. <p>The <a href=#introduction_to_lsm>first section</a> of this document contains a description of the LSM library and its features. <a href=#using_lsm_in_applications>Section 2</a> describes how to use LSM from within a C or C++ application (how to compile and link LSM, what to #include etc.). The <a href=#basic_usage>third section</a> describes the essential APIs that applications use to open and close database connections, and to read from |
︙ | ︙ | |||
766 767 768 769 770 771 772 | <p><i>Maybe there should be a way to register a mismatch-handler callback. Otherwise, applications have to handle LSM_MISMATCH everywhere... </i> <h1 id=performance_tuning>6. Performance Tuning</h1> | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | | 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 | <p><i>Maybe there should be a way to register a mismatch-handler callback. Otherwise, applications have to handle LSM_MISMATCH everywhere... </i> <h1 id=performance_tuning>6. Performance Tuning</h1> <p> This section describes the various measures that can be taken in order to fine-tune LSM in order to improve performance in specific circumstances. Sub-section 6.1 identifies the <a href=#performance_related_configuration_options> configuration parameters</a> that can be used to influence database performance. Sub-section 6.2 discusses methods for shifting the time-consuming processes of actually writing and syncing the database file to <a href=#using_worker_threads_or_processes>background threads or processes</a> in order to make writing to the database more responsive. Finally, 6. 3 introduces "<a href=#database_file_optimization>database optimization</a>" - the process of reorganizing a database file internally so that it is as small as possible and optimized for search queries. <h2 id=performance_related_configuration_options>6.1. 
Performance Related Configuration Options </h2> <p>The options in this section all take integer values. They may be both set and queried using the <a href=lsmapi.wiki#lsm_config>lsm_config()</a> function. To set an option to a value, lsm_config() is used as follows: <verbatim> /* Set the LSM_CONFIG_AUTOFLUSH option to 1MB */ int iVal = 1 * 1024 * 1024; rc = lsm_config(db, LSM_CONFIG_AUTOFLUSH, &iVal); </verbatim> <p>In order to query the current value of an option, the initial value of the parameter (iVal in the example code above) should be set to a negative value. Or any other value that happens to be out of range for the parameter - negative values just happen to be out of range for all integer lsm_config() parameters. <verbatim> /* Set iVal to the current value of LSM_CONFIG_AUTOFLUSH */ int iVal = -1; rc = lsm_config(db, LSM_CONFIG_AUTOFLUSH, &iVal); </verbatim> <dl> <dt> <a href=lsmapi.wiki#LSM_CONFIG_MMAP>LSM_CONFIG_MMAP</a> <dd> <p style=margin-top:0> This option may be set to either 1 (true) or 0 (false). If it is set to true and LSM is running on a system with a 64-bit address space, the entire database file is memory mapped. Or, if it is false or LSM is running in a 32-bit address space, data is accessed using ordinary OS file read and write primitives. Memory mapping the database file can significantly improve the performance of read operations, as database pages do not have to be copied from operating system buffers into user space buffers before they can be examined. <p>This option can only be set before lsm_open() is called on the database connection. <p>The default value is 1 (true). <dt> <a href=lsmapi.wiki#LSM_CONFIG_MULTIPLE_PROCESSES>LSM_CONFIG_MULTIPLE_PROCESSES</a> <dd> <p style=margin-top:0> This option may also be set to either 1 (true) or 0 (false). If it is set to 0, then the library assumes that all database clients are located within the same process (have access to the same memory space). 
Assuming this means the library can avoid using OS file locking primitives to lock the database file, which speeds up opening and closing read and write transactions. <p>This option can only be set before lsm_open() is called on the database connection. <p>The default value is 1 (true). <dt> <a href=lsmapi.wiki#LSM_CONFIG_USE_LOG>LSM_CONFIG_USE_LOG</a> <dd> <p style=margin-top:0> This is another option that may be set to either 1 (true) or 0 (false). If it is set to false, then the library does not write data into the database log file. This makes writing faster, but also means that if an application crash or power failure occurs, it is very likely that any recently committed transactions will be lost. <p>If this option is set to true, then an application crash cannot cause data loss. Whether or not data loss may occur in the event of a power failure depends on the value of the <a href=#data_durability> LSM_CONFIG_SAFETY</a> parameter. <p>This option can only be set if the connection does not currently have an open write transaction. <p>The default value is 1 (true). <dt> <a href=lsmapi.wiki#LSM_CONFIG_AUTOFLUSH>LSM_CONFIG_AUTOFLUSH</a> <dd> <p style=margin-top:0> <dt> <a href=lsmapi.wiki#LSM_CONFIG_AUTOCHECKPOINT>LSM_CONFIG_AUTOCHECKPOINT</a> <dd> <p style=margin-top:0> </dl> <h2 id=using_worker_threads_or_processes>6.2. Using Worker Threads or Processes </h2> <p><i>Todo: Fix the following </p> <p>The section above describes the three stages of transferring data written to the database from the application to persistent storage. A "writer" client writes the data into the in-memory tree and log file. Later on a "worker" client flushes the data from the in-memory tree to a new segment in the database file. Additionally, a worker client must periodically merge existing database segments together to prevent them from growing too numerous. <h3 id=architectural_overview>6.2.1. 
Architectural Overview </h3> <p> The LSM library implements two separate data structures that are used together to store user data. When the database is queried, the library actually runs parallel queries on both of these data stores and merges the results together to return to the user. The data structures are: <ul> |
︙ | ︙ | |||
903 904 905 906 907 908 909 | database file header (to checkpoint the database). </table> <p>The tasks associated with each of the locks above may be performed concurrently by multiple database connections, located either in the same application process or different processes. | < < < < < < < < < < | | 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 | database file header (to checkpoint the database). </table> <p>The tasks associated with each of the locks above may be performed concurrently by multiple database connections, located either in the same application process or different processes. <h3 id=automatic_work_and_checkpoint_scheduling>6.2.2. Automatic Work and Checkpoint Scheduling</h3> <p>By default, database "work" (the flushing and merging of segments, performed by clients holding the WORKER lock) and checkpointing are scheduled and performed automatically from within calls to "write" API functions. The "write" functions are: <ul> |
︙ | ︙ | |||
1014 1015 1016 1017 1018 1019 1020 | than zero, after performing database work, the library automatically checks how many bytes of raw data have been written to the database file since the last checkpoint (by any client, not just by the current client). If this value is greater than the value of the LSM_CONFIG_AUTOCHECKPOINT parameter, a checkpoint is attempted. It is not an error if the attempt fails because the CHECKPOINTER lock cannot be obtained. | | | 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 | than zero, after performing database work, the library automatically checks how many bytes of raw data have been written to the database file since the last checkpoint (by any client, not just by the current client). If this value is greater than the value of the LSM_CONFIG_AUTOCHECKPOINT parameter, a checkpoint is attempted. It is not an error if the attempt fails because the CHECKPOINTER lock cannot be obtained. <h3 id=explicit_work_and_checkpoint_scheduling>6.2.3. Explicit Work and Checkpoint Scheduling</h3> <p>The alternative to automatic scheduling of work and checkpoint operations is to explicitly schedule them. Possibly in a background thread or dedicated application process. In order to disable automatic work, a client must set the LSM_CONFIG_AUTOWORK parameter to zero. This parameter is a property of a database connection, not of a database itself, so it must be cleared separately by all processes that may write to the database. Otherwise, they |
︙ | ︙ | |||
1135 1136 1137 1138 1139 1140 1141 | rc = lsm_info(db, LSM_INFO_TREE_SIZE, &nOld, &nLive); </verbatim> <verbatim> int lsm_flush(lsm_db *db); </verbatim> | | | 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 | rc = lsm_info(db, LSM_INFO_TREE_SIZE, &nOld, &nLive); </verbatim> <verbatim> int lsm_flush(lsm_db *db); </verbatim> <h3 id=compulsary_work_and_checkpoint_scheduling>6.2.4. Compulsory Work and Checkpoint Scheduling</h3> <p>Apart from the scenarios described above, there are two scenarios where database work or checkpointing may be performed automatically, regardless of the value of the LSM_CONFIG_AUTOWORK parameter. <ul> <li> When closing a database connection, and |
︙ | ︙ | |||
1180 1181 1182 1183 1184 1185 1186 | </ul> <p>Finally, regardless of age, a database is limited to a maximum of 64 segments in total. If an attempt is made to flush an in-memory tree to disk when the database already contains 64 segments, two or more existing segments must be merged together before the new segment can be created. | | | > > > > > | > > > > > > > > > | | | | | | < < < < < < < < < < < < < < < < < < | 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 | </ul> <p>Finally, regardless of age, a database is limited to a maximum of 64 segments in total. If an attempt is made to flush an in-memory tree to disk when the database already contains 64 segments, two or more existing segments must be merged together before the new segment can be created. <h2 id=database_file_optimization>6.3. Database File Optimization</h2> <p>Database optimization transforms the contents of the database file so that the following are true: <ul> <li> <p>All database content is stored in a single <a href=#architectural_overview>segment</a>. This makes the database effectively equivalent to an optimally packed b-tree structure for search operations - minimizing the number of disk sectors that need to be visited when searching the database. <li> <p>The database file contains no (or as little as possible) free space. In other words, it is no larger than required to contain the single segment. </ul> <p><i> Should we add a convenience function lsm_optimize() that does not return until the database is completely optimized? One that more or less does the same as the example code below and deals with the AUTOCHECKPOINT issue? This would help with this user manual if nothing else, as it means a method for database optimization can be presented without depending on the previous section. 
</i> <p>In order to optimize the database, lsm_work() should be called repeatedly with the nMerge argument set to 1 until it returns without writing any data to the database file. For example: <verbatim> int nWrite; int rc; do { rc = lsm_work(db, 1, 2*1024*1024, &nWrite); }while( rc==LSM_OK && nWrite>0 ); </verbatim> <p>When optimizing the database as above, either the LSM_CONFIG_AUTOCHECKPOINT parameter should be set to a non-zero value or lsm_checkpoint() should be called periodically. Otherwise, no checkpoints will be performed, preventing the library from reusing any space occupied by old segments even after their content has been merged into the new segment. The result - a database file that is optimized, except that it is up to twice as large as it otherwise would be. |
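The last paragraph of the new section 6.3 says that, when automatic checkpointing is disabled, a checkpoint should be performed periodically while optimizing. A minimal sketch of that control flow follows. It is not part of the check-in and does not link against LSM: `fake_db`, `fake_work()` and `fake_checkpoint()` are hypothetical stand-ins that exist only so the loop compiles on its own. In an application, `fake_work()` would be the four-argument `lsm_work()` call shown in the diff, and `fake_checkpoint()` would be `lsm_checkpoint()`, whose exact signature is an assumption here and should be taken from the API reference.

```c
#include <assert.h>
#include <stddef.h>

#define LSM_OK 0   /* success code, as tested in the loop from the diff */

/* Hypothetical stand-in for a database handle (not the real lsm_db). */
typedef struct fake_db {
  int nPending;      /* bytes of merge work still outstanding */
  int nCheckpoint;   /* number of checkpoints performed so far */
} fake_db;

/* Stand-in for lsm_work(db, nMerge, nByte, &nWrite): performs at most
** nByte bytes of the outstanding work and reports how much was written. */
static int fake_work(fake_db *db, int nMerge, int nByte, int *pnWrite){
  int n;
  assert( db!=NULL && pnWrite!=NULL );
  (void)nMerge;                       /* unused by the stand-in */
  n = (db->nPending<nByte) ? db->nPending : nByte;
  db->nPending -= n;
  *pnWrite = n;
  return LSM_OK;
}

/* Stand-in for a checkpoint call: only counts invocations. */
static int fake_checkpoint(fake_db *db){
  assert( db!=NULL );
  db->nCheckpoint++;
  return LSM_OK;
}

/* Optimize the database, checkpointing after every unit of work that
** wrote data, so space held by merged segments can be reused. */
static int optimize_with_checkpoints(fake_db *db){
  int nWrite = 0;
  int rc;
  do {
    rc = fake_work(db, 1, 2*1024*1024, &nWrite);
    if( rc==LSM_OK && nWrite>0 ){
      rc = fake_checkpoint(db);
    }
  }while( rc==LSM_OK && nWrite>0 );
  return rc;
}
```

Checkpointing after every unit of work is the simplest policy to show. A real application might checkpoint less often, or rely on a non-zero LSM_CONFIG_AUTOCHECKPOINT value instead.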