Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Add words to lsmusr.wiki.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 2077c9d152c6f593397b314b90945dc1557c55d6
User & Date: dan 2012-11-15 14:19:56.150
Context
2012-11-15
18:45
Update the lsm code so that it matches lsmusr.wiki. check-in: 8915d39dab user: dan tags: trunk
14:19
Add words to lsmusr.wiki. check-in: 2077c9d152 user: dan tags: trunk
2012-11-14
20:09
Updates to lsmusr.wiki. check-in: 1ea9187820 user: dan tags: trunk
Changes
Unified Diff Ignore Whitespace Patch
Changes to www/lsmusr.wiki.
855
856
857
858
859
860
861






862
863
864




865

























866
867



868
869





870
871
872


873
874



875


876


877


878





879
880
881
882
883
884
885
    <p>This option can only be set if the connection does not currently have
    an open write transaction.

    <p>The default value is 1 (true).

  <dt> <a href=lsmapi.wiki#LSM_CONFIG_AUTOFLUSH>LSM_CONFIG_AUTOFLUSH</a>
  <dd> <p style=margin-top:0>







  <dt> <a href=lsmapi.wiki#LSM_CONFIG_AUTOCHECKPOINT>LSM_CONFIG_AUTOCHECKPOINT</a>
  <dd> <p style=margin-top:0>






























</dl>




<h2 id=using_worker_threads_or_processes>6.2. Using Worker Threads or Processes </h2>






<p><i>Todo: Fix the following </p>

<p>The section above describes the three stages of transfering data written


to the database from the application to persistent storage. A "writer" 
client writes the data into the in-memory tree and log file. Later on a 



"worker" client flushes the data from the in-memory tree to a new segment


in the the database file. Additionally, a worker client must periodically


merge existing database segments together to prevent them from growing too


numerous.






<h3 id=architectural_overview>6.2.1. Architectural Overview </h3>

<p> The LSM library implements two separate data structures that are used 
together to store user data. When the database is queried, the library 
actually runs parallel queries on both of these data stores and merges the
results together to return to the user. The data structures are:







>
>
>
>
>
>



>
>
>
>

>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


>
>
>


>
>
>
>
>
|

<
>
>
|
|
>
>
>
|
>
>
|
>
>
|
>
>
|
>
>
>
>
>







855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914

915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
    <p>This option can only be set if the connection does not currently have
    an open write transaction.

    <p>The default value is 1 (true).

  <dt> <a href=lsmapi.wiki#LSM_CONFIG_AUTOFLUSH>LSM_CONFIG_AUTOFLUSH</a>
  <dd> <p style=margin-top:0>
    When a client writes to an LSM database, changes are buffered in memory 
    before being written out to the database file in a batch. This option
    is set to the size of the buffer in bytes. The default value is 1048576.
    Increasing this value may improve overall write throughput. Decreasing
    it reduces memory usage.
    

  <dt> <a href=lsmapi.wiki#LSM_CONFIG_AUTOCHECKPOINT>LSM_CONFIG_AUTOCHECKPOINT</a>
  <dd> <p style=margin-top:0>
    This option determines how often the database is checkpointed (synced to
    disk). A checkpoint is performed after each N bytes (approximately) are
    written to the database file, where N is the value of this option. The
    default value is 2097152 (2MB).

    <p>
    Increasing this value (say to 4MB or even 8MB) may improve overall write
    throughput. However, it is important not to checkpoint too infrequently, 
    as:
    <ul>
      <li> <p>
           Space in the log file may only be reused following a checkpoint.
           Once a checkpoint has been completed, all application data written
           before the checkpoint is safely stored in the database file. This
           means the contents of the log file are no longer required, and so
           the next database writer may start writing a new log into the
           start of the existing file (overwriting the old data). 
           As well as consuming disk space, large log files are undesirable 
           as often the entire log file must be read during database recovery
           following an application or system failure.
           
      <li> <p>
           Similarly, space in the database file freed by the continuous
           incremental reorganization of the database file that LSM performs
           cannot be reused until after a checkpoint has been performed. 
           So increasing the value of this parameter causes the space used
           by the pre-reorganization versions of data structures within the
           file to be recycled more slowly. Increasing the overall size of
           database files under some circumstances.
    </ul>
</dl>

<p>The roles of the LSM_CONFIG_AUTOFLUSH and LSM_CONFIG_AUTOCHECKPOINT options 
are discussed in more detail in the following section.

<h2 id=using_worker_threads_or_processes>6.2. Using Worker Threads or Processes </h2>

<p>Usually, the database file is updated on disk from within calls to
write functions - lsm_insert(), lsm_delete(), lsm_delete_range() and
lsm_commit(). This means that occasionally, a call to one of these four 
functions may take signicantly longer than usual as it pauses to write or
sync the database file. <a href=#automatic_work_and_checkpoint_scheduling>See
below</a> for details.


<p>The alternative to updating the database file from within calls to write
functions is to use one or more background threads or processes to perform the
actual work of writing to the database file. 

<ul>
  <li> Have all client connections set the LSM_CONFIG_AUTOWORK option to 0.
       This stops them from writing to the database file.

  <li> Arrange for a background thread or process to connect to the database
       and call the <a href=lsmapi.wiki#lsm_work>lsm_work()</a> function 
       periodically.
       below</a>).
</ul>

<p>Further explanation of, and example code for, the above is
<a href=#explicit_work_and_checkpoint_scheduling>available below</a>.

<p>The following sub-sections provide a high-level description of the
LSM database architecture and descriptions of how various parameters 
affect the way clients write to the database file. While this may seem
in some ways an inconvenient amount of detail, it may also be helpful
when diagnosing performance problems or analyzing benchmark results.

<h3 id=architectural_overview>6.2.1. Architectural Overview </h3>

<p> The LSM library implements two separate data structures that are used 
together to store user data. When the database is queried, the library 
actually runs parallel queries on both of these data stores and merges the
results together to return to the user. The data structures are:
1121
1122
1123
1124
1125
1126
1127






1128
1129
1130
1131
1132
1133
1134
is to explicitly schedule them. Possibly in a background thread or dedicated
application process. In order to disable automatic work, a client must set
the LSM_CONFIG_AUTOWORK parameter to zero. This parameter is a property of
a database connection, not of a database itself, so it must be cleared
separately by all processes that may write to the database. Otherwise, they
may attempt automatic database work or checkpoints.







<p>The lsm_work() function is used to explicitly perform work on the database:

<verbatim>
  int lsm_work(lsm_db *db, int nMerge, int nByte, int *pnWrite);
</verbatim>

<p>Parameter nByte is passed a limit on the number of bytes of data that 







>
>
>
>
>
>







1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
is to explicitly schedule them. Possibly in a background thread or dedicated
application process. In order to disable automatic work, a client must set
the LSM_CONFIG_AUTOWORK parameter to zero. This parameter is a property of
a database connection, not of a database itself, so it must be cleared
separately by all processes that may write to the database. Otherwise, they
may attempt automatic database work or checkpoints.

<verbatim>
  /* Disable auto-work on connection db */
  int iVal = 0;
  lsm_config(db, LSM_CONFIG_AUTOWORK, &iVal);
</verbatim>

<p>The lsm_work() function is used to explicitly perform work on the database:

<verbatim>
  int lsm_work(lsm_db *db, int nMerge, int nByte, int *pnWrite);
</verbatim>

<p>Parameter nByte is passed a limit on the number of bytes of data that 
1216
1217
1218
1219
1220
1221
1222

1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
checkpoint (implying that the snapshot has not changed and there is no need
to write it into the database file), lsm_checkpoint() sets *pnCkpt to zero
and returns immediately. Otherwise, it checkpoints the database and sets
*pnCkpt to the number of bytes written to the database file since the
previous checkpoint.

<p>The number of bytes written to the database since the most recent checkpoint

can also be using the lsm_info() API function. As follows:

<verbatim>
  int nCkpt;
  rc = lsm_info(db, LSM_INFO_CHECKPOINT_SIZE, &nCkpt);
</verbatim>

<verbatim>
  int nOld, nLive;
  rc = lsm_info(db, LSM_INFO_TREE_SIZE, &nOld, &nLive);
</verbatim>

<verbatim>
  int lsm_flush(lsm_db *db);
</verbatim>

<h3 id=compulsary_work_and_checkpoint_scheduling>6.2.4. Compulsary Work and Checkpoint Scheduling</h3>

<p>Apart from the scenarios described above, there are two there are two 
scenarios where database work or checkpointing may be performed automatically,
regardless of the value of the LSM_CONFIG_AUTOWORK parameter.

<ul>







>
|











<
<
<
<







1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299




1300
1301
1302
1303
1304
1305
1306
checkpoint (implying that the snapshot has not changed and there is no need
to write it into the database file), lsm_checkpoint() sets *pnCkpt to zero
and returns immediately. Otherwise, it checkpoints the database and sets
*pnCkpt to the number of bytes written to the database file since the
previous checkpoint.

<p>The number of bytes written to the database since the most recent checkpoint
can also be using the <a href=lsmapi.wiki#lsm_info>lsm_info()</a> API 
function. As follows:

<verbatim>
  int nCkpt;
  rc = lsm_info(db, LSM_INFO_CHECKPOINT_SIZE, &nCkpt);
</verbatim>

<verbatim>
  int nOld, nLive;
  rc = lsm_info(db, LSM_INFO_TREE_SIZE, &nOld, &nLive);
</verbatim>





<h3 id=compulsary_work_and_checkpoint_scheduling>6.2.4. Compulsary Work and Checkpoint Scheduling</h3>

<p>Apart from the scenarios described above, there are two there are two 
scenarios where database work or checkpointing may be performed automatically,
regardless of the value of the LSM_CONFIG_AUTOWORK parameter.

<ul>