Overview

Comment:      Updates to lsmusr.wiki.
SHA1:         f7ef6cec1fce03ae796f92f97a2a6dae
User & Date:  dan 2012-11-12 19:41:02.134
Context

2012-11-12
  20:19  Fix small issues in lsmusr.wiki.
         check-in: 3904797435  user: dan  tags: trunk
  19:41  Updates to lsmusr.wiki.
         check-in: f7ef6cec1f  user: dan  tags: trunk
2012-11-09
  20:14  Minor changes to lsmusr.wiki. Add the lsm_csr_cmp() function.
         check-in: 9d39c3a354  user: dan  tags: trunk
Changes
Changes to www/lsmusr.wiki.
︙
write-transaction is opened, then the snapshot read by the read-transaction
must correspond to the most recent version of the database. Otherwise, the
attempt to open the write-transaction fails (LSM_BUSY). In other words, if any
other client has written to the database since the current client's
read-transaction was opened, it will not be possible to upgrade to a
write-transaction.

<p>Write-transactions may be opened either implicitly or explicitly. If any
of the following functions are called to write to the database when there is
no write-transaction open, then an implicit write-transaction is opened and
closed (committed) within the function:

<ul>
  <li> lsm_insert()
  <li> lsm_delete()
︙
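<p><i>A minimal sketch of the implicit/explicit distinction drawn in the
write-transaction discussion above. It assumes the lsm_begin() and
lsm_commit() interfaces (not shown in the excerpt), where lsm_begin(db, 1)
opens a top-level write-transaction and lsm_commit(db, 0) commits all open
transactions, and that db is an open database connection:</i>

<verbatim>
  int rc;

  /* Implicit: no write-transaction is open, so lsm_insert() opens one
  ** internally and commits it before returning. */
  rc = lsm_insert(db, "k1", 2, "v1", 2);

  /* Explicit: the two operations below are committed atomically by
  ** lsm_commit(). Error handling (rolling back on failure) is omitted. */
  rc = lsm_begin(db, 1);
  if( rc==LSM_OK ) rc = lsm_insert(db, "k2", 2, "v2", 2);
  if( rc==LSM_OK ) rc = lsm_delete(db, "k1", 2);
  if( rc==LSM_OK ) rc = lsm_commit(db, 0);
</verbatim>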
to the database from the application to persistent storage. A "writer" client
writes the data into the in-memory tree and log file. Later on, a "worker"
client flushes the data from the in-memory tree to a new segment in the
database file. Additionally, a worker client must periodically merge existing
database segments together to prevent them from growing too numerous.

<h3>3.2.1 Automatic Work and Checkpoint Scheduling</h3>

<p>By default, database "work" (the flushing and merging of segments,
performed by clients holding the WORKER lock) and checkpointing are scheduled
and performed automatically from within calls to "write" API functions.
The "write" functions are:

<ul>
  <li> lsm_insert()
  <li> lsm_delete()
  <li> lsm_delete_range()
  <li> lsm_commit()
</ul>

<p>It is expected that automatic work and checkpoint scheduling will be
suitable for most applications. The advantage of this model is that it is
simple to use. However, any call to one of the functions listed above may be
co-opted by the system to perform database work or a checkpoint, causing it
to return more slowly than it otherwise would. In some situations, for
example when a writer thread is also responsible for handling user-interface
events, this may be undesirable.

<p>Automatic work and checkpoint scheduling is controlled by four integer
parameters set or queried using the lsm_config() interface. The parameters
are:

<dl>
  <dt> LSM_CONFIG_AUTOWORK
  <dd> <p style=margin-top:0>
    This is a boolean parameter (default 1). If set, auto-work mode is
    enabled for the database connection. Otherwise, it is not.

  <dt> LSM_CONFIG_AUTOFLUSH
  <dd> <p style=margin-top:0>
    An integer parameter (default 1048576). If this parameter is set to a
    non-zero value, then after each transaction is committed, the library
    checks the total size of the live in-memory tree. If it is equal to or
    greater than the configured value of this parameter in bytes and there
    is no existing old in-memory tree, then the current live tree is marked
    as old.
    <p>Additionally, if the LSM_CONFIG_AUTOWORK parameter is set, the
    contents of the in-memory tree are immediately flushed to disk.

  <dt> LSM_CONFIG_AUTOMERGE
  <dd> <p style=margin-top:0>
    This parameter must be set to an integer value between 2 and 8,
    inclusive. It controls the number of existing segments that auto-work
    attempts to merge together at a time. The default value is 4.

  <dt> LSM_CONFIG_AUTOCHECKPOINT
  <dd> <p style=margin-top:0>
    If this parameter is set to an integer value greater than 0, then a
    checkpoint is automatically attempted after this many bytes are written
    into the database file. The default value is 2097152.
</dl>

<p>Each segment in the database file is assigned an "age" - an integer zero
or greater indicating how many times the data in the segment has been merged.
A segment created by flushing the in-memory tree to disk is assigned an age
of 1. When two or more segments with age=1 are merged together to create a
larger segment, it is assigned an age of 2. And so on.

<p>If auto-work is enabled, the library periodically checks the state of the
database file to see if there exist N or more segments with the same age
value A, where N is the value assigned to the LSM_CONFIG_AUTOMERGE parameter.
If so, work is done to merge all such segments with age=A into a new, larger
segment assigned age=A+1. At present, "periodically" as used above means
roughly once for every 32KB of data (including overhead) written to the
in-memory tree.

<p>The merge operation is not necessarily completed within the same call to a
write API in which it is started (this would result in blocking the writer
thread for too long in many cases - in large databases segments may grow to
be many GB in size). At present, the amount of data written by a single
auto-work operation is roughly 32KB multiplied by the number of segments in
the database file. This formula may change - the point is that the library
attempts to limit the amount of data written in order to avoid blocking the
writer thread for too long within a single API call.
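<p><i>The following is a minimal sketch of how a connection might adjust the
four parameters above using lsm_config(). The specific values are
illustrative only, error checking is omitted, and db is assumed to be an open
database connection:</i>

<verbatim>
  int iAutowork  = 1;              /* leave auto-work enabled (the default) */
  int nAutoflush = 2*1024*1024;    /* mark the live tree as old after 2MB */
  int nAutomerge = 8;              /* merge sets of 8 same-age segments */
  int nAutockpt  = 4*1024*1024;    /* checkpoint after 4MB of writes */

  lsm_config(db, LSM_CONFIG_AUTOWORK, &iAutowork);
  lsm_config(db, LSM_CONFIG_AUTOFLUSH, &nAutoflush);
  lsm_config(db, LSM_CONFIG_AUTOMERGE, &nAutomerge);
  lsm_config(db, LSM_CONFIG_AUTOCHECKPOINT, &nAutockpt);
</verbatim>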
<p>Each time a transaction is committed in auto-work mode, the library checks
to see if there exists an "old" in-memory tree (see the LSM_CONFIG_AUTOFLUSH
option above). If so, it attempts to flush it to disk immediately. Unlike
merges of existing segments, the entire in-memory tree must be flushed to
disk before control is returned to the user. It is not possible to
incrementally flush an in-memory tree in the same way as it is possible to
incrementally merge existing database segments together.

<p>In order to perform auto-work on the database file, either to flush the
contents of an old in-memory tree to disk or to merge existing segments
within the database file together, the client must obtain the WORKER lock.
If some other client is already holding the WORKER lock, this will not be
possible. This is not an error. If this occurs, the writer thread simply
returns immediately, without performing any work on the database file.

<p>Assuming the LSM_CONFIG_AUTOCHECKPOINT parameter is set to a value greater
than zero, after performing database work, the library automatically checks
how many bytes of raw data have been written to the database file since the
last checkpoint (by any client, not just by the current client). If this
value is greater than the value of the LSM_CONFIG_AUTOCHECKPOINT parameter, a
checkpoint is attempted. It is not an error if the attempt fails because the
CHECKPOINTER lock cannot be obtained.

<h3>3.2.2 Explicit Work and Checkpoint Scheduling</h3>

<p>The alternative to automatic scheduling of work and checkpoint operations
is to schedule them explicitly, possibly in a background thread or dedicated
application process. In order to disable automatic work, a client must set
the LSM_CONFIG_AUTOWORK parameter to zero. This parameter is a property of a
database connection, not of a database itself, so it must be cleared
separately by all processes that may write to the database. Otherwise, they
may attempt automatic database work or checkpoints.

<p>The lsm_work() function is used to explicitly perform work on the
database:

<verbatim>
  int lsm_work(lsm_db *db, int nMerge, int nByte, int *pnWrite);
</verbatim>

<p>Parameter nByte is passed a limit on the number of bytes of data that
should be written to the database file before the call returns. It is a hint
only; the library does not honor this limit strictly.

<p>If the database has an old in-memory tree when lsm_work() is called, it is
flushed to disk. If this means that more than nByte bytes of data is written
to the database file, no further work is performed. Otherwise, the number of
bytes written is subtracted from nByte before proceeding.

<p>If parameter nMerge is greater than 1, then the library searches for
nMerge or more segments of the same age within the database file and performs
up to nByte bytes of work to merge them together. If the merge is completed
before the nByte limit is exceeded, the library searches for another set of
nMerge or more segments to work on, and so on. If at any point no such set of
nMerge segments can be found, the call returns without performing any further
work.

<p>Calling lsm_work() with the nMerge argument set to 1 is used to "optimize"
the database (see below). Passing a value of zero or less for the nMerge
parameter is an error.

<p>In any case, before returning, the value of *pnWrite is set to the actual
number of bytes written to the database file.

<p>The example code below might be executed in a background thread or process
in order to perform database work and checkpointing.
In this case, all other clients should set the LSM_CONFIG_AUTOWORK parameter
to zero.

<verbatim>
  int rc;
  lsm_db *db;
  int nCkpt = 4*1024*1024;

  /* Open a database connection to database "test.db".
  **
  ** Configure the connection to automatically checkpoint the database after
  ** writing each 4MB of data to it (instead of the default 2MB). The
  ** LSM_CONFIG_AUTOCHECKPOINT parameter applies to data written by explicit
  ** calls to lsm_work() as well as to auto-work.
  */
  lsm_new(0, &db);
  lsm_config(db, LSM_CONFIG_AUTOCHECKPOINT, &nCkpt);
  lsm_open(db, "test.db");

  while( 1 ){
    int nWrite;

    /* Attempt up to 512KB of work. Set nWrite to the number of bytes
    ** actually written to disk.  */
    rc = lsm_work(db, 2, 512*1024, &nWrite);
    if( rc!=LSM_OK && rc!=LSM_BUSY ){
      /* Anything other than LSM_OK or LSM_BUSY is a problem. LSM_BUSY
      ** indicates that some other client has taken the WORKER lock. Any
      ** other error indicates something has gone quite wrong.  */
      lsm_close(db);
      return rc;
    }

    /* nWrite may be set to zero here in two scenarios. lsm_work()
    ** may have failed to obtain the WORKER lock and returned LSM_BUSY,
    ** indicating that some other connection is working on the database.
    ** Alternatively, it may be that there was no old in-memory tree to
    ** flush and no two segments of the same age within the database file,
    ** meaning the function could find no work to do.
    **
    ** In either case, there is no point in calling lsm_work() again
    ** immediately. Instead, sleep for a second before continuing. By that
    ** time, things may have changed (the other process may have relinquished
    ** the WORKER lock, or an in-memory tree may have been marked as old).
    */
    if( nWrite==0 ) sleep(1);
  }
</verbatim>

<p>Checkpoints can also be requested explicitly, using the lsm_checkpoint()
API:

<verbatim>
  int lsm_checkpoint(lsm_db *db, int *pnCkpt);
</verbatim>

<p>If no work has been performed on the database since the most recent
checkpoint (implying that the snapshot has not changed and there is no need
to write it into the database file), lsm_checkpoint() sets *pnCkpt to zero
and returns immediately. Otherwise, it checkpoints the database and sets
*pnCkpt to the number of bytes written to the database file since the
previous checkpoint.

<p>The number of bytes written to the database since the most recent
checkpoint can also be obtained using the lsm_info() API function, as
follows:

<verbatim>
  int nCkpt;
  rc = lsm_info(db, LSM_INFO_CHECKPOINT_SIZE, &nCkpt);
</verbatim>

<p>Similarly, the sizes of the old and live in-memory trees may be queried
using the LSM_INFO_TREE_SIZE parameter:

<verbatim>
  int nOld, nLive;
  rc = lsm_info(db, LSM_INFO_TREE_SIZE, &nOld, &nLive);
</verbatim>

<p>The contents of the in-memory tree may also be flushed to disk explicitly
using the lsm_flush() function:

<verbatim>
  int lsm_flush(lsm_db *db);
</verbatim>

<h3>3.2.3 Compulsory Work and Checkpoint Scheduling</h3>

<p>Apart from the scenarios described above, there are two scenarios in which
database work or checkpointing may be performed automatically, regardless of
the value of the LSM_CONFIG_AUTOWORK parameter:

<ul>
  <li> When closing a database connection, and
  <li> When the number of segments with a common age in the database file
       grows unacceptably high.
</ul>

<p>Whenever an lsm_close() call would mean that the total number of
connections to a database drops to zero, the connection checks if the
in-memory tree is empty. If not, it is flushed to disk. Both the live and old
in-memory trees are flushed to disk in this case. It also checks if the
database file has been modified since the most recent checkpoint was
performed. If so, it also performs a checkpoint. And, assuming no error has
occurred, deletes the log file.
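<p><i>As a sketch of the first scenario, the connection below never calls
lsm_work() or lsm_checkpoint(). Provided it is the only connection to the
database and no error occurs, its data still reaches the database file,
because closing the connection flushes the in-memory tree and checkpoints the
database as described above:</i>

<verbatim>
  lsm_db *db = 0;
  int rc;

  rc = lsm_new(0, &db);
  if( rc==LSM_OK ) rc = lsm_open(db, "test.db");
  if( rc==LSM_OK ) rc = lsm_insert(db, "key", 3, "value", 5);

  /* Closing the last connection flushes any in-memory tree, checkpoints
  ** the database file and deletes the log file. */
  if( db ) lsm_close(db);
</verbatim>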
<p>In addition to the lsm_close() behavior described above, whenever a worker
wishes to flush an in-memory tree to a new age=1 segment, it must first
ensure that there are fewer than N existing age=1 segments, where N is the
value that the LSM_CONFIG_AUTOMERGE parameter is set to. If there are already
N or more age=1 segments, they must be merged into an age=2 segment before a
new age=1 segment can be created within the database file. Similar rules
apply to segments of other ages - it is not possible to create a new age=I
segment if there are already N segments with age=I in the database file. This
has two implications:

<ul>
  <li> The database is prevented from accumulating too many segments,
       regardless of whether or not auto-work is enabled or how infrequently
       lsm_work() is called, and

  <li> If auto-work is disabled and lsm_work() is not called frequently
       enough, it is possible that flushing an in-memory tree may require
       writing a tremendous amount of data to disk (possibly even rewriting
       the entire database file).
</ul>

<p>Finally, regardless of age, a database is limited to a maximum of 64
segments in total. If an attempt is made to flush an in-memory tree to disk
when the database already contains 64 segments, two or more existing segments
must be merged together before the new segment can be created.

<h2>3.3 Database Optimization</h2>

<p>Database optimization transforms the contents of the database file so that
the following are true:

<ul>
  <li> All database content is stored in a single segment.

  <li> The database file contains no (or as little as possible) free space.
       In other words, it is no larger than required to contain the single
       segment.
</ul>

<p>In order to optimize the database, lsm_work() should be called repeatedly
with the nMerge argument set to 1 until it returns without writing any data
to the database file. For example:

<verbatim>
  int nWrite;
  int rc;
  do {
    rc = lsm_work(db, 1, 2*1024*1024, &nWrite);
  }while( rc==LSM_OK && nWrite>0 );
</verbatim>

<p>When optimizing the database as above, the LSM_CONFIG_AUTOCHECKPOINT
parameter should be set to a non-zero value, or lsm_checkpoint() should be
called periodically. Otherwise, no checkpoints will be performed, preventing
the library from reusing any space occupied by old segments even after their
content has been merged into the new segment. The result would be a database
file that is optimized, except that it is up to twice as large as it
otherwise would be.

<h2>3.4 Other Parameters</h2>

<i>
<p>Mention other configuration options that can be used to tune performance
here.

<ul>
  <li> LSM_CONFIG_MMAP
  <li> LSM_CONFIG_MULTIPLE_PROCESSES
</ul>
</i>

<h1>4. Compressed and Encrypted Databases</h1>

<p>LSM does not provide built-in methods for creating encrypted or compressed
databases. Instead, it allows the user to provide hooks to call external
functions to compress and/or encrypt data before it is written to the
database file, and to decrypt and/or uncompress data as it is read from the
database file.

<p>A database connection is configured to call compression functions using a
call to lsm_config() with the second argument set to
LSM_CONFIG_SET_COMPRESSION.
The third argument should point to an instance of type lsm_compress, which is
defined as follows:

<verbatim>
  typedef struct lsm_compress lsm_compress;
  struct lsm_compress {
    u32 iId;
    void *pCtx;
    int (*xBound)(void *pCtx, int nIn);
    int (*xCompress)(void *pCtx, void *pOut, int *pnOut, const void *pIn, int nIn);
    int (*xUncompress)(void *pCtx, void *pOut, int *pnOut, const void *pIn, int nIn);
    void (*xFree)(void *pCtx);
  };
</verbatim>

<p><i> Explain how the hooks work here (same as zipvfs) </i>

<p><i> Example code? Using zlib? Or something simple like an RLE
implementation? </i>

<p>The database file header of any LSM database contains a 32-bit unsigned
"compression id" field. If the database is not a compressed database, this
field is set to 1. Otherwise, it is set to an application-supplied value
identifying the compression and/or encryption scheme in use. Application
compression scheme ids must be greater than or equal to 10000. Values smaller
than 10000 are reserved for internal use.

<p>The lsm_compression_id() API may be used to read the compression id from a
database connection. Because the compression id is stored in the database
header, it may be read before any required compression or encryption hooks
are configured.

<verbatim>
  #define LSM_COMPRESSION_EMPTY 0
  #define LSM_COMPRESSION_NONE  1

  int lsm_compression_id(lsm_db *db, u32 *piId);
</verbatim>

<p>When a database is opened for the first time, before it is first written,
the compression id field is set to LSM_COMPRESSION_EMPTY (0). The first time
a transaction is committed, the database compression id is set to a copy of
the lsm_compress.iId field of the compression hooks for the database handle
committing the transaction, or to LSM_COMPRESSION_NONE (1) if no compression
hooks are configured.

<p>Once the compression id is set to something other than
LSM_COMPRESSION_EMPTY, when a database handle opens a read or write
transaction on the database, the compression id is compared against the
lsm_compress.iId field of the configured compression hooks, or against
LSM_COMPRESSION_NONE if no compression hooks are configured. If the
compression id does not match, then an LSM_MISMATCH error is returned and the
operation fails (no transaction or database cursor is opened).

<p><i>Maybe there should be a way to register a mismatch-handler callback.
Otherwise, applications have to handle LSM_MISMATCH everywhere... </i>
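<p><i>Since the hook semantics are only sketched above ("same as zipvfs"),
the following is a guess at a minimal pass-through scheme rather than a
definitive example. It assumes that xBound() returns an upper bound on the
output size for nIn bytes of input, and that xCompress() and xUncompress()
write their output to pOut and set *pnOut to the number of bytes written:</i>

<verbatim>
  #include <string.h>

  /* xBound: a pass-through scheme never grows its input. */
  static int passBound(void *pCtx, int nIn){
    return nIn;
  }

  /* Used as both xCompress and xUncompress: copy the input unchanged. */
  static int passCopy(void *pCtx, void *pOut, int *pnOut, const void *pIn, int nIn){
    memcpy(pOut, pIn, nIn);
    *pnOut = nIn;
    return LSM_OK;
  }

  /* xFree: no context to release. */
  static void passFree(void *pCtx){ }

  static int register_passthrough(lsm_db *db){
    /* Declared static in case the library retains the pointer rather than
    ** copying the structure. The id 10001 is an arbitrary application value
    ** in the >=10000 range. */
    static lsm_compress comp = {
      10001,                            /* iId */
      0,                                /* pCtx */
      passBound, passCopy, passCopy, passFree
    };
    return lsm_config(db, LSM_CONFIG_SET_COMPRESSION, &comp);
  }
</verbatim>

<p><i>A real implementation would call into a library such as zlib from
xCompress() and xUncompress(), and would pick an iId value that uniquely
identifies the compression (and any encryption) scheme, since
lsm_compression_id() and the LSM_MISMATCH check described above rely on
it.</i>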