hctree

doc/hctree/index.html
Login

doc/hctree/index.html

HC-tree
Links: ...Test Results ...Design Documentation ...File Format... ...Replication API Notes... ...Replication Test Server...

HC-tree

SQLite is sometimes used as the core of a client/server database system. While it works reliably well in such cases, the database backend module that it uses to store b-tree structures in its database file was not designed with this case in mind and can be improved upon in several ways. The HC-tree (hctree) project is an attempt to develop a new database backend that improves upon regular SQLite as follows:

An implicit goal is that hctree must be as fast or faster than stock SQLite for all single-threaded cases. There is no point in running dozens of concurrent writers if each of them is an order of magnitude slower than a single writer writing to a legacy database.

Hctree clients (those that use a version of SQLite compiled from this repository) may read hctree databases and stock SQLite databases.

Project/Prototype Status

This project contains a fork of the SQLite project that has been modified to include a prototype of the hctree database backend. It may be built and used in all the same ways as the stock SQLite library code.

The prototype is advanced enough to experiment with and to conduct multi-threaded performance tests with, but is incomplete. The current code has, at least, the following shortcomings:

  1. "BEGIN EXCLUSIVE" is not implemented for hctree databases. It is parsed, but behaves just as a regular "BEGIN" does.

  2. All hctree databases currently use 32-bit page numbers (maximum database size of 16TiB with the default 4KiB byte page size).

  3. Transactions that do not fit entirely in main memory (e.g. large CREATE INDEX statements) are not handled efficiently.

  4. Support for replication is now implemented. There is a replication API and a scriptable test application for performance and other testing. The API is based on an investigation into the kinds of features that might be required to support high-concurrency replication in the bedrock database system.

  5. The prototype library only works on POSIX, not win32, systems.

  6. All clients of an hctree database, read or read/write, must access the database from within the same OS process.

  7. There is no support for recovering from a power failure or operating system crash. Such an event may corrupt the database.

Obtaining and Building the Prototype

The prototype is on the "hctree" branch (not trunk!) of the fossil repository at https://sqlite.org/hctree. To obtain the latest development snapshot, download: or check the code out using fossil:
    fossil clone https://www.sqlite.org/hctree hctree.fossil
    mkdir hctree
    cd hctree
    fossil open ../hctree.fossil
    fossil up hctree
The sources may be built and deployed in the same ways as stock SQLite.

Application Programming

Creating/Opening Databases

When opening an existing database, the library determines automatically whether or not it is a stock SQLite database or an hctree database. By default, when creating a new database, the library creates an ordinary SQLite database. To create a new hctree database, the SQLite URI parameter "hctree=1" must be specified the first time the new database is opened. e.g.

        file://new_hctree_database.db?hctree=1

The easiest ways to check if a database really is an hctree database are:

Each database consists of two files - the database file itself and a second file name "<database>-pagemap".

Concurrency Model

Hctree offers similar MVCC based optimistic concurrency to SQLite with the begin-concurrent extension, except that it validates transactions based on the keys and logical ranges of keys accessed instead of the set of pages.

As in any MVCC database system, including stock SQLite, both readers and writers access a consistent snapshot of the database, in which writes committed after the transaction was opened are never visible. Any writes made within the transaction are accumulated privately and made visible only to the transaction itself until they are committed.

When a transaction is to be committed in an optimistic concurrency system, it must first be validated. To validate a transaction is to check that, when considered in concert with all other past and current transactions, the transaction might have been committed in a system that serializes all transactions without changing the outcome in terms of the final state of the database or query results returned to any client. In practice, there are two ways for an hctree transaction to be validated:

  1. If no other client has committed a transaction to the database since the writer's snapshot was opened, the transaction must be valid.

  2. If no other client has modified any b-tree (table or index) entry or range that the transaction being committed accessed by a range or stabbing query, then the transaction is valid.

For example, considering the following database:

    CREATE TABLE t1(a INTEGER PRIMARY KEY, b TEXT, c TEXT);
    CREATE INDEX i1 ON t1(b);
    INSERT INTO t1 VALUES(1, 'a', 'A');
    INSERT INTO t1 VALUES(2, 'b', 'B');
    INSERT INTO t1 VALUES(3, 'x', 'X');
    INSERT INTO t1 VALUES(4, 'c', 'C');
    INSERT INTO t1 VALUES(5, 'd', 'D');

And the following transaction:

    BEGIN CONCURRENT;
      SELECT * FROM t1 WHERE (b BETWEEN 'a' AND 'c') AND c!='B';
      INSERT INTO t1 VALUES(6, 'e');
    COMMIT;

Then during transaction execution, hctree records that the following were accessed:

If, when the attempt to commit is made, the hctree client finds that either the contents of the index b-tree range or one of the table b-tree keys has been modified since the writer's snapshot was opened, validation does not succeed and the transaction is not committed.

Note that another client modifying the (2, 'b', 'B') tuple causes validation to fail, even though the SELECT query excludes that row. This is because validation is based on the entries and ranges of entries read from the b-tree layer, not the subset of those actually used by the database engine. If there were no index in the database schema, then the example SELECT query would be implemented using a full-table scan of the table b-tree. In this case any modification to any row of the table b-tree would cause transaction validation to fail.

All database transactions implicitly read the database schema. So any transaction that modifies the database schema may only be commited if no other client has written to the database since the writer's snapshot was opened (case 1 above).

Transaction Types

Hctree allows clients to open three different transaction types, as follows:

  1. BEGIN; Transactions opened with "BEGIN" do not collect validation data. This means that if they write to the database, they may only be committed if no other client has written the database since the writer's snapshot was opened (validation case 1 above).

  2. BEGIN CONCURRENT; Transactions opened with "BEGIN CONCURRENT" do collect validation data as they run. So that if they write to the database, upon commit they may, subject to validation, be committed even if other transactions have been committed since the snapshot was opened.

  3. BEGIN EXCLUSIVE; A "BEGIN EXCLUSIVE" command not only opens a transaction, but also prevents any new transaction-commit operations from starting, and spin-locks until all ongoing commits have finished. Any new commit operations started while the BEGIN EXCLUSIVE transaction is active also spin-lock until it is finished. Thus a transaction started using "BEGIN EXCLUSIVE" always passes validation, since it is always committed against the same snapshot against which it is prepared.

Implicit transactions are handled in the same way as those opened using "BEGIN".

In general, "BEGIN" should be used for read-only transactions, as it does not incur the overhead of accumulating data required for validation. "BEGIN CONCURRENT" should be used for most read/write transactions and "BEGIN EXCLUSIVE" as a fallback for transactions that are likely to conflict otherwise (perhaps because they have already been tried and found to conflict several times already, or perhaps because they contain schema modifications).

In all cases above, "spin-lock" actually means invoke the SQLite xBusy callback if one is registered, or to literally spin-lock otherwise.

Other Differences From Stock SQLite

When using an hctree database:
  1. A transaction cannot be committed while there are ongoing SELECT statements.
  2. A "DELETE FROM tbl" statement requires the collation sequences associated with the table be registered. Regular SQLite does not require this.
  3. REINDEX has less chance of fixing a corrupt index.
  4. The value returned by "PRAGMA data_version" changes even when the database is modified by the same db connection.
  5. "PRAGMA locking_mode" is a no-op.
  6. The backup API does not work.
  7. The VACUUM command does not work.
These will most likely all eventually be fixed. Issue (1) might be fixed with the proviso that it is only possible to commit a transaction with open SELECT statements if no other connection has written to the db since the transaction was opened.