SQLite4
The Design Of SQLite4
Not logged in

1.0 Executive Summary

  1. SQLite4 is a compact, self-contained, zero-adminstration, ACID database engine in a library, just like SQLite3, but with an improved interface and file format.
  2. The run-time environment is encapsulated in an object.
  3. A greatly simplified Key/Value storage engine is used:
    1. A single large key space - not separate key spaces for each table and index as in SQLite3.
    2. Keys sort in lexicographical order.
    3. Multiple storage engines, interchangeable at run-time.
    4. Default on-disk storage engine uses a log-structured merge database.
  4. The PRIMARY KEY of a table really is used as the key to the storage engine.
  5. Decimal arithmetic is used.
  6. Foreign key constraints and recursive triggers are on by default.
  7. Covering indices can be declared explicitly.

2.0 Overview

SQLite4 is an alternative, not a replacement, for SQLite3. SQLite3 is not going away. SQLite3 and SQLite4 will be supported in parallel. The SQLite3 legacy will not be abandoned. SQLite3 will continue to be maintained and improved. But designers of new systems will now have the option to select SQLite4 instead of SQLite3 if desired.

SQLite4 strives to keep the best features of SQLite3 while addressing issues with SQLite3 that can not be fixed without breaking compatibility. Among the features that will remain in common between SQLite3 that SQLite4 are:

  1. SQLite4 is a complete, relational, transactional, ACID, SQL database database engine contained in a compact library that links into a larger application. There is no server. I/O is direct to disk.

  2. The source code for SQLite4 is available to anyone for any purpose. There are no restrictions on copying, distribution, or publication of sources or compiled binaries. There are no viral licenses to worry over.

  3. Dynamic typing is used, rather than the rigid static typing of most other SQL database engines.

  4. The (default) on-disk database image is a single disk file with a well-documented and stable file format, making the SQLite4 library suitable for use as an application file format.

  5. SQLite4 will be fast and reliable and require no administrator attention. It just works.

  6. The implementation of SQLite4 has minimal external dependencies so that it is easy to incorporate into embedded systems or other unusual runtime environments.

The implementation is still in C, which we regard as the universal assembly language. However, SQLite4 will use more C99 features than did SQLite3, while still striving to maintain compatibility with popular compilers. SQLite4 makes use of standard data types such as size_t, int64_t, uint64_t, and others.

The programming interface for SQLite4 is similar to that of SQLite3, though with all name prefixes changed from "sqlite3_" to "sqlite4_". Legacy and deprecated interfaces in SQLite3 have been removed from SQLite4. There are additional parameters to some routines, and on occasion the parameters are altered slightly or reordered. Some of the interface names have been modified to be more rational. But on the whole, the programming interface for SQLite4 is remarkably similar to SQLite3, and porting an application from SQLite3 to SQLite4 will usually involve only an hour or two search-and-replace.

SQLite3 and SQLite4 share no symbols in common, so it is possible to link both SQLite3 and SQLite4 into the same process and use them both at the same time.

3.0 Key Changes In SQLite4

3.1 The Run-Time Environment Object

Some of the interfaces in SQLite4 take a new (added) first parameter which is a pointer to an sqlite4_env object that defines the run-time environment. Examples of routines that require the sqlite4_env pointer include:

An instance of an sqlite4_env object defines how SQLite4 interacts with the rest of the system. An sqlite4_env object includes methods to:

Builds of SQLite4 for standard platforms (windows and unix) contain a global sqlite4_env object that is normally appropriate for use on that platform. Any interface routine that requires a pointer to an sqlite4_env object will use the global default sqlite4_env object if a NULL pointer is passed on its sqlite4_env parameter. Most applications never need to pass anything other than this NULL pointer. However, some applications may want to have two or more instances of SQLite4 running in the same address space where each instance uses different mutex primitives, a different heap, different date/time functions, etc. SQLite4 accommodates this need by allowing each database instance to be created using a different sqlite4_env object. The sqlite4_env object also eliminates all use of global and static variables in SQLite4, making SQLite4 easier to port to some embedded systems with limited support for static or global data.

3.2 Simplified Key/Value Storage Engine

SQLite4 uses a key/value storage engine which has a greatly simplified interface relative to SQLite3. The storage engine is pluggable; it can be changed out at runtime by making appropriate alterations to the sqlite4_env object prior to opening a new database connection.

SQLite4 needs a storage engine that implements ordered key/value pairs where the key and value are arbitrary-length binary data. Keys must be unique and must be ordered lexicographically. In other words, the keys should be ordered according to a comparison function like the following:

int key_compare(const void *key1, int n1, const void *key2, int n2){
  int c = memcmp(key1, key2, n1<n2 ? n1 : n2);
  if( c==0 ) c = n1 - n2;
  return c;
}

SQLite4 needs to be able to seek to the nearest key for a given probe key, then step through keys in lexicographical order in either the ascending or the descending direction. Inserts with a preexisting key overwrite the old data. Transactions, including atomic commit and rollback, are the responsibility of the storage engine.

SQLite4 stores all content, from all tables and all indices, in a single keyspace. This contrasts with SQLite3 that required a separate keyspace for each table and each index. The SQLite4 storage also differs from SQLite3 in that it requires the storage engine to sort keys is lexicographical order, whereas SQLite3 uses a very complex comparison function to determine the record storage order.

SQLite4 communicates with the storage engine through a well-defined and simple interface. New storage engines can be substituted at runtime simply by replacing some pointers to functions inside the sqlite4_env object prior to allocating a database connection.

If an alternative storage engine does not support rollback, that just means that ROLLBACK will not work in SQLite4. If the alternative storage engine does not support nested transactions, that means that nested transactions will not work on SQLite4. So less-capable storage engines can be plugged into SQLite4, though with a corresponding reduction in the capability of the overall system.

The default built-in storage engine is a log-structured merge database. It is very fast, faster than LevelDB, supports nested transactions, and stores all content in a single disk file. Future versions of SQLite4 might also include a built-in B-Tree storage engine.

3.3 The PRIMARY KEY Is The Real Primary Key

SQLite3 allows one to declare any column or columns of a table to be the primary key. But internally, SQLite3 simply treats that PRIMARY KEY as a UNIQUE constraint. The actual key used for storage in SQLite is the rowid associated with each row.

SQLite4, on the other hand, actually uses the declared PRIMARY KEY of a table (or, more precisely, an encoding of the PRIMARY KEY value) as the key into the storage engine. SQLite4 tables do not normally have a rowid (unless the table has no PRIMARY KEY in which case a rowid is created to be the implicit primary key.) That means that content is stored on disk in PRIMARY KEY order. It also means that records can be located in the main table using just a single search on the PRIMARY KEY fields. In SQLite3, a search on the primary key means doing a search on an automatic index created for that primary key to find the rowid, then doing a second search on the main table using the rowid.

SQLite4 requires all elements of the PRIMARY KEY to be non-null. This is an SQL standard. Due to an oversight in early versions, SQLite3 does not enforce the NOT NULL constraint on PRIMARY KEY columns since by the time the omission was discovered SQLite3 was in such widespread use, activation of NOT NULL enforcement would have broken too many programs.

3.4 Decimal Math

SQLite4 does all numeric computations using decimal arithmetic. SQLite4 never uses C datatypes double or float (except in interface routines when converting between double and the internal decimal representation). Instead, all numeric values are represented internally as an 18-digit decimal number with a 3-digit base-10 exponent. Features of this represention include:

SQLite4 makes no distinction between integer and floating point numbers. It does, however, distinguish between exact and approximate numbers. In C/C++, integers are exact and floating point numbers are approximate. But this is not necessarily true of SQLite4. Floating point numbers can be exact in SQLite4. Integers that can be represented in 64-bits are always exact in SQLite4 but larger integers might be approximate.

The SQLite4 numeric format is for internal use. Numbers can be translated between integers or doubles for input and output. The on-disk storage space required for SQLite4 numbers varies from between 1 and 12 bytes, depending on the magnitude and the number of significant digits in the value.

3.5 Foreign Key Constraints And Recursive Triggers Are On By Default

Foreign key constraints were not available in early versions of SQLite3 and so they default off for backwards compatibility. But foreign key constraints are always available and default on in SQLite4. All foreign key constraints are deferred by default, though they can be declared to be immediate when created. There is no mechanism available to toggle foreign key constraints between deferred and immediate.

SQLite3 supports recursive triggers, but only if a run-time option is available. In SQLite4, all triggers are recursive all the time.

3.6 Explicit Index Cover

SQLite4 adds an optional clause to CREATE TABLE statements that defines extra columns of information that are duplicated in the index. This allows the application developer to explicitly create covering indices in SQLite4 without resorting to trickery and subterfuge. For example:

CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

The cover1 index above allows queries of the form "SELECT c,d FROM table1 WHERE a=?1 AND b=?2" to be executed with only a single lookup into the storage engine. Without the extra COVERING clause on the index, SQLite4 would have to do two lookups in the storage engine; one to find the primary key based on the values of a and b and a second to find the values of c and d based on the primary key. The COVERING clause makes the values of c and d available in the index, which means that they can be extracted directly from the index without a second lookup.

A variation on the COVERING clause is:

CREATE INDEX cover2 ON table(x,y) COVERING ALL;

The COVERING ALL clause means that all columns of the table are duplicated in the index which guarantees that the original table will never have to be consulted to complete a query that uses the index. The downside of this, of course, is that information is duplicated and so the database file is larger. But by providing an explicit COVERING close on indices, SQLite4 allows application developers to make space versus time tradeoffs that are appropriate for their application.