Cloud Backed SQLite

1. Overview

The "Cloud Backed SQLite" (CBS) system allows databases to be stored within cloud storage accounts such that they can be read and written by storage clients without first downloading the entire database to the client. This document should be read in concert with the detailed API documentation present in the two main public header files:

Databases may be accessed concurrently by multiple clients, although ensuring that only a single client is writing to a database at any time is (mostly) left to the application. A "client" is a single operating system process - within a single client process there may be multiple SQLite handles reading and writing a database, using the usual SQLite WAL-mode read/write locking to manage concurrent access between themselves. Existing clients do not see changes made by writers automatically; they are obliged to explicitly poll cloud storage to do so.

The system currently supports Azure Blob Storage and Google Cloud Storage. It also features an API that may be used to implement support for other cloud storage systems.

The software is currently developed on Linux with stock gcc, and on Windows 10 using the MSVC-compatible mingw64 gcc compiler. Other platforms are quite likely to work too, but have not been tested.

1.1. Storage Format

SQLite databases are not stored within the blob storage system in the usual format. Instead, each database is divided into one or more fixed-size blocks. The default block-size is 4MB, although this is configurable. Each block is assigned an id between 16 and 32 bytes (128 and 256 bits) in size (all blocks in a single container have the same sized ids). The name of each block file is created by encoding its id in hexadecimal format and adding the extension ".bcv". For example:

  787A5B026DBF882F89748C37AED04CCD.bcv

For containers that use ids smaller than 24 bytes in size, each block id is randomly generated. For containers that use ids of 24 bytes or larger, the first 16 bytes of each id may contain the md5 hash of the block contents, with the remainder of the id made up of pseudo-random values. See the section on block sharing below for further details.

Along with the block files is a "manifest" file, which contains, amongst other things, a machine-readable description of how the various block files may be assembled into one or more SQLite databases. The name of the manifest file is always:

  manifest.bcv

There is also an SQLite database used to store the contents of the "bcv_kv" virtual table named:

  bcv_kv.bcv

At present, manifest and block files may only be stored in the root directory of a blob storage container, not in any sub-directory. This means that each container contains at most a single manifest file. It may be that this limitation can be removed if desirable. Other files may be stored in a container along with the CBS files, but they must not use the ".bcv" extension. Files that end with ".bcv" may be mistaken for surplus block or other files and deleted or overwritten by CBS clients.

Note: CBS documentation uses the Azure terminology "container" to refer to the cloud storage virtual receptacle in which block files and manifests are stored. In Google Storage documentation the analogous concept is a "bucket". Other cloud storage systems may use their own vernacular.

1.2. System Components

There are three components to the system:

  1. Primitives to:
    • Create blob storage containers and populate them with empty manifest files (manifest files indicating that the container contains no databases).
    • Destroy entire blob storage containers and their contents.
    • Upload databases to cloud storage.
    • Download databases from cloud storage.
    • Create copies of existing databases within cloud storage.
    • Delete databases from cloud storage.
    • List the databases within a cloud storage container.
    • Clean up (remove) unused blocks from cloud storage.

    Each primitive is available both as a C API that can be called by applications, and as a command-line tool.

  2. A daemon process that, while running, provides local SQLite database clients in the same or a different OS process with read-only access to databases stored in remote blob storage containers.
  3. A VFS module that may be used in two modes, as follows:
    • For read-only access of cloud databases via a running daemon process, or
    • For read/write access of cloud databases in "daemonless" mode.

    The advantage of using a daemon over daemonless mode is that a single local cache of downloaded database blocks may be shared by multiple processes. In daemonless mode, each process is obliged to maintain its own separate cache.

Both the daemon and command line tools are packaged as a single binary executable - "blockcachevfsd".

2. Building, Deployment and Testing

Adding Application Support

To add support for the VFS and various primitives, the following C files from the source code distribution must be built into the application:
    bcvutil.c
    bcvmodule.c
    blockcachevfs.c
    simplexml.c
    bcvencrypt.c

The following header files from the source code distribution are required:

    bcvutil.h
    bcvmodule.h
    blockcachevfs.h
    simplexml.h
    bcv_int.h
    bcvencrypt.h

As well as SQLite, the application must be linked against libcurl and OpenSSL.
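
For example, on Linux the application might be compiled and linked using a command along the following lines. This is a sketch only: "myapp.c" is a hypothetical application source file, the SQLite amalgamation (sqlite3.c) is assumed to be in the current directory, and the exact set of system libraries required varies by platform:

  gcc myapp.c bcvutil.c bcvmodule.c blockcachevfs.c simplexml.c bcvencrypt.c \
      sqlite3.c -lcurl -lssl -lcrypto -lpthread -ldl -lm -o myapp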

The application should not include either "bcv_int.h" or "simplexml.h" directly. They are required by the build, but are not intended to be used directly. Of the remaining header files, blockcachevfs.h, bcvutil.h and bcvmodule.h are intended to be used by applications. They contain declarations for APIs that provide the following functionality:

    blockcachevfs.h    Creating and using VFS objects (see section 3.2).
    bcvutil.h          The primitives enumerated in section 1.2.
    bcvmodule.h        Implementing support for new cloud storage systems (see section 4).

Building blockcachevfsd:

The easiest way to build the "blockcachevfsd" executable that contains both the daemon and the command line tools is using the standard:
  ./configure && make

Alternatively, it may be built from all sources listed above as required for adding blockcachevfs support to an application, along with the file "blockcachevfsd.c", which contains the main routine.

Testing the system:

The blockcachevfs system has an automated test suite. Notes describing its use may be found here.

3. Usage

3.1. Uploading Databases

This section illustrates how to upload databases to a blob storage account and begin using them. It assumes that the cloud storage module name and authentication details, for example an account name and access key, are stored in a file named "account.txt" in the current directory. Example account.txt contents for Azure Blob Storage:

  -module azure
  -user devstoreaccount1
  -auth Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==

The command-line options above can just as easily be specified directly on the command line. But it is often less cumbersome to store them in a file and use the "-f <file>" option, supported by all commands, which treats the contents of the named file as if they were specified directly on the command line.

To begin, create a container and an empty manifest to which databases may be uploaded, where $CONTAINER is the container name:

  $ blockcachevfsd create -f account.txt $CONTAINER

If the named blob storage container already exists, CBS will clobber any manifest file within it with a new, empty, manifest file, effectively deleting any databases in the container.

Next, upload one or more databases from the local file system to the blob storage account. In the following, $LOCALDB is an absolute or relative path to an SQLite database file and $DBNAME is the name used to access the database after it has been uploaded. It is an error if there is already a database named $DBNAME in the manifest:

  $ blockcachevfsd upload -f account.txt -container $CONTAINER $LOCALDB $DBNAME

3.2. Accessing Databases

This section explains, by way of example code, how to access a blockcachevfs database stored in cloud storage using an SQL client. The full program is available here. It should be read together with the API documentation in file blockcachevfs.h.

The example code implements a program that accepts four arguments on the command line:

  1. The directory to use as the blockcachevfs cache directory.
  2. The name of the cloud storage container to attach.
  3. The path of the database to open, of the form "/$CONTAINER/$DATABASE".
  4. The SQL script to execute.

The program creates a VFS, attaches the container, opens a database handle and then executes the supplied SQL script. The code as presented below omits several uninteresting features for brevity - there is no main() routine for example. But the full version, linked above, contains everything required for a working application.

The following block contains the required #include directives and the authentication callback for the example application. All applications require an authentication callback, as there is no other way to provide authentication tokens to the system. And there are currently no implementations of cloud storage modules that do not require authentication tokens.

/*
** The code in this file creates and uses a VFS, but it doesn't use any 
** cloud storage primitives or implement a new cloud storage module, so 
** it only needs to include "blockcachevfs.h". And "sqlite3.h" of course.
*/
#include "blockcachevfs.h"
#include "sqlite3.h"

/*
** This program is hardcoded to access an Azure emulator running on port
** 10000 (the default for Azurite) of the localhost. It is also hardcoded
** to the demo account - the account built-in to the emulator with the
** well-known credentials reproduced below. 
*/ 
#define CS_STORAGE "azure?emulator=127.0.0.1:10000"
#define CS_ACCOUNT "devstoreaccount1"
#define CS_KEY "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=="

/*
** Authentication callback. A real application would return a different 
** authentication token based on the storage system, account name and 
** container name parameters, but since the credentials used by this
** application are hard coded, it just returns a copy of constant string
** CS_KEY.
**
** Because the API is defined such that this function must return a buffer
** allocated using sqlite3_malloc() or compatible, this implementation
** uses sqlite3_mprintf() to make a copy of the authentication token.
*/
static int csAuthCb(
  void *pCtx,
  const char *zStorage,
  const char *zAccount,
  const char *zContainer,
  char **pzAuthToken
){
  *pzAuthToken = sqlite3_mprintf("%s", CS_KEY);
  return (*pzAuthToken) ? SQLITE_OK : SQLITE_NOMEM;
}

The next block contains the start of the cloudsql() function, which does the bulk of the work for this application. It begins by creating the VFS object. Note that, like almost all other blockcachevfs APIs, if an error occurs the sqlite3_bcvfs_create() function returns an error message in a buffer that must be freed by the caller using sqlite3_free().

Once the VFS has been created successfully, the application can use the sqlite3_bcvfs_isdaemon() API to see if the VFS has connected to a daemon or is running in daemonless mode.

If, when sqlite3_bcvfs_create() is called, there is a daemon process using the specified directory as its cache directory, then the VFS created by the sqlite3_bcvfs_create() call automatically connects to the daemon and provides read-only access. The daemon can be started using the following command line:

    blockcachevfsd daemon $DIRECTORY

where $DIRECTORY above must be the same string as passed to the cloudsql() function below via the zDir parameter. If a daemon process was using the directory as its cache directory, but has since exited, the sqlite3_bcvfs_create() call will fail.

If there has never been a daemon running in the directory, then the call to sqlite3_bcvfs_create() creates a VFS running in daemonless mode. In this mode it requires exclusive access to the directory. If there is some other daemonless VFS already running in the specified directory, the call to sqlite3_bcvfs_create() fails with SQLITE_BUSY.

/*
** Open a VFS that uses directory zDir as its cache directory. Then attach
** container zCont. Next, open an SQLite handle on path zPath using the new 
** VFS and execute SQL script zSql.
*/
static int cloudsql(
  const char *zDir,               /* Directory to use for blockcachevfs cache */
  const char *zCont,              /* Container to attach */
  const char *zPath,              /* Path to open */
  const char *zSql                /* SQL to execute */
){
  int rc = SQLITE_OK;             /* Error code */
  char *zErr = 0;                 /* Error message */
  sqlite3_bcvfs *pVfs = 0;        /* VFS handle */
  sqlite3 *db = 0;                /* Database handle open on zPath */

  /* Create a VFS object. Directory zDir must already exist. If it exists
  ** and there is a daemon running in that directory, the new VFS connects
  ** to the daemon for read-only access. Or, if there is no such daemon,
  ** the new VFS will provide read-write daemonless access.  */
  rc = sqlite3_bcvfs_create(zDir, "myvfs", &pVfs, &zErr);

  /* Check if this is a daemon VFS or not */
  if( rc==SQLITE_OK ){
    if( sqlite3_bcvfs_isdaemon(pVfs) ){
      printf("VFS is using a daemon\n");
    }else{
      printf("VFS is in daemonless mode\n");
    }
  }

Following this, assuming it was created successfully, the code connects the authorization callback to the new VFS, and then "attaches" the cloud storage container specified by the user to the VFS.

Before the databases in a cloud storage container can be accessed, the container must be attached to the VFS. Once it has been attached, SQL clients may access the databases within the container by opening paths of the form "/$CONTAINER/$DATABASE" using the blockcachevfs VFS.

  /* Configure the authorization callback. */
  if( rc==SQLITE_OK ){
    sqlite3_bcvfs_auth_callback(pVfs, 0, csAuthCb);
  }

  /* Attach the container. Specify the SQLITE_BCV_ATTACH_IFNOT flag so that
  ** it is not an error if the container is already attached. 
  **
  ** There are two reasons the container might already be attached, even
  ** though the VFS was only just created. Firstly, if this VFS is connected
  ** to a running daemon process, then some other client may have already
  ** attached the container to the daemon. Secondly, VFS objects store their
  ** state in the cache directory so that if they are restarted, all
  ** containers are automatically reattached. So if this (or some other
  ** blockcachevfs application) has run before specifying the same 
  ** cache directory, the container may already be attached.  */
  if( rc==SQLITE_OK ){
    rc = sqlite3_bcvfs_attach(pVfs, CS_STORAGE, CS_ACCOUNT, zCont, 0,
        SQLITE_BCV_ATTACH_IFNOT, &zErr
    );
  }

Once the container is attached to the VFS, the database handle is opened and the SQL script executed. Before the script is executed though, sqlite3_bcvfs_register_vtab() is called to make the virtual tables available to the database handle.

For the sake of brevity, the code below omits the sqlite3_exec() callback that is part of the full version.

  /* Open a database handle on a cloud database. */
  if( rc==SQLITE_OK ){
    rc = sqlite3_open_v2(zPath, &db, SQLITE_OPEN_READWRITE, "myvfs");
    if( rc!=SQLITE_OK ){
      zErr = sqlite3_mprintf("%s", sqlite3_errmsg(db));
    }
  }

  /* Enable the virtual table interface. */
  if( rc==SQLITE_OK ){
    rc = sqlite3_bcvfs_register_vtab(db);
  }

  /* Execute the provided SQL script. */
  if( rc==SQLITE_OK ){
    sqlite3_exec(db, zSql, 0, 0, &zErr);
  }

Finally, the cloudsql() function cleans up the two handles it allocated, outputs any error message and returns.

  sqlite3_close(db);
  sqlite3_bcvfs_destroy(pVfs);

  /* Output any error, free any error message and return. */
  if( rc!=SQLITE_OK ){
    fprintf(stderr, "Error: (%d) %s\n", rc, zErr);
  }
  sqlite3_free(zErr);
  return rc;
}
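
For reference, a minimal main() routine that drives cloudsql() might look like the sketch below. The full program linked above differs in its details; argument handling here is reduced to the bare minimum:

/* Hypothetical driver routine: expects exactly four arguments - the cache
** directory, the container name, the database path and the SQL script. */
int main(int argc, char **argv){
  if( argc!=5 ){
    fprintf(stderr, "Usage: %s DIRECTORY CONTAINER PATH SQL\n", argv[0]);
    return 1;
  }
  return cloudsql(argv[1], argv[2], argv[3], argv[4]);
}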

3.3. Secure Containers

When running in daemon mode, containers may be attached either securely or insecurely. By default, containers are attached to a daemon insecurely. This means that data downloaded from cloud storage is stored in the local cache file unencrypted, where it may be read by any user or process with access to the cache directory.

When containers are attached securely, all data stored in the cache file is encrypted using 128-bit AES OFB encryption. Local clients obtain the encryption key from the daemon process by supplying valid cloud storage credentials - a client is granted the encryption key only if it can prove that it already has permission to access the database within cloud storage. Cloud storage credentials are verified only once, when the client first connects, so it is quite possible for a client to remain connected to a daemon process, and to keep accessing a database, long after its supplied credentials have expired.

A container is nominated as secure when it is first attached to a daemon, by specifying the SQLITE_BCV_ATTACH_SECURE flag along with the sqlite3_bcvfs_attach() call.
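
In terms of the example code in section 3.2, this might look like the following sketch (all parameters other than the flags argument as before; see blockcachevfs.h for the authoritative documentation of the attach flags):

  /* Attach container zCont securely. */
  rc = sqlite3_bcvfs_attach(pVfs, CS_STORAGE, CS_ACCOUNT, zCont, 0,
      SQLITE_BCV_ATTACH_SECURE, &zErr
  );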

There are two known disadvantages to attaching containers securely:

4. Cloud Storage Modules and Authentication

All CBS APIs and command-line tools accept a "module" specification, which determines the cloud storage system that the system operates on and how it operates. There are two built-in modules, "azure", which works with Azure Storage, and "google", which works with Google Cloud Storage. There is also an API that may be used to add new cloud storage modules.

A module specification consists of the module name optionally followed by one or more URI style key/value parameters. For example:

  azure
  azure?sas=1
  azure?emulator=127.0.0.1:10000&sas=1

There is no default module. It must always be specified explicitly.

As well as a module specification, CBS APIs and command-line tools require a user-name and an authentication value - a string containing cloud storage authentication information. Exactly how these are interpreted depends on the module specification. The values expected by the two built-in modules, and how they are used, are described in the following two sections.

4.1. The Built-In "azure" Module

The "azure" module is used to interface with Azure Storage. The value provided as a user-name is used as the Azure user name when connecting. The authentication value may either be the corresponding account access-key (shared-key access), or an Azure Shared Access Signature token, depending on the URI style parameters provided along with the module name.

One way to generate an SAS token for Azure is to use the "az" command line tool. For example, given account $ACCOUNT with access-key $ACCESSKEY, to generate an SAS token that allows read/write access to container $CONTAINER and expires on the 15th of April, 2020:

  az storage container generate-sas -o tsv \
      --account-name $ACCOUNT \
      --account-key $ACCESSKEY \
      --expiry 2020-04-15 \
      --name $CONTAINER \
      --permissions dlrw

The daemon process and some command-line commands may also use read-only SAS tokens. To generate a read-only token, the --permissions option in the command above should be passed "lr" instead of "dlrw".
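That is, to generate a read-only token for the same example account:

  az storage container generate-sas -o tsv \
      --account-name $ACCOUNT \
      --account-key $ACCESSKEY \
      --expiry 2020-04-15 \
      --name $CONTAINER \
      --permissions lr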

The following URI style parameters may be used to modify the behaviour of the azure module:

  emulator
    If this option is present, it indicates that CBS should connect to an
    Azure emulator instead of to a real Azure cloud storage account. Both
    the open source azurite and the legacy emulator are supported. The value
    of this option should be set to the "host:portnumber" address of the
    port on which the emulated Azure blob service is running. If using
    either supported emulator on the local machine with default options,
    this address is "127.0.0.1:10000".

  sas
    This option must be set to either "0" (the default) or "1". If it is
    set to 1, then the module expects the provided authentication value to
    be an SAS token. Otherwise, it assumes it to be an access-key.

  customuri
    This option may only be present if "sas=1" is also specified. Similarly,
    it may only be set to "0" (the default) or "1". If it is set to 1, then
    a base URI for the container must be specified in place of the account
    name parameter. If the specified container name is NULL or a zero-length
    string, then the base URI is used as is as the URI for the remote
    container. Otherwise, the specified container name is appended as a
    directory component to the base URI. For example, if the string value
    passed as the account name for a "customuri=1" module is:

      https://photos.contoso.com

    and the container name is "myforms", then the URI used to access the
    container manifest is:

      https://photos.contoso.com/myforms/manifest.bcv

    Alternatively, the same container can be accessed by specifying a NULL
    container name and:

      https://photos.contoso.com/myforms

    as the custom URI value.

For example, to connect to a real Azure account using an SAS token for authentication purposes:

  azure?sas=1

To connect to an Azure blob emulator running on localhost port 10000 (the default) using an access-key for authentication, either of the following may be used:

  azure?emulator=127.0.0.1:10000
  azure?emulator=127.0.0.1:10000&sas=0

Or, to connect to a real Azure account using an access-key for authentication, simply:

  azure

4.2. The Built-In "google" Module

The "google" module is used to connect CBS to Google Cloud Storage. The user-name parameter should be passed the project-id for the project that owns the storage bucket or buckets to be operated on. In practice, this is only required for CBS's "create" primitive, and only then if it needs to create a new cloud storage container (bucket). In all other cases, an empty string may be specified for the CBS user-name parameter.

The authorization value parameter must be passed the text of a Google Cloud access token.

The google module does not accept any URI style parameters.

5. Details and Caveats

5.1. Concurrent Access From Multiple Clients

The system supports any number of concurrent readers using either daemonless or daemon mode. Even though no locks are taken, read transactions are not usually interfered with, even if a new version of the database is pushed by another host while they are ongoing. This is because block files are never modified after they are created; new block files are uploaded for each change. And old block files are not deleted immediately (see below), so ongoing read transactions can usually continue to access them for as long as required.

If a database is modified within the cloud storage system, the modifications do not become automatically visible to existing clients. Until the container is "polled", clients continue to see the old versions of all databases. A container may be polled either by calling the sqlite3_bcvfs_poll() API:

    int sqlite3_bcvfs_poll(sqlite3_bcvfs *pFs, const char *zCont, char **pzErr);

or the equivalent PRAGMA interface. To be clear, any changes made are immediately visible to other SQL clients using the same read-write daemonless VFS object, but are not visible to SQL clients using other VFS objects until the container has been polled.
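
For example (a sketch, reusing the pVfs handle and container alias zCont from the example in section 3.2):

  /* Poll container zCont for changes uploaded by other clients. */
  char *zErr = 0;
  int rc = sqlite3_bcvfs_poll(pVfs, zCont, &zErr);
  if( rc!=SQLITE_OK ){
    fprintf(stderr, "poll failed: (%d) %s\n", rc, zErr);
  }
  sqlite3_free(zErr);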

Similarly, after changes are made to a cloud storage database via a read-write daemonless VFS object, the new version of the database is not automatically uploaded to cloud storage. In order to upload changes to cloud storage so that they are available to new clients or existing clients after they have polled the container, use the sqlite3_bcvfs_upload() API:

    int sqlite3_bcvfs_upload(
      sqlite3_bcvfs *pFs,             /* VFS handle */
      const char *zCont,              /* Container (alias) to upload databases of */
      int (*xBusy)(void*,int),        /* Busy-handler callback */
      void *pBusyArg,                 /* First argument passed to xBusy */
      char **pzErr                    /* OUT: Error message */
    );

or the equivalent PRAGMA interface.
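
The sketch below shows one plausible invocation. The busy-handler semantics assumed here - that, as with sqlite3_busy_handler(), returning non-zero means "keep waiting" - are an assumption; blockcachevfs.h is the authoritative reference:

/* Hypothetical busy-handler. Gives up after 10 retries. */
static int xBusyHandler(void *pArg, int nRetry){
  (void)pArg;                   /* Unused context pointer */
  return (nRetry<10);           /* Non-zero to keep waiting (assumed) */
}

  /* Upload all locally modified databases in container zCont. */
  char *zErr = 0;
  int rc = sqlite3_bcvfs_upload(pVfs, zCont, xBusyHandler, 0, &zErr);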

At present, it is not possible to upload changes to a database if there have been changes made to *any* database in the container since the most recent poll operation. This is not usually a problem; the container can be polled and the upload reattempted.

However, the poll operation will fail if there have been changes to one or more databases in cloud storage for which there are also local changes yet to be uploaded, or if there are local changes to a database that has been deleted from cloud storage. It is the responsibility of the user to avoid this situation - concurrent modifications being made to the same database at multiple sites - perhaps by using the bcv_kv virtual table.

5.2. Deleting Old Blocks

Once they have been uploaded, blocks are never modified within cloud storage. Instead, when a database is modified, new block files are uploaded for the modified regions and the manifest updated to refer to the new list of blocks. This makes managing reader/writer concurrency easier, and facilitates block sharing, but also means that as time passes a large number of unused block files may accumulate within cloud storage.

Unused block files may be removed by invoking the sqlite3_bcv_cleanup() interface from bcvutil.h:

    int sqlite3_bcv_cleanup(sqlite3_bcv *p, int nSecond);

In order to avoid disturbing existing read clients, the second parameter may be used to specify the minimum number of seconds for which a block must have been unused before it is considered eligible for cleanup.
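
For example (a sketch; p is assumed to be an sqlite3_bcv handle already opened on the target container using the bcvutil.h APIs):

  /* Delete blocks that have been unused for at least one hour. */
  int rc = sqlite3_bcv_cleanup(p, 3600);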

5.3. Block Sharing

Sometimes, a single block may be used by multiple databases within a manifest, saving cloud storage space and bandwidth. This is known as "block sharing". Block sharing occurs in two scenarios:

  • When a database is copied within cloud storage using the "copy" command or equivalent API, the new database initially shares all of its blocks with the original.
  • When a manifest uses block ids of 24 bytes or larger, a block whose md5 hash matches that of a block already in cloud storage may be shared rather than uploaded again.

The block id size used by a manifest file must be configured when the manifest is first created, using the command-line "create" command or the equivalent API. If block sharing using md5 hash values is required, then the block id size should be set to 24 bytes. Otherwise, it is best left set to the default value of 16 bytes to save space.

5.4. Pragma Interface

Blockcachevfs currently supports three PRAGMA statements - "PRAGMA bcv_upload" , "PRAGMA bcv_poll" and "PRAGMA bcv_client". If further PRAGMA commands are added in the future, the names of the new commands will begin with "bcv_".

PRAGMA bcv_upload

Blockcachevfs database clients may execute the following PRAGMA statement to force any changes made locally to be uploaded to the cloud storage account:

  PRAGMA [database].bcv_upload

The PRAGMA statement does not return until the upload is complete. If successful, an empty string is returned. If an error occurs and the upload cannot be completed, an SQLite error code is returned and the error message is set to an English language explanation of the error.

Invoking this PRAGMA differs from using the sqlite3_bcvfs_upload() interface in that this command uploads changes for the current database only, whereas sqlite3_bcvfs_upload() uploads all locally modified databases in the container.
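
For example, from C code the PRAGMA might be invoked as in the following sketch, where db is a database handle opened as in section 3.2:

  /* Upload any local changes to the "main" database. */
  char *zErr = 0;
  int rc = sqlite3_exec(db, "PRAGMA main.bcv_upload", 0, 0, &zErr);
  if( rc!=SQLITE_OK ){
    fprintf(stderr, "upload failed: (%d) %s\n", rc, zErr);
  }
  sqlite3_free(zErr);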

PRAGMA bcv_poll

Invoking "PRAGMA bcv_poll":

  PRAGMA [database].bcv_poll

is equivalent to calling sqlite3_bcvfs_poll() on the container that the database is a part of. If successful, an empty string is returned. Otherwise, an SQLite error code is returned and the error message is set to an English language explanation of the error.

PRAGMA bcv_client

Invoking "PRAGMA bcv_client":

  PRAGMA [database].bcv_client
  PRAGMA [database].bcv_client = NAME

This PRAGMA is used to set or query a cloudsqlite database connection's client name. The configured client name appears in the "client" column of the bcv_http_log virtual table for requests associated with the current database connection. It may also appear in log messages. It is not used for any other purpose. The default client name is an empty string.

5.5. Virtual Table Interface

The virtual table interface consists of the six eponymous virtual tables described below (not all of which are available in daemon mode). It is not available automatically, but must be made available to each SQLite connection explicitly by calling the sqlite3_bcvfs_register_vtab() API from blockcachevfs.h:

    int sqlite3_bcvfs_register_vtab(sqlite3*);

The bcv_container table:

The read-only "bcv_container" table contains one row for each container attached to the VFS. It has the equivalent of the following schema:

    CREATE TABLE bcv_container(
      name      TEXT,          -- local name (alias) of container 
      storage   TEXT,          -- cloud storage system (e.g. "azure")
      user      TEXT,          -- cloud storage username
      container TEXT,          -- container name in cloud storage
      ncleanup  INTEGER        -- number of blocks eligible for cleanup
    )

Column "name" contains the local alias of the attached container (this may be the same as the remove container name). Columns "storage", "user" and "container" contain the storage module, account and cloud storage container names as specified to sqlite3_bcvfs_attach() when the container was attached.

The "ncleanup" column usually contains the number of blocks that will be deleted from cloud storage if a cleanup operation (see the sqlite3_bcv_cleanup() API)is run on the container. However, if errors or client crashes have occurred while uploading changes to cloud storage, then there may be extra unused blocks left in the cloud storage container. In this case those blocks will be deleted by the next cleanup operation, but are not included in the value stored in the "ncleanup" column of the "bcv_container" table.

The bcv_container table is available in both daemon and daemonless mode.
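
For example, to see how many blocks in each attached container are currently eligible for cleanup:

    SELECT name, ncleanup FROM bcv_container;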

The bcv_database table:

The read-only "bcv_database" table contains one row for each database in each attached container. It has the equivalent of the following schema:

    CREATE TABLE bcv_database(
      container TEXT,          -- local name (alias) of container
      database  TEXT,          -- name of database
      nblock INTEGER,          -- total number of blocks in database
      ncache INTEGER,          -- number of blocks in cache
      ndirty INTEGER,          -- number of dirty blocks in cache
      walfile BOOLEAN,         -- true if transactions in local wal file
      state TEXT,              -- state of database (see below)
      nclient INTEGER,         -- current number of clients
      nprefetch INTEGER,       -- current number of ongoing prefetches
      ntrans INTEGER           -- current number of open transactions
    )

The "container" column contains the local alias of the container (the same value that is stored in the "name" column of the bcv_container table). Column "database" contains the database name.

Columns "nblock", "ncache" and "ndirty" contain the total number of blocks in the database, the number of those blocks that are currently cached locally, and the number of those cached blocks that are dirty and require uploding. It is always true that:

    nblock >= ncache >= ndirty

Boolean column "walfile" is set to true if there are currently write transactions in a local wal file (blockcachevfs databases are always in wal mode). To determine whether or not there have been local changes to a database, an application might use a query similar to:

    SELECT (walfile OR ndirty>0) AS has_local_changes FROM bcv_database WHERE...

The "state" column usually contains an empty string. There are two exceptions:

The bcv_database table is available in both daemon and daemonless mode.

The bcv_http_log table:

The "bcv_http_log" table contains one row for each HTTP request made by the VFS or connected daemon. It has the equivalent of the following for a schema:

     CREATE TABLE bcv_http_log(
       id INTEGER,              -- Unique, monotonically increasing id value
       start_time TEXT,         -- Time request was made, as iso-8601
       end_time TEXT,           -- Time reply received, as iso-8601 (or NULL)
       method TEXT,             -- "PUT", "GET" etc.
       client TEXT,             -- Name of client that caused request
       logmsg TEXT,             -- Log message associated with request
       uri TEXT,                -- URI of request
       httpcode INTEGER         -- HTTP response code (e.g. 200)
     ) 

For those requests that can be associated with a single SQLite database handle, the "client" column contains the client name as configured using the PRAGMA bcv_client command. For requests associated with a prefetch operation, it contains the string 'prefetch'.

To prevent it from consuming an ever-increasing amount of memory, entries are automatically removed from the bcv_http_log table on a first-in/first-out basis, according to the values configured for the SQLITE_BCV_HTTPLOG_TIMEOUT and SQLITE_BCV_HTTPLOG_NENTRY parameters. Or, in daemon mode, according to the values passed via the --httplogtimeout and --httplognentry command-line options.

Including various overheads, each entry may be assumed to consume 256 bytes of memory or less. So allowing 4096 entries to accumulate in the bcv_http_log table may require up to 1MiB of memory.

The bcv_stat table:

The schema of the "bcv_stat" table is:

    CREATE TABLE bcv_stat(
      name TEXT PRIMARY KEY,   -- Key value 
      value                    -- Associated payload value
    )

This read-only table contains various system "statistics" that may be of interest. The statistical value itself is found in the 'value' column; the 'name' column contains an identifier for the specific statistic. At present the bcv_stat table always contains three rows:

  nlock
    In daemonless mode, this value is always set to 0. In daemon mode, it
    contains the total number of cache entries that are currently in use,
    or "locked", by ongoing client read transactions.

  ncache
    The current number of entries in the cache. In daemonless mode, this
    refers to the cache managed by the local VFS. In daemon mode, this is
    the cache belonging to the connected daemon.

  cachesize
    The configured size of the cache, in entries.

The bcv_kv table:

The schema of the "bcv_kv" table is:

    CREATE TABLE bcv_kv(
      name TEXT PRIMARY KEY,   -- Key value 
      value                    -- Associated payload value
    )

Unlike the other virtual tables described in this section, the bcv_kv table is read-write. Applications may write arbitrary data to the table to be stored in the cloud storage container. The table is empty when the container is first created.

The contents of the bcv_kv table is stored as a separate file (an SQLite database) in the cloud container - "bcv_kv.bcv".

The bcv_kv table implements transaction-like properties, as follows:

Because the entire bcv_kv table is downloaded (and, for write transactions, uploaded) for each transaction, applications should avoid storing large amounts of data within it.
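
For example, an application might use bcv_kv to store a small piece of coordination data similar to the following (the key and value here are purely illustrative):

    INSERT INTO bcv_kv(name, value) VALUES('current-writer', 'host-17');
    SELECT value FROM bcv_kv WHERE name='current-writer';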

The bcv_kv_meta table:

The schema of the "bcv_kv_meta" table is:

    CREATE TABLE bcv_kv_meta(
      name TEXT PRIMARY KEY,   -- Header field name
      value                    -- Header field value
    )

The bcv_kv_meta table is similar to the bcv_kv table, in that querying it causes cloudsqlite to download the bcv_kv.bcv database file and cache it for the remainder of the transaction. Instead of providing read/write access to the contents of the bcv_kv.bcv database, however, it provides read-only access to two of the HTTP headers received from the cloud storage server when the file was downloaded - "Date" and "Last-Modified".

The bcv_kv_meta table always contains exactly two rows. The first contains the string 'date' in the "name" column, and the value of the timestamp from the HTTP "Date:" header, in ISO-8601 format, in the "value" column. The second row contains 'last-modified' in the "name" column, and the ISO-8601 equivalent of the timestamp from the HTTP "Last-Modified:" header in "value".

If, for some reason, the cloud storage server does not provide a "Date:" or "Last-Modified:" header, or if the contents of that header cannot be parsed, the corresponding "value" field in the bcv_kv_meta table is set to NULL.

6. Command Line Reference

The following options are supported by all or most of the commands enumerated below. All commands accept unique prefixes of any option in place of the full option string.

Option Argument
-user USER-NAME

The user or account name to use when accessing cloud storage.

-auth ACCESS-KEY

The authentication information to use when accessing cloud storage. The form of this depends on the cloud storage module in use.

-container CONTAINER-NAME

The blob storage container to read from or write to. This option is accepted (and required) by the commands "upload", "delete", "list" and "download" only.

-file PATH

This option tells the command to read the local text file at relative or absolute path PATH and treat its contents as whitespace-separated command line arguments.

-module MODULE-NAME

The cloud storage module to use to access the remote container. This may be a simple module name (e.g. "google") or a module name followed by URI style parameters (e.g. "azure?emulator=127.0.0.1:10000&sas=1").

-log STRING

The argument to this option determines when log messages are output to stdout. If the value contains an "h" character, a message is output each time an HTTP(S) request is issued or a reply received. If a "v" is present in the string, libcurl verbose logging is enabled. All other characters are ignored. The interpretation of the argument to this option is subject to revision at any point. The default value is an empty string (no log messages at all).

The following commands are supported:

 blockcachevfsd copy ?OPTIONS? DATABASE1 DATABASE2

Make a copy of database DATABASE1 named DATABASE2. If DATABASE2 exists it is clobbered by the copy. This operation always uses block sharing to avoid uploading any new block files.

 blockcachevfsd create ?OPTIONS? CONTAINER

Create a new container and add an empty manifest file containing zero databases to it. It is not an error if the container already exists. In this case any existing manifest is clobbered by the new, empty, manifest, effectively deleting any databases currently stored in the container.

The following extra options are supported:

      -blocksize BLOCK-SIZE
      -namebytes BLOCKID-SIZE

The "-blocksize" option is used to set the size of each block file stored within cloud storage. The default size is 4MiB. If the argument is an integer value it is interpreted as a size in bytes. If an integer value followed by a 'k' or 'K' character, a size in KiB. Or, if the argument is an integer value followed by an 'm' or 'M' character, a size in MiB. The specified size must be a power of two in bytes. The following switches all serve to set the block-size to 2MiB:

      -blocksize 2M
      -blocksize 2048k
      -blocksize 2097152

The "-namebytes" option may be passed an integer value between 16 and 32, inclusive, to set the size in bytes of the block ids that will be used by the manifest file. The default value is 16. The block id size has implications for block sharing.

 blockcachevfsd delete ?OPTIONS? NAME

Remove database NAME from the container identified by the -container option.

 blockcachevfsd destroy ?OPTIONS? CONTAINER

Delete an entire blob storage container and its contents.

WARNING: Deleting a container from Azure blob storage is (apparently) a lazily executed operation. Even though the container appears to be removed immediately, it still exists in some sense, as Azure prevents a new container with the same name from being created for some time. Sometimes this condition persists for up to 24 hours or longer. For this reason, it is better to use "create" to clobber an existing container than to attempt to delete and then recreate it.

 blockcachevfsd download ?OPTIONS? NAME ?LOCALFILE?

Download the named database from the container specified by the -container option to the local file-system. If the LOCALFILE argument is specified, then it is the local file-system path to store the downloaded database at. If no LOCALFILE argument is specified, the downloaded database is stored in the current directory using the same name as is used in cloud storage.

The download command supports the following additional option:

      -nrequest N
This option sets the number of HTTP(S) requests that are allowed to be outstanding at any one time. The default value is 1. Using a higher value might increase throughput.

 blockcachevfsd upload ?OPTIONS? DATABASE-FILE ?NAME?

Upload the specified local database file and add it to the manifest in the container identified by the -container option. If the NAME parameter is present, then it is used as the name of the uploaded remote database. Otherwise, the remote database takes the same name as the local file, with any directory components removed.

The upload command supports the following additional option:

      -nrequest N
This option sets the number of HTTP(S) requests that are allowed to be outstanding at any one time. The default value is 1. Using a higher value might increase throughput.

The eight commands above mirror the eight APIs made available to database clients (attach and detach by blockcachevfs.h, the others by bcvutil.h). The following commands are also supported, but are considered a convenience only. The command-line options and the output of the following are subject to change at any time:

 blockcachevfsd files ?OPTIONS? CONTAINER

List the files in the specified container, one per line. All files are listed, even those that are not part of blockcachevfs databases.

 blockcachevfsd list ?OPTIONS? 

If there is no -container option, list all databases in all containers in the blob storage account. Or, if the -container option is passed, list only the databases in the specified container.

 blockcachevfsd manifest ?OPTIONS? CONTAINER

Output a human-readable version of the manifest file found in the specified container.

7. Daemon Command Reference

The daemon process is started as follows:

 blockcachevfsd daemon ?OPTIONS? DIRECTORY 

The daemon command supports the following options:

Option Argument Default Value
-addr IPV4-ADDRESS (default "127.0.0.1")

Normally, the daemon listens for and accepts connections from clients on the localhost address (127.0.0.1). This option allows the daemon to listen for connections on some other local address.

-cachesize CACHE-FILE-SIZE (default 1G)

Database file blocks downloaded from the blob storage account are cached locally. The cache is allowed to grow up to the size configured by this option, then blocks are expelled on a least-recently-used basis as new blocks are downloaded. The argument to this option must be a positive integer followed by a single "M" or "G" (or "m" or "g") character to indicate whether the integer value should be interpreted as a number of MiB or GiB. For example, to use a 1GiB cache, either "-cachesize 1024M" or "-cachesize 1G" may be specified.

The size specified by this option is not a hard limit. If cloudsqlite needs to download a remote block but the configured size limit has already been reached, it attempts to evict an existing block from the cache to make space for the new block. However, if this is not possible because all existing cache entries are currently in use, then the cache is allowed to grow to make room for the new block.

For the purposes of the above paragraph, a cached block is considered "in use" if any client has a read transaction open on the database to which the block belongs - unless that client is currently requesting a new block from cloud storage. This means that if there is only a single client with a read transaction open, then blocks may always be evicted from the cache as required. However, if two or more clients have a read transaction open on a single database and a new block is required, then none of that database's blocks may be evicted (they are still all "in use" by at least one of the clients) and the cache may grow beyond its configured size.

-httptimeout SECONDS (default 600)

The number of seconds to wait before assuming an HTTPS request made to cloud storage has been lost.

-log STRING (default "")

The argument to this option determines when log messages are output to stdout by the daemon process. Each character in the argument string turns on different log messages. Characters not listed in the following table are ignored. The interpretation of the argument passed to this option is subject to revision at any time.

Character Interpretation
  m    Output a log message each time a message is received from or sent to
       a connected database client.
  h    Output a log message each time an HTTP(S) request is sent or a reply
       received.
  e    Output a log message when various internal "events" occur.
  v    Turn on libcurl CURLOPT_VERBOSE logging.

-notimestamps

This option causes the daemon process to omit the high-resolution timestamps from the log messages it outputs.

-port PORT-NUMBER (default: first free TCP port >= 22002)

The localhost port on which to listen for connections from local database clients.

-httplogtimeout SECONDS (default 3600)

The number of seconds after which entries are automatically removed from the bcv_http_log virtual table. A value of less than zero means entries are never removed.

-httplognentry INTEGER (default -1)

The maximum number of entries allowed in the bcv_http_log virtual table. Once this limit has been reached, old entries are automatically removed to make way for new ones.

-autoexit

This debugging option causes the daemon process to exit cleanly when the number of local database clients drops to zero.

-delay MS (default 0)

This debugging option causes the daemon process to pause for the specified number of milliseconds before beginning to listen for incoming connections.

-readymessage

This debugging option causes the daemon process to print "READY\n" to stdout when it is ready to accept incoming client connections. This makes it easier to write test scripts that start and stop daemon processes.
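
For example, to start a daemon with a 4GiB cache that logs both client messages and HTTP traffic, using the hypothetical cache directory /var/tmp/bcv (which is assumed to already exist):

  $ blockcachevfsd daemon -cachesize 4G -log mh /var/tmp/bcv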