hctree

Documentation
Login

Documentation

1. Overall Structure

Each database consists of at least three files:

Additionally, there may be one or more client log files.

2. Header Page Format

The database header consists of two pages using the same format.

Byte Offset Size Description
0 32 Magic string identifying an hctree database - "hctree version 01 database file" followed by a nul-terminator 0x00 byte.
32 8 Checksum for header page.
40 4 Constant Value 1234. To identify endianness of database.
44 4 Page size in bytes.
48 8 Guaranteed safe recovery transaction id.

3. Page Map Format

Page-map slots 3-32, inclusive, are not used for a logical-to-physical page mapping. Instead, the lower 56-bits of these slots are used as follows:

SlotUse
3 Largest logical page number allocated in database.
4 Largest physical page number allocated in database.
5 Largest transaction id allocated so far.
6 Largest commit id allocated so far.

4. Database File Format

Both logical and physical database pages are numbered starting from 1. Each database page is pgsz bytes in size, where pgsz is determined by the value stored in the header file.

There are 5 types of database page:

All pages begin with an 8 byte page header:

Byte Offset Size Description
0 1 Least-signicant nibble stores the page type:
  • 0x01 - Intkey page.
  • 0x03 - Index pages.
  • 0x05 - Overflow page.
  • 0x06 - History fan page.

Other nibble contains flags:

  • 0x80 - HCT_PAGETYPE_LEFTMOST (set for leftmost page in each list)
1 1 Distance between list and leaves. For leaves, 0, for parents of leaves, 1, and so on)
2 2 Number of entries on page (nEntry below).
4 4 Peer page number.

4.1. Intkey Leaf Page Format

Intkey leaf page format:

page hdr leaf hdr entry array free space record area 8 bytes 16 bytes nEntry x 16 bytes
boxht=0.2 ; boxwid=0.6 ; linewid=0.2

# total width is 7.5
PGHDR: box "page hdr"        wid 0.8
LEHDR: box "leaf hdr"        wid 1.0
ARRAY:box "entry array"      wid 1.2
box "free space"             wid 1.5
box "record area"            wid 3.0

text "8 bytes" at PGHDR - (0, 0.2)
text "16 bytes" at LEHDR - (0, 0.2)
text "nEntry x 16 bytes" at ARRAY - (0, 0.2)

Following the page header, the 16 byte leaf header:

Byte Offset Size Description
8 2 number of bytes of free space between the end of header array and the first entry within the record area (the region marked "free space" in the diagram above).
10 2 number of bytes of free space (in total) within record area.
12 4 Unused.

Following the free-space header, an array of nEntry 16-byte entries. Each 16-byte entry is comprised of the following:

Each record within the record area consists of:

The part of the record stored on the main page is calculated using:

        nMax  = pgsz - 8 - 16 - 16 - 20;    /* max record that fits on page */
        nMaxO = nMax - 4;                   /* max record with ovfl that fits */
        nMin  = pgsz / 8;                   /* min prefix to store on page */

        if( nRecord <= nMax ){
          nLocal = nRecord;
        }else{ 
          nLocal = nRecord % (pgsz-8);
          if( nLocal < nMin || nLocal > nMaxO ){
            nLocal = nMin;
          }
        }

4.2. Intkey Internal Node Format

page hdr entry array free space 8 bytes nEntry x 16 bytes
boxht=0.2 ; boxwid=0.6 ; linewid=0.2

# total width is 7.5
PGHDR: box "page hdr"        wid 0.8
ARRAY:box "entry array"      wid 4.2
box "free space"             wid 2.5

text "8 bytes" at PGHDR - (0, 0.2)
text "nEntry x 16 bytes" at ARRAY - (0, 0.2)

Each entry on an intkey internal node page consists of an 8 byte rowid followed by a 4 byte logical page-id, and is padded out to 16 bytes. The maximum number of entries on an intkey internal node page is therefore:

        nMax = ((pgsz-8) / 16)

4.3. Index Leaf Page Format

page hdr leaf hdr entry array free space record area 8 bytes 24 bytes nEntry x 8 bytes
boxht=0.2 ; boxwid=0.6 ; linewid=0.2

# total width is 7.5
PGHDR: box "page hdr"        wid 0.8
LEHDR: box "leaf hdr"        wid 1.0
ARRAY:box "entry array"      wid 1.2
box "free space"             wid 1.5
box "record area"            wid 3.0

text "8 bytes" at PGHDR - (0, 0.2)
text "24 bytes" at LEHDR - (0, 0.2)
text "nEntry x 8 bytes" at ARRAY - (0, 0.2)

Index leaf pages have a 24 byte leaf header immediately following the page header. See the intkey leaf page format section for details.

Immediately following the free-space header is an nEntry entry array of 8-byte entries. Each entry consists of:

Each record within the record area consist of:

just as those on intkey leaf pages do. The part of the record stored on the main page is calculated using:

        if( nRecord <= (pgsz-16-8-12) ){
          nLocal = nRecord;
        }else{ 
          nLocal = nRecord % (pgsz-8);
          if( nLocal < (pgsz/16) || nLocal > (pgsz-16-8-16) ){
            nLocal = pgsz/16;
          }
        }

4.4. Index Internal Node Page Format

page hdr i-n hdr entry array free space record area 8 bytes 4 bytes nEntry x 12 bytes
boxht=0.2 ; boxwid=0.6 ; linewid=0.2

# total width is 7.5
PGHDR: box "page hdr"        wid 0.8
INHDR: box "i-n hdr"         wid 0.6
ARRAY:box "entry array"      wid 1.2
box "free space"             wid 1.5
box "record area"            wid 3.4

text "8 bytes" at PGHDR - (0, 0.2)
text "4 bytes" at INHDR - (0, 0.2)
text "nEntry x 12 bytes" at ARRAY - (0, 0.2)

Byte Offset Size Description
8 2 number of bytes of free space between the end of header array and the first entry within the record area (the region marked "free space" in the diagram above).
10 2 number of bytes of free space (in total) within record area.

As for index leaf pages, except that the nEntry array consists of 12-byte entries:

4.5. Overflow Chain Format

page hdr data free space 8 bytes nEntry bytes
boxht=0.2 ; boxwid=0.6 ; linewid=0.2

PGHDR: box "page hdr"        wid 0.8
DATA: box "data"             wid 4.7
box "free space"             wid 2.0

text "8 bytes" at PGHDR - (0, 0.2)
text "nEntry bytes" at DATA - (0, 0.2)

As in SQLite, overflow pages are a linked list, starting with the page identified in the b-tree page record. The number of overflow pages in the linked list, and the amount of data on the last of them, may be inferred from the record-size field on the b-tree page. The amount of data on each page, in bytes, is also stored in the nEntry field of the overflow page header.

Overflow pages are identified by physical page only - both the page number embedded in the start of the record within the tree structure page and the "next page" page number on overflow pages themselves are physical, not logical, page numbers

4.6. History Fan Format

page hdr fan page hdr entry array free space 8 bytes 24 bytes nEntry*4 bytes
boxht=0.2 ; boxwid=0.6 ; linewid=0.2

PGHDR: box "page hdr"        wid 0.8
FANHDR: box "fan page hdr"   wid 2.4
DATA: box "entry array"      wid 3.0
box "free space"             wid 1.3

text "8 bytes" at PGHDR - (0, 0.2)
text "24 bytes" at FANHDR - (0, 0.2)
text "nEntry*4 bytes" at DATA - (0, 0.2)

A history fan page consists of the page header followed by a fan page header followed by nEntry 4-byte physical page numbers.

The 24-byte fan page header consists of

5. Log File Format

Each valid log file begins with a log file header consisting of:

Following this there are a series of records. Each record consists of:

The log is terminated by a 4-byte zero value.

When a client commits a transaction, it first writes the log file with the TID value set to zero. Once it has obtained the TID, it writes it into the log file and then sets about writing to the database. Once the transaction is done - committed or rolled back - the TID field in the log file is set to zero to invalidate it.