SQLite4
Check-in [7cfa40b5c1]

Overview
Comment: Add comment describing format of row and global size records.
SHA1: 7cfa40b5c1dad04fb959d418be4e76c8a584b506
User & Date: dan 2013-01-03 20:35:50
Context
2013-01-04
18:37
Allow an fts5 tokenizer to split a single document into multiple streams (i.e. sub-fields within a single column value). Modify the matchinfo APIs so that a ranking function may handle streams and/or columns separately or otherwise. check-in: f3ac136843 user: dan tags: matchinfo
2013-01-03
20:35
Add comment describing format of row and global size records. check-in: 7cfa40b5c1 user: dan tags: matchinfo
18:13
Fill in more of the matchinfo functions so that the BM25 function works. check-in: 0e439483d7 user: dan tags: matchinfo
Changes

Changes to src/fts5.c.

    10     10   **
    11     11   *************************************************************************
    12     12   */
    13     13   
    14     14   #include "sqliteInt.h"
    15     15   #include "vdbeInt.h"
    16     16   
           17  +/* 
           18  +** Stream numbers must be lower than this.
           19  +*/
           20  +#define SQLITE4_FTS5_NSTREAM 60
           21  +
    17     22   /*
    18         -** The global count record is a set of N varints, where N is one greater
    19         -** than the number of columns in the indexed table. The first varint
    20         -** contains the number of records in the table. Each subsequent varint
    21         -** contains the total number of tokens stored in each column.
           23  +** Records stored within the index:
    22     24   **
    23         -** The key used for the global record in the KV store is the root page 
    24         -** number of the FTS index followed by a single 0x00 byte.
           25  +** Row size record:
           26  +**   There is one "row size" record in the index for each row in the
           27  +**   indexed table. The "row size" record contains the number of tokens
           28  +**   in the associated row for each combination of a stream and column
           29  +**   number (i.e. contains the data required to find the number of
           30  +**   tokens associated with stream S present in column C of the row for
           31  +**   all S and C).
           32  +**
           33  +**   The key for the row size record is a single 0x00 byte followed by
           34  +**   a copy of the PK blob for the table row. 
           35  +**
           36  +**   The value is a series of varints. Each column of the table is
           37  +**   represented by one or more varints packed into the array.
           38  +**
           39  +**   If a column contains only stream 0 tokens, then it is represented
           40  +**   by a single varint - (nToken << 1), where nToken is the number of
           41  +**   stream 0 tokens stored in the column.
           42  +**
           43  +**   Or, if the column contains tokens from multiple streams, the first
           44  +**   varint contains a bitmask indicating which of the streams are present
           45  +**   (stored as ((bitmask << 1) | 0x01)). Following the bitmask is a
           46  +**   varint containing the number of tokens for each stream present, in
           47  +**   ascending order of stream number.
           48  +**
           49  +** Global size record:
           50  +**   There is a single "global size" record stored in the database. The
           51  +**   database key for this record is a single byte - 0x00.
           52  +**
           53  +**   The data for this record is a series of varint values. The first 
           54  +**   varint is the total number of rows in the table. The subsequent
           55  +**   varints make up a "row size" record containing the total number of
           56  +**   tokens for each S/C combination in all rows of the table.
           57  +**
           58  +** FTS index records:
           59  +**
           60  +**   The FTS index records implement the following mapping:
           61  +**
           62  +**       (token, document-pk) -> (list of instances)
    25     63   */
    26     64   
    27     65   /*
    28     66   ** Default distance value for NEAR operators.
    29     67   */
    30     68   #define FTS5_DEFAULT_NEAR 10
    31     69
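
The per-column layout described in the new "Row size record" comment can be illustrated with a minimal sketch. It is not code from this check-in: the function names and the `NSTREAM` macro are hypothetical, and the sketch works on the integer values only, leaving the varint byte encoding to whatever routine the index uses. It also assumes that a column with no tokens at all is written in the single-value form with nToken equal to 0, which the comment does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for SQLITE4_FTS5_NSTREAM from the diff. */
#define NSTREAM 60

/*
** Sketch: compute the integer values that represent one column in a
** "row size" record. aCount[s] is the number of tokens of stream s in
** the column. The values are written to aOut (room for 1+NSTREAM
** entries is required) and would each be stored as a varint.
** Returns the number of values written.
*/
static size_t rowsize_column_values(const uint64_t *aCount, uint64_t *aOut){
  uint64_t mask = 0;
  size_t n = 0;
  int s;

  assert( aCount!=NULL && aOut!=NULL );
  for(s=0; s<NSTREAM; s++){
    if( aCount[s]>0 ) mask |= ((uint64_t)1 << s);
  }

  if( (mask & ~(uint64_t)1)==0 ){
    /* Only stream 0 tokens. Assumption: an empty column also lands here. */
    aOut[n++] = aCount[0] << 1;
  }else{
    /* Bitmask of streams present, flagged by the low bit, then one
    ** count per present stream in ascending order of stream number. */
    aOut[n++] = (mask << 1) | 0x01;
    for(s=0; s<NSTREAM; s++){
      if( mask & ((uint64_t)1 << s) ) aOut[n++] = aCount[s];
    }
  }
  return n;
}

/*
** Sketch of the inverse: read the values for one column from aIn and
** fill aCount[0..NSTREAM-1]. Returns the number of values consumed.
*/
static size_t rowsize_column_decode(const uint64_t *aIn, uint64_t *aCount){
  size_t n = 0;
  int s;
  uint64_t first;

  assert( aIn!=NULL && aCount!=NULL );
  first = aIn[n++];
  for(s=0; s<NSTREAM; s++) aCount[s] = 0;

  if( (first & 0x01)==0 ){
    aCount[0] = first >> 1;          /* single-value form: stream 0 only */
  }else{
    uint64_t mask = first >> 1;      /* streams present in this column */
    for(s=0; s<NSTREAM; s++){
      if( mask & ((uint64_t)1 << s) ) aCount[s] = aIn[n++];
    }
  }
  return n;
}
```

For example, a column holding 5 tokens in stream 0 and 3 tokens in stream 2 would be represented by the values 11, 5, 3: the bitmask is binary 101, shifted left once with the low bit set. With stream numbers below 60, the shifted bitmask still fits in an unsigned 64-bit integer.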