|Comment:||Add comment describing format of row and global size records.|
|User & Date:||dan 2013-01-03 20:35:50|
|18:37||Allow an fts5 tokenizer to split a single document into multiple streams (i.e. sub-fields within a single column value). Modify the matchinfo APIs so that a ranking function may handle streams and/or columns separately or otherwise. check-in: f3ac136843 user: dan tags: matchinfo|
|20:35||Add comment describing format of row and global size records. check-in: 7cfa40b5c1 user: dan tags: matchinfo|
|18:13||Fill in more of the matchinfo functions so that the BM25 function works. check-in: 0e439483d7 user: dan tags: matchinfo|
Changes to src/fts5.c.
 **
 *************************************************************************
 */

 #include "sqliteInt.h"
 #include "vdbeInt.h"

+/*
+** Stream numbers must be lower than this.
+*/
+#define SQLITE4_FTS5_NSTREAM 60
+
 /*
-** The global count record is a set of N varints, where N is one greater
-** than the number of columns in the indexed table. The first varint
-** contains the number of records in the table. Each subsequent varint
-** contains the total number of tokens stored in each column.
+** Records stored within the index:
+**
+** Row size record:
+**   There is one "row size" record in the index for each row in the
+**   indexed table. The "row size" record contains the number of tokens
+**   in the associated row for each combination of a stream and column
+**   number (i.e. contains the data required to find the number of
+**   tokens associated with stream S present in column C of the row for
+**   all S and C).
+**
+**   The key for the row size record is a single 0x00 byte followed by
+**   a copy of the PK blob for the table row.
+**
+**   The value is a series of varints. Each column of the table is
+**   represented by one or more varints packed into the array.
+**
+**   If a column contains only stream 0 tokens, then it is represented
+**   by a single varint - (nToken << 1), where nToken is the number of
+**   stream 0 tokens stored in the column.
+**
+**   Or, if the column contains tokens from multiple streams, the first
+**   varint contains a bitmask indicating which of the streams are present
+**   (stored as ((bitmask << 1) | 0x01)). Following the bitmask is a
+**   varint containing the number of tokens for each stream present, in
+**   ascending order of stream number.
+**
+** Global size record:
+**   There is a single "global size" record stored in the database. The
+**   database key for this record is a single byte - 0x00.
+**
+**   The data for this record is a series of varint values. The first
+**   varint is the total number of rows in the table. The subsequent
+**   varints make up a "row size" record containing the total number of
+**   tokens for each S/C combination in all rows of the table.
+**
+** FTS index records:
+**
+**   The FTS index records implement the following mapping:
 **
-** The key used for the global record in the KV store is the root page
-** number of the FTS index followed by a single 0x00 byte.
+**     (token, document-pk) -> (list of instances)
 */

 /*
 ** Default distance value for NEAR operators.
 */
 #define FTS5_DEFAULT_NEAR 10

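The per-column layout of the "row size" record described in the new comment can be illustrated with a small sketch. This is not code from the check-in: the names `NSTREAM`, `encode_column` and `decode_column` are hypothetical, and the sketch works on already-decoded integer values rather than on serialized varint bytes, since the byte-level varint encoding is not described in the comment. It also assumes that a column with no tokens at all is stored like a stream-0-only column with a count of zero, which the comment does not state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for SQLITE4_FTS5_NSTREAM from the diff. */
#define NSTREAM 60

/*
** Encode the token counts of one column. aCount[s] is the number of
** tokens belonging to stream s. The resulting integers (each of which
** would be stored as one varint) are written to aOut, which must have
** room for NSTREAM+1 entries. Returns the number of integers written.
*/
static size_t encode_column(const uint64_t *aCount, uint64_t *aOut){
  uint64_t mask = 0;
  size_t n = 0;
  int s;

  /* Build a bitmask of the streams that have at least one token. */
  for(s=0; s<NSTREAM; s++){
    if( aCount[s] ) mask |= ((uint64_t)1 << s);
  }

  if( (mask & ~(uint64_t)1)==0 ){
    /* Only stream 0 tokens (assumption: also used for an empty column).
    ** A single value with the low bit clear: (nToken << 1). */
    aOut[n++] = aCount[0] << 1;
  }else{
    /* Multiple streams: bitmask with the low bit set, followed by one
    ** count per stream present, in ascending order of stream number. */
    aOut[n++] = (mask << 1) | 0x01;
    for(s=0; s<NSTREAM; s++){
      if( mask & ((uint64_t)1 << s) ) aOut[n++] = aCount[s];
    }
  }
  return n;
}

/*
** Decode one column written by encode_column(). aCount must have room
** for NSTREAM entries. Returns the number of integers consumed from aIn.
*/
static size_t decode_column(const uint64_t *aIn, uint64_t *aCount){
  size_t n = 0;
  uint64_t v = aIn[n++];
  int s;

  for(s=0; s<NSTREAM; s++) aCount[s] = 0;
  if( (v & 0x01)==0 ){
    /* Low bit clear: stream 0 only. */
    aCount[0] = v >> 1;
  }else{
    /* Low bit set: the remaining bits are the stream bitmask. */
    uint64_t mask = v >> 1;
    for(s=0; s<NSTREAM; s++){
      if( mask & ((uint64_t)1 << s) ) aCount[s] = aIn[n++];
    }
  }
  return n;
}
```

For example, a column with 7 tokens in stream 0 and nothing else encodes to the single value 14. Adding 2 tokens in stream 3 gives the bitmask 0b1001, so the column encodes to the three values 19, 7, 2. With 60 streams the shifted bitmask needs at most 61 bits, so it fits in a 64-bit integer. Under the same assumptions, a global size record would be the row count followed by one such group per column.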