/ Check-in [13a9d085]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Add new file ext/fts3/README.content, describing the experimental FTS4 content option.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 13a9d085e1a5654a97b8d26bae7182ca6c0c237b
User & Date: dan 2011-10-31 11:36:29
Context
2011-10-31
12:25
Fix a typo in a comment. No code changes. check-in: 6635cd9a user: drh tags: trunk
11:36
Add new file ext/fts3/README.content, describing the experimental FTS4 content option. check-in: 13a9d085 user: dan tags: trunk
06:52
Update fts3fault.test to account for the sqlite3_errmsg() related changes in [8f88cc4e61] and [dcb7879347]. check-in: 3f2d49c6 user: dan tags: trunk
Changes
Hide Diffs Side-by-Side Diffs Ignore Whitespace Patch

Added ext/fts3/README.content.

            1  +
            2  +FTS4 CONTENT OPTION
            3  +
            4  +  Normally, in order to create a full-text index on a dataset, the FTS4 
            5  +  module stores a copy of all indexed documents in a specially created 
            6  +  database table.
            7  +
            8  +  As of SQLite version 3.7.9, FTS4 supports a new option - "content" -
            9  +  designed to extend FTS4 to support the creation of full-text indexes where:
           10  +
           11  +    * The indexed documents are not stored within the SQLite database 
           12  +      at all (a "contentless" FTS4 table), or
           13  +
           14  +    * The indexed documents are stored in a database table created and
           15  +      managed by the user (an "external content" FTS4 table).
           16  +
           17  +  Because the indexed documents themselves are usually much larger than 
           18  +  the full-text index, the content option can sometimes be used to achieve 
           19  +  significant space savings.
           20  +
           21  +CONTENTLESS FTS4 TABLES
           22  +
           23  +  In order to create an FTS4 table that does not store a copy of the indexed
           24  +  documents at all, the content option should be set to an empty string.
           25  +  For example, the following SQL creates such an FTS4 table with three
           26  +  columns - "a", "b", and "c":
           27  +
           28  +    CREATE VIRTUAL TABLE t1 USING fts4(content="", a, b, c);
           29  +
           30  +  Data can be inserted into such an FTS4 table using an INSERT statements.
           31  +  However, unlike ordinary FTS4 tables, the user must supply an explicit
           32  +  integer docid value. For example:
           33  +
           34  +    -- This statement is Ok:
           35  +    INSERT INTO t1(docid, a, b, c) VALUES(1, 'a b c', 'd e f', 'g h i');
           36  +
           37  +    -- This statement causes an error, as no docid value has been provided:
           38  +    INSERT INTO t1(a, b, c) VALUES('j k l', 'm n o', 'p q r');
           39  +
           40  +  It is not possible to UPDATE or DELETE a row stored in a contentless FTS4
           41  +  table. Attempting to do so is an error.
           42  +
           43  +  Contentless FTS4 tables also support SELECT statements. However, it is
           44  +  an error to attempt to retrieve the value of any table column other than
           45  +  the docid column. The auxiliary function matchinfo() may be used, but
           46  +  snippet() and offsets() may not. For example:
           47  +
           48  +    -- The following statements are Ok:
           49  +    SELECT docid FROM t1 WHERE t1 MATCH 'xxx';
           50  +    SELECT docid FROM t1 WHERE a MATCH 'xxx';
           51  +    SELECT matchinfo(t1) FROM t1 WHERE t1 MATCH 'xxx';
           52  +
           53  +    -- The following statements all cause errors, as the value of columns
           54  +    -- other than docid are required to evaluate them.
           55  +    SELECT * FROM t1;
           56  +    SELECT a, b FROM t1 WHERE t1 MATCH 'xxx';
           57  +    SELECT docid FROM t1 WHERE a LIKE 'xxx%';
           58  +    SELECT snippet(t1) FROM t1 WHERE t1 MATCH 'xxx';
           59  +
           60  +  Errors related to attempting to retrieve column values other than docid
           61  +  are runtime errors that occur within sqlite3_step(). In some cases, for
           62  +  example if the MATCH expression in a SELECT query matches zero rows, there
           63  +  may be no error at all even if a statement does refer to column values 
           64  +  other than docid.
           65  +
           66  +EXTERNAL CONTENT FTS4 TABLES
           67  +
           68  +  An "external content" FTS4 table is similar to a contentless table, except
           69  +  that if evaluation of a query requires the value of a column other than 
           70  +  docid, FTS4 attempts to retrieve that value from a table (or view, or 
           71  +  virtual table) nominated by the user (hereafter referred to as the "content
           72  +  table"). The FTS4 module never writes to the content table, and writing
           73  +  to the content table does not affect the full-text index. It is the
           74  +  responsibility of the user to ensure that the content table and the 
           75  +  full-text index are consistent.
           76  +
           77  +  An external content FTS4 table is created by setting the content option
           78  +  to the name of a table (or view, or virtual table) that may be queried by
           79  +  FTS4 to retrieve column values when required. If the nominated table does
           80  +  not exist, then an external content table behaves in the same way as
           81  +  a contentless table. For example:
           82  +
           83  +    CREATE TABLE t2(id INTEGER PRIMARY KEY, a, b, c);
           84  +    CREATE VIRTUAL TABLE t3 USING fts4(content="t2", a, c);
           85  +
           86  +  Assuming the nominated table does exist, then its columns must be the same 
           87  +  as or a superset of those defined for the FTS table.
           88  +
           89  +  When a users query on the FTS table requires a column value other than
           90  +  docid, FTS attempts to read this value from the corresponding column of
           91  +  the row in the content table with a rowid value equal to the current FTS
           92  +  docid. Or, if such a row cannot be found in the content table, a NULL
           93  +  value is used instead. For example:
           94  +
           95  +    CREATE TABLE t2(id INTEGER PRIMARY KEY, a, b, c, d);
           96  +    CREATE VIRTUAL TABLE t3 USING fts4(content="t2", b, c);
           97  +  
           98  +    INSERT INTO t2 VALUES(2, 'a b', 'c d', 'e f');
           99  +    INSERT INTO t2 VALUES(3, 'g h', 'i j', 'k l');
          100  +    INSERT INTO t3(docid, b, c) SELECT id, b, c FROM t2;
          101  +
          102  +    -- The following query returns a single row with two columns containing
          103  +    -- the text values "i j" and "k l".
          104  +    --
          105  +    -- The query uses the full-text index to discover that the MATCH 
          106  +    -- term matches the row with docid=3. It then retrieves the values
          107  +    -- of columns b and c from the row with rowid=3 in the content table
          108  +    -- to return.
          109  +    --
          110  +    SELECT * FROM t3 WHERE t3 MATCH 'k';
          111  +
          112  +    -- Following the UPDATE, the query still returns a single row, this
          113  +    -- time containing the text values "xxx" and "yyy". This is because the
          114  +    -- full-text index still indicates that the row with docid=3 matches
          115  +    -- the FTS4 query 'k', even though the documents stored in the content
          116  +    -- table have been modified.
          117  +    --
          118  +    UPDATE t2 SET b = 'xxx', c = 'yyy' WHERE rowid = 3;
          119  +    SELECT * FROM t3 WHERE t3 MATCH 'k';
          120  +
          121  +    -- Following the DELETE below, the query returns one row containing two
          122  +    -- NULL values. NULL values are returned because FTS is unable to find
          123  +    -- a row with rowid=3 within the content table.
          124  +    --
          125  +    DELETE FROM t2;
          126  +    SELECT * FROM t3 WHERE t3 MATCH 'k';
          127  +
          128  +  When a row is deleted from an external content FTS4 table, FTS4 needs to
          129  +  retrieve the column values of the row being deleted from the content table.
          130  +  This is so that FTS4 can update the full-text index entries for each token
          131  +  that occurs within the deleted row to indicate that that row has been 
          132  +  deleted. If the content table row cannot be found, or if it contains values
          133  +  inconsistent with the contents of the FTS index, the results can be difficult
          134  +  to predict. The FTS index may be left containing entries corresponding to the
          135  +  deleted row, which can lead to seemingly nonsensical results being returned
          136  +  by subsequent SELECT queries. The same applies when a row is updated, as
          137  +  internally an UPDATE is the same as a DELETE followed by an INSERT.
          138  +  
          139  +  Instead of writing separately to the full-text index and the content table,
          140  +  some users may wish to use database triggers to keep the full-text index
          141  +  up to date with respect to the set of documents stored in the content table.
          142  +  For example, using the tables from earlier examples:
          143  +
          144  +    CREATE TRIGGER t2_bu BEFORE UPDATE ON t2 BEGIN
          145  +      DELETE FROM t3 WHERE docid=old.rowid;
          146  +    END;
          147  +    CREATE TRIGGER t2_bd BEFORE DELETE ON t2 BEGIN
          148  +      DELETE FROM t3 WHERE docid=old.rowid;
          149  +    END;
          150  +
          151  +    CREATE TRIGGER t2_bu AFTER UPDATE ON t2 BEGIN
          152  +      INSERT INTO t3(docid, b, c) VALUES(new.rowid, new.b, new.c);
          153  +    END;
          154  +    CREATE TRIGGER t2_bd AFTER INSERT ON t2 BEGIN
          155  +      INSERT INTO t3(docid, b, c) VALUES(new.rowid, new.b, new.c);
          156  +    END;
          157  +
          158  +  The DELETE trigger must be fired before the actual delete takes place
          159  +  on the content table. This is so that FTS4 can still retrieve the original
          160  +  values in order to update the full-text index. And the INSERT trigger must
          161  +  be fired after the new row is inserted, so as to handle the case where the
          162  +  rowid is assigned automatically within the system. The UPDATE trigger must
          163  +  be split into two parts, one fired before and one after the update of the
          164  +  content table, for the same reasons.
          165  +
          166  +  FTS4 features a special command similar to the 'optimize' command that
          167  +  deletes the entire full-text index and rebuilds it based on the current
          168  +  set of documents in the content table. Assuming again that "t3" is the
          169  +  name of the external content FTS4 table, the command is:
          170  +
          171  +    INSERT INTO t3(t3) VALUES('rebuild');
          172  +
          173  +  This command may also be used with ordinary FTS4 tables, although it may
          174  +  only be useful if the full-text index has somehow become corrupt. It is an
          175  +  error to attempt to rebuild the full-text index maintained by a contentless
          176  +  FTS4 table.
          177  +
          178  +

Changes to test/fts4content.test.

    34     34   #   4.* - The "INSERT INTO fts(fts) VALUES('rebuild')" command.
    35     35   #
    36     36   #   5.* - Check that CREATE TABLE, DROP TABLE and ALTER TABLE correctly
    37     37   #         ignore any %_content table when used with the content=xxx option.
    38     38   #
    39     39   #   6.* - Test the effects of messing with the schema of table xxx after
    40     40   #         creating a content=xxx FTS index.
           41  +#   
           42  +#   7.* - Test that if content=xxx is specified and table xxx does not
           43  +#         exist, the FTS table can still be used for INSERT and some
           44  +#         SELECT statements.
    41     45   #
    42     46   
    43     47   do_execsql_test 1.1.1 {
    44     48     CREATE TABLE t1(a, b, c);
    45     49     INSERT INTO t1 VALUES('w x', 'x y', 'y z');
    46     50     CREATE VIRTUAL TABLE ft1 USING fts4(content=t1);
    47     51   }
................................................................................
   471    475     INSERT INTO ft8(docid, x) VALUES(17, 'I Y T Q O');
   472    476   }
   473    477   
   474    478   do_execsql_test 7.1.2 {
   475    479     SELECT docid FROM ft8 WHERE ft8 MATCH 'N';
   476    480   } {13 15}
   477    481   
          482  +do_execsql_test 7.2.1 {
          483  +  CREATE VIRTUAL TABLE ft9 USING fts4(content=, x);
          484  +  INSERT INTO ft9(docid, x) VALUES(13, 'U O N X G');
          485  +  INSERT INTO ft9(docid, x) VALUES(14, 'C J J U B');
          486  +  INSERT INTO ft9(docid, x) VALUES(15, 'N J Y G X');
          487  +  INSERT INTO ft9(docid, x) VALUES(16, 'R Y D O R');
          488  +  INSERT INTO ft9(docid, x) VALUES(17, 'I Y T Q O');
          489  +}
          490  +do_execsql_test 7.2.2 {
          491  +  SELECT docid FROM ft9 WHERE ft9 MATCH 'N';
          492  +} {13 15}
          493  +do_execsql_test 7.2.3 {
          494  +  SELECT name FROM sqlite_master WHERE name LIKE 'ft9_%';
          495  +} {ft9_segments ft9_segdir ft9_docsize ft9_stat}
          496  +
          497  +do_catchsql_test 7.2.4 {
          498  +  SELECT * FROM ft9 WHERE ft9 MATCH 'N';
          499  +} {1 {SQL logic error or missing database}}
          500  +
   478    501   finish_test