SQLite

History for ext/fts2/fts2.c

2023-01-14
19:53
Deleted: Omit the long-disused FTS1 and FTS2 implementations from the active source tree. The code will persist forever in the source repository, but there is no point in carrying it around in the latest tarballs where it is never used. (check-in: [2bb50d5aed] user: drh branch: trunk, size: 0)
2020-07-29
16:18
[56701939a6] part of check-in [a80ae2c98b] Dozens and dozens of typo fixes in comments. This change adds no value to the end product and is disruptive, so it is questionable whether or not it will ever land on trunk. (check-in: [a80ae2c98b] user: drh branch: typos, size: 220048)
2013-07-04
23:53
[72c816a9ae] part of check-in [f2ab874782] Modify several extensions to use the new exported function naming. Fix some shared library compilation issues. (check-in: [f2ab874782] user: mistachkin branch: extRefactor, size: 220043)
2013-03-21
21:20
[b48cc0bb65] part of check-in [6f6e2d5094] Many spelling fixes in comments. No changes to code. (check-in: [6f6e2d5094] user: mistachkin branch: trunk, size: 220005)
2012-08-25
10:01
[4ef7d7ecf5] part of check-in [9b19b84753] Fix all known instances of 'repeated the' style typos in comments. No changes to code. (check-in: [9b19b84753] user: mistachkin branch: trunk, size: 220005)
2010-09-17
01:07
[238e9e1915] part of check-in [876845661a] Completely remove all trace of ctype.h from FTS2. (check-in: [876845661a] user: drh branch: trunk, size: 220009)
2009-03-05
04:20
[6cbd0fbdfe] part of check-in [6404afa0c5] Corrected typos and misspellings. Ticket #3702. (CVS 6336) (check-in: [6404afa0c5] user: shane branch: trunk, size: 219950)
2008-07-29
20:38
[bc78da5764] part of check-in [02870ed21d] Backport http://www.sqlite.org/cvstrac/chngview?cn=5489 from fts3. Re-used prepared statement from fts2 cursor. (CVS 5499) (check-in: [02870ed21d] user: shess branch: trunk, size: 219948)
2008-07-22
23:54
[5f6f8fa8f7] part of check-in [311aeb9c2b] Be a bit more suspicious of invalid results from the tokenizer. Backports check-in (4514) from fts3. (CVS 5459) (check-in: [311aeb9c2b] user: shess branch: trunk, size: 219164)
23:49
[ff1d7646d4] part of check-in [c16900dc76] Implement optimize() function. Backports check-in (5417) from fts3. (CVS 5458) (check-in: [c16900dc76] user: shess branch: trunk, size: 219060)
23:41
[c0d287669e] part of check-in [4c98179be2] Delete all fts2 index data when the table becomes empty. Backports check-in (5413) from fts3. (CVS 5457) (check-in: [4c98179be2] user: shess branch: trunk, size: 208065)
23:32
[af6d11365c] part of check-in [4e47394be9] fts2 functions for testing scripts. Backports (5340) from fts3. (CVS 5456) (check-in: [4e47394be9] user: shess branch: trunk, size: 206294)
23:08
[7a2e88d110] part of check-in [3f614453d2] Change prefix search from O(N*M) to O(NlogM). Backports (4599) from fts3. (CVS 5455) (check-in: [3f614453d2] user: shess branch: trunk, size: 192085)
22:57
[f50c7faca7] part of check-in [ecf2dec66c] Changes fts2 to use only sqlite3_malloc() and not system malloc. Backports (4554) and (4555) from fts3. (CVS 5454) (check-in: [ecf2dec66c] user: shess branch: trunk, size: 188568)
2008-04-12
13:06
[015d44a43d] part of check-in [062bf5d44d] Remove all instances of sprintf() from the FTS modules. Ticket #3049. (CVS 4996) (check-in: [062bf5d44d] user: drh branch: trunk, size: 188071)
2007-12-13
21:54
[cdbace1caf] part of check-in [4e91a267fe] Change all instances of "it's" in comments to either "its" or "it is", as appropriate, in case the comments are ever again read by a pedantic grammarian. Ticket #2840. (CVS 4629) (check-in: [4e91a267fe] user: drh branch: trunk, size: 188046)
2007-11-23
18:06
[0f978f0c3b] part of check-in [f94cdcfd11] Do not require SQLITE_ENABLE_BROKEN_FTS2 if FTS2 is not enabled. The same for FTS1. Ticket #2777. (CVS 4556) (check-in: [f94cdcfd11] user: drh branch: trunk, size: 188040)
2007-11-16
00:23
[9c7d635a3d] part of check-in [75cb46f82a] Don't do anything when input doclists are both empty. Ticket #2774 (CVS 4546) (check-in: [75cb46f82a] user: shess branch: trunk, size: 187966)
2007-09-13
18:16
[02720dd6e3] part of check-in [fec6567a0f] Drop the forced error from fts3.c and add forced errors to fts2.c and fts1.c. (CVS 4427) (check-in: [fec6567a0f] user: shess branch: trunk, size: 187935)
2007-08-28
20:36
[9a02a0db89] part of check-in [6c617bd89f] Fix memory leak of InteriorReader.term. Comes up when doing queries against large segments. (CVS 4315) (check-in: [6c617bd89f] user: shess branch: trunk, size: 186866)
2007-08-10
23:47
[29992419e8] part of check-in [16730cb137] Convert fts2 to use sqlite3_prepare_v2() to prevent certain logic errors around SQLITE_SCHEMA handling. This also allows sql_step_statement() and sql_step_leaf_statement() to be replaced with sqlite3_step().

Also fix a logic error in flushPendingTerms() which was clearing the term table in case of error. This was wrong in the face of SQLITE_SCHEMA. Even though the change to sqlite3_prepare_v2() should cause us not to see SQLITE_SCHEMA any longer, it was still a logic error... (CVS 4205) (check-in: [16730cb137] user: shess branch: trunk, size: 186829)

2007-08-05
23:52
[412242297d] part of check-in [6cc15409ad] Fix some compiler warnings. (CVS 4196) (check-in: [6cc15409ad] user: drh branch: trunk, size: 189405)
2007-07-30
18:55
[6d7f854625] part of check-in [3f9a666143] Fix ticket #2439: the FTS1 and FTS2 extensions use the non-standard, unportable, and long-deprecated <malloc.h> header on all platforms except Apple Mac OS X. <malloc.h> is never actually required on any OS with an at least partly POSIX-conforming API, because malloc(3) and friends have officially lived in <stdlib.h> for over 10 years. On some platforms, such as FreeBSD, including <malloc.h> has for a few years even triggered an "#error" and thus a build failure. So remove the <malloc.h> usage from the FTS1 and FTS2 extensions entirely and use only <stdlib.h>. (CVS 4191) (check-in: [3f9a666143] user: rse branch: trunk, size: 189504)
2007-07-02
10:16
[41a63f6e37] part of check-in [dee1a0fd28] Modify handling of SQLITE_SCHEMA in fts2 code. An SQLITE_SCHEMA error may cause SQLite to reload the internal schema, deleting and recreating v-table objects. So the sqlite3_vtab structure can be deleted out from under a v-table implementation. (CVS 4151) (check-in: [dee1a0fd28] user: danielk1977 branch: trunk, size: 189555)
2007-06-27
16:26
[4a177d5f29] part of check-in [488474fde7] Implement xRename() for fts2 so that it is possible to rename fts2 tables. (CVS 4143) (check-in: [488474fde7] user: danielk1977 branch: trunk, size: 189649)
2007-06-26
10:56
[1e1b6b6e83] part of check-in [bbdcf372c6] Remove the unused EXTSRC variable from the non-configure makefile. (CVS 4129) (check-in: [bbdcf372c6] user: danielk1977 branch: trunk, size: 189028)
2007-06-25
13:50
[e6015f3a98] part of check-in [c795e6fd8f] Put #ifdefs in fts2_tokenizer so that the build works even when FTS2 is omitted. Add the SQLite blessing to the header comments on all FTS2 source files. (CVS 4120) (check-in: [c795e6fd8f] user: drh branch: trunk, size: 188960)
12:49
[d402141bd3] part of check-in [3be2a6d1c3] Allow the use of MySQL-style quoting in the FTS modules. Ticket #2446. (CVS 4119) (check-in: [3be2a6d1c3] user: drh branch: trunk, size: 188642)
2007-06-22
15:21
[841766f2f1] part of check-in [68677e420c] Extend fts2 so that user defined tokenizers may be added. Add a tokenizer that uses the ICU library if available. Documentation and tests to come. (CVS 4108) (check-in: [68677e420c] user: danielk1977 branch: trunk, size: 188628)
2007-06-20
06:23
[8f9bd5fce1] part of check-in [fec56ad2ed] Fix snippet generation when the left-most column of an fts2 table is used in the MATCH clause. Fix for ticket #2429. (CVS 4095) (check-in: [fec56ad2ed] user: danielk1977 branch: trunk, size: 185869)
2007-06-12
18:20
[b058569b8b] part of check-in [6953cd0935] Minor comment edits from my prefix development client. No code changes. (CVS 4058) (check-in: [6953cd0935] user: shess branch: trunk, size: 185850)
2007-05-21
21:59
[4c68ff4f2c] part of check-in [ed3a131f1d] Fix overzealous fts2 assertions WRT rowid 0 or lower. Only check that docids are ascending if there was a prior docid set for the doclist, ignore the initial docid of 0. (CVS 4026) (check-in: [ed3a131f1d] user: shess branch: trunk, size: 184751)
2007-05-01
18:25
[9e1f5942fc] part of check-in [7c4c659240] Enable prefix-search in query-parsing and snippet generation. If the character immediately after the end of a term is '*', that term is marked for prefix matching. Modify term comparison in snippetOffsetsOfColumn() to respect isPrefix. fts2n.test runs prefix searching through some obvious test cases. (CVS 3893) (check-in: [7c4c659240] user: shess branch: trunk, size: 184426)
17:14
[a6762b7a6c] part of check-in [72c7963073] Modify loadSegmentLeavesInt() to correctly handle prefix searching. The new function docListUnion() is used to accumulate a union of the hits for the matching terms, which will be merged across segments using docListMerge(). (CVS 3891) (check-in: [72c7963073] user: shess branch: trunk, size: 184206)
2007-04-30
22:09
[c750b2db62] part of check-in [cae844a01a] Propagate prefix flag through implementation of doclist query code. Also implement correct prefix-handling for traversal of interior nodes of segment tree. A given prefix can span multiple children of an interior node, and from there the branches need to be followed in parallel. (CVS 3889) (check-in: [cae844a01a] user: shess branch: trunk, size: 180043)
17:52
[cb7ca4e320] part of check-in [7ddb826689] Lift docListMerge() call out of loadSegmentLeavesInt() for prefix search. Doclists from multiple prefix matches will need a union merge function, which will have to logically happen across a segment before doclists are merged between segments. (CVS 3887) (check-in: [7ddb826689] user: shess branch: trunk, size: 178037)
2007-04-27
22:02
[b45e07a236] part of check-in [9466367d65] Break interior-node and leaf-node readers apart in loadSegment(). Previously, the code looped until the block was a leaf node as indicated by a leading NUL. Now the code loops until it finds a block in the range of leaf nodes for this segment, then reads it using LeavesReader. This will make it easier to traverse a range of leaves when doing a prefix search. (CVS 3884) (check-in: [9466367d65] user: shess branch: trunk, size: 177148)
21:24
[c1e7528d9f] part of check-in [25935db738] Lift code to traverse interior nodes out of loadSegment(). Refactoring towards prefix searching. (CVS 3882) (check-in: [25935db738] user: shess branch: trunk, size: 174858)
21:02
[430ef1093f] part of check-in [22ffdae4b6] Refactor fts2 loadSegmentLeaf() in preparation for prefix-searching. Prefix-searching will want to accumulate data across multiple leaves in the segment, using LeavesReader instead of LeafReader is the first step in that direction. (CVS 3881) (check-in: [22ffdae4b6] user: shess branch: trunk, size: 174678)
2007-04-19
18:36
[dd35df80f4] part of check-in [dfac6082e8] Fix bug in fts2 handling of OR queries. When one doclist ends before the other, the code potentially tries to read past the end of the doclist. http://www.sqlite.org/cvstrac/tktview?tn=2309 (CVS 3862) (check-in: [dfac6082e8] user: shess branch: trunk, size: 173978)
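The kind of bounds handling this fix concerns can be illustrated with a minimal sketch of an OR (union) merge. It assumes plain sorted arrays of docids rather than the encoded doclists and reader structures fts2 actually uses, and the function name sketch_docid_union is hypothetical. The point is only that each input is read strictly within its own length, and that the tail of the longer list is copied after the shorter one ends.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of fts2.c: union of two ascending docid
** arrays. Writes the merged, de-duplicated result to out (which must have
** room for na+nb entries) and returns the number of entries written. */
static size_t sketch_docid_union(const long long *a, size_t na,
                                 const long long *b, size_t nb,
                                 long long *out){
  size_t i = 0, j = 0, n = 0;
  assert( out!=0 );
  /* Compare only while BOTH inputs still have unread entries. */
  while( i<na && j<nb ){
    if( a[i]<b[j] ){
      out[n++] = a[i++];
    }else if( b[j]<a[i] ){
      out[n++] = b[j++];
    }else{
      out[n++] = a[i++];   /* same docid in both lists: emit once */
      j++;
    }
  }
  /* One list has ended; drain the other without touching the ended one. */
  while( i<na ) out[n++] = a[i++];
  while( j<nb ) out[n++] = b[j++];
  return n;
}
```

With both inputs empty, the loops never execute and the function returns 0 without reading either array.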
2007-04-09
20:45
[acfce1c936] part of check-in [81be7290a4] Fix crash in delete when existing row has null fields. Previous code assumed that the row had values in all columns, sigh. Fixes bug http://www.sqlite.org/cvstrac/tktview?tn=2289 . (CVS 3833) (check-in: [81be7290a4] user: shess branch: trunk, size: 173824)
2007-03-29
18:41
[8d69d6e4b4] part of check-in [0229cba696] Buffer updates per-transaction rather than per-update. If lots of updates happen within a single transaction, there was a lot of wasted encode/decode overhead due to segment merges. This code buffers updates in memory and writes out larger level-0 segments. It only works when documents are presented in ascending order by docid. Comparing a test set running 100 documents per transaction, the total runtime is cut almost in half. (CVS 3751) (check-in: [0229cba696] user: shess branch: trunk, size: 173657)
16:30
[2e3cb46d28] part of check-in [f6c3abdc6c] Don't call ctype functions on hi-bit chars. Some platforms raise assertions when this occurs, and it's almost certainly not the right thing to do in the first place. (CVS 3746) (check-in: [f6c3abdc6c] user: shess branch: trunk, size: 168737)
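As an illustration of the problem described here, the following sketch guards the standard ctype calls so that bytes with the high bit set never reach them. The wrapper names are hypothetical and need not match anything in fts2.c. Treating every high-bit byte as "not alphanumeric, not space" is an assumption made for this example.

```c
#include <ctype.h>

/* Hypothetical wrappers: passing a negative char to the <ctype.h>
** functions is undefined behavior, so high-bit bytes are rejected
** before the call and 7-bit bytes are passed as unsigned char. */
static int sketch_safe_isalnum(char c){
  return (c & 0x80)==0 ? isalnum((unsigned char)c)!=0 : 0;
}

static int sketch_safe_isspace(char c){
  return (c & 0x80)==0 ? isspace((unsigned char)c)!=0 : 0;
}

/* High-bit bytes are returned unchanged rather than case-folded. */
static char sketch_safe_tolower(char c){
  return (c & 0x80)==0 ? (char)tolower((unsigned char)c) : c;
}
```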
2007-03-22
00:14
[de8321a2ad] part of check-in [d04fa3a13a] Refactor PLWriter to remove owned buffer. DLCollector (Document List Collector) now handles the case where PLWriter (Position List Writer) needed a local buffer. Change to using the associated DLWriter (Document List Writer) buffer, which reduces the number of memory copies needed in doclist processing, and brings PLWriter operation in line with DLWriter operation. (CVS 3707) (check-in: [d04fa3a13a] user: shess branch: trunk, size: 168105)
2007-03-20
23:52
[aba63e7f48] part of check-in [1b9918e207] Refactor PLWriter in preparation for buffered-document change. Currently, PLWriter (Position List Writer) creates a locally-owned DataBuffer to write into. This is necessary to support doclist collection during tokenization, where there is no obvious buffer to write output to, but is not necessary for the other users of PLWriter. This change adds a DLCollector (Doc List Collector) structure to handle the tokenization case.

Also fix a potential memory leak in writeZeroSegment(). In case of error from leafWriterStep(), the DataBuffer dl was being leaked. (CVS 3706) (check-in: [1b9918e207] user: shess branch: trunk, size: 166906)

2007-02-07
01:01
[a49ed7292c] part of check-in [283385d207] http://www.sqlite.org/cvstrac/tktview?tn=2219

When creating fts tables in an attached database, the backing tables are created in database 'main'. This change propagates the appropriate database name to the routines which build sql statements.

Note that I propagate the database name and table name separately. I briefly considered just making the table name be "db.table", but it didn't fit so well in the model used to store the table name and other information, and having the db name passed separately seemed a bit more transparent. (CVS 3631) (check-in: [283385d207] user: shess branch: trunk, size: 166020)

2007-01-19
22:59
[5f7247b8ec] part of check-in [4f2ab4b632] http://www.sqlite.org/cvstrac/tktview?tn=2166,35

Calling UPDATE against an fts table in a UTF-16 database inserts corrupted data into the database. The UTF-8 data is being inserted directly. This appears to happen because sqlite3_value_text() destructively coerces a value to UTF-8, and it's never converted back when updating the table. This works around the problem by rearranging things so that the update happens before the coercion. (CVS 3596) (check-in: [4f2ab4b632] user: shess branch: trunk, size: 165480)

2006-11-29
23:41
[5424f41fbc] part of check-in [08c2cc0e07] Drop a couple variables which are no longer used anywhere. (CVS 3524) (check-in: [08c2cc0e07] user: shess branch: trunk, size: 165480)
05:17
[94b4384807] part of check-in [18142fdb6d] http://www.sqlite.org/cvstrac/tktview?tn=2046

The virtual table interface allows for a cursor to field multiple xFilter() calls. For instance, if a join is done with a virtual table, there could be a call for each row which potentially matches. Unfortunately, fulltextFilter() assumes that it has a fresh cursor, and overwrites a prepared statement and a malloc'ed pointer, resulting in unfinalized statements and a memory leak.

This change hacks the code to manually clean up offending items in fulltextFilter(), emphasis on "hacks", since it's a fragile fix insofar as future additions to fulltext_cursor could continue to have the problem. (CVS 3521) (check-in: [18142fdb6d] user: shess branch: trunk, size: 165500)

01:02
[6065a73ad8] part of check-in [c8151a998e] Delta-encode terms in interior nodes. While experiments have shown that this is of marginal utility when encoding terms resulting from regular English text, it turns out to be very useful when encoding inputs with very large terms. (CVS 3520) (check-in: [c8151a998e] user: shess branch: trunk, size: 164964)
2006-11-18
00:12
[74a5db3f7f] part of check-in [f6e0b080dc] Store minimal terms in interior nodes. Whenever there's a break between leaf nodes, instead of storing the entire leftmost term of the rightmost child, store only that portion of the leftmost term necessary to distinguish it from the rightmost term of the leftmost child. (CVS 3513) (check-in: [f6e0b080dc] user: shess branch: trunk, size: 162408)
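The idea of storing only a distinguishing prefix can be sketched as follows. This is an illustration under stated assumptions, not the fts2 code: the function name is hypothetical, terms are compared bytewise, and the left term is assumed to sort strictly before the right term.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the rightmost term of the left child (zLeft)
** and the leftmost term of the right child (zRight), return how many
** leading bytes of zRight are enough to sort strictly after zLeft.
** Assumes zLeft < zRight in bytewise order. */
static size_t sketch_min_separator_len(const char *zLeft, size_t nLeft,
                                       const char *zRight, size_t nRight){
  size_t n = 0;
  /* Length of the common prefix of the two terms. */
  while( n<nLeft && n<nRight && zLeft[n]==zRight[n] ) n++;
  /* zRight must extend past the shared prefix, or it could not sort
  ** after zLeft. */
  assert( n<nRight );
  /* One byte beyond the shared prefix is the first point of difference. */
  return n+1;
}
```

For example, between "apple" and "apricot" the shared prefix is "ap", so the three bytes "apr" are enough for the interior node.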
2006-11-17
21:12
[57d8cd57ce] part of check-in [f30771d5c7] Refactoring groundwork for coming work on interior nodes. Change LeafWriter to use empty data buffer (instead of empty term) to detect an empty block. Code to validate interior nodes. Moderate revisions to leaf-node and doclist validation. Recast leafWriterStep() in terms of LeafWriterStepMerge(). (CVS 3512) (check-in: [f30771d5c7] user: shess branch: trunk, size: 161513)
2006-11-13
21:09
[7909381760] part of check-in [9b6d413d75] Delta-encode docids. This is good for around 22% reduction in index size with DL_POSITIONS. It improves performance about 5%-6%. (CVS 3511) (check-in: [9b6d413d75] user: shess branch: trunk, size: 160797)
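A minimal sketch of delta-encoding follows, to show why it shrinks an index. It assumes unsigned, strictly ascending docids and a simple base-128 varint. All function names are hypothetical, and the byte layout is not claimed to match the fts2 on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write v as a base-128 varint, low 7 bits first. Returns bytes written. */
static size_t sketch_put_varint(unsigned char *p, uint64_t v){
  size_t n = 0;
  do{
    unsigned char b = (unsigned char)(v & 0x7f);
    v >>= 7;
    if( v ) b |= 0x80;          /* continuation bit: more bytes follow */
    p[n++] = b;
  }while( v );
  return n;
}

/* Read one varint from p into *pv. Returns bytes consumed. */
static size_t sketch_get_varint(const unsigned char *p, uint64_t *pv){
  uint64_t v = 0;
  size_t n = 0;
  int shift = 0;
  for(;;){
    unsigned char b = p[n++];
    v |= (uint64_t)(b & 0x7f) << shift;
    if( (b & 0x80)==0 ) break;
    shift += 7;
  }
  *pv = v;
  return n;
}

/* Store the first docid as-is and every later docid as the gap from its
** predecessor. Small gaps need fewer varint bytes than full docids. */
static size_t sketch_encode_docids(const uint64_t *a, size_t n,
                                   unsigned char *out){
  size_t used = 0, i;
  uint64_t prev = 0;
  for(i=0; i<n; i++){
    assert( i==0 || a[i]>prev );   /* input must be strictly ascending */
    used += sketch_put_varint(out+used, a[i]-prev);
    prev = a[i];
  }
  return used;
}

/* Inverse of sketch_encode_docids(). Returns the number of docids. */
static size_t sketch_decode_docids(const unsigned char *in, size_t nByte,
                                   uint64_t *a){
  size_t used = 0, n = 0;
  uint64_t prev = 0, delta;
  while( used<nByte ){
    used += sketch_get_varint(in+used, &delta);
    prev += delta;
    a[n++] = prev;
  }
  return n;
}
```

In this sketch the docids 100000, 100001, 100005, 100100 encode to 6 bytes (3 for the first value, 1 for each gap), whereas writing each docid as its own varint would take 12.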
21:00
[667a93b3fe] part of check-in [64b7e34061] Require a minimum fanout for interior nodes. This prevents cases where excessively large terms keep the tree from finding a single root. A downside is that this could result in large interior nodes in the presence of large terms, which may be prone to fragmentation, though if the nodes were smaller that would translate into more levels in the tree, which would also have that problem. (CVS 3510) (check-in: [64b7e34061] user: shess branch: trunk, size: 159745)
20:15
[9b28f218c0] part of check-in [9628a61a6f] Allow backing tables to be missing on dropping fts table. Fixes http://www.sqlite.org/cvstrac/tktview?tn=1992,35 . (CVS 3509) (check-in: [9628a61a6f] user: shess branch: trunk, size: 158927)
2006-10-31
18:13
[afa395abf3] part of check-in [3cd9b64b96] Fix a pair of memory leaks. These were turned up by running valgrind memcheck with various 10k doc insert, update, delete, and query tests. (CVS 3497) (check-in: [3cd9b64b96] user: shess branch: trunk, size: 158897)
2006-10-26
00:41
[10fe8d96a9] part of check-in [cde383eb46] Empty queries should get no results. My recent change ( http://www.sqlite.org/cvstrac/chngview?cn=3486 ) broke test fts2a-5.3. This change should make the expected result more obvious. (CVS 3489) (check-in: [cde383eb46] user: shess branch: trunk, size: 158030)
00:04
[bee8988db6] part of check-in [5878add083] Make memset() uses less error-prone. http://www.sqlite.org/cvstrac/tktview?tn=2036,35 describes some cases where we were passing memset() a length which was the sizeof a pointer, rather than the structure pointed to. Instead, wrap this idiom up in CLEAR() and SCRAMBLE() macros. (CVS 3488) (check-in: [5878add083] user: shess branch: trunk, size: 157895)
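The pitfall and the macro idiom can be sketched as below. The macro bodies are one plausible way to write CLEAR() and SCRAMBLE() and may differ from the definitions in fts2.c. The DemoReader structure and the two functions are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical structure used only for this illustration. */
typedef struct DemoReader {
  const char *pData;
  int nData;
  long long iDocid;
} DemoReader;

/* The error-prone form is memset(p, 0, sizeof(p)), which clears only
** sizeof(pointer) bytes. These macros always take the size from the
** pointed-to object instead. */
#define CLEAR(p)    memset((p), '\0', sizeof(*(p)))
#define SCRAMBLE(p) memset((p), 0x55, sizeof(*(p)))

/* Reset every field of the structure to zero. */
static void demo_reader_reset(DemoReader *pReader){
  assert( pReader!=0 );
  CLEAR(pReader);
}

/* Overwrite the structure with a recognizable pattern so that any use
** after destruction is easier to spot in a debugger. */
static void demo_reader_destroy(DemoReader *pReader){
  assert( pReader!=0 );
  SCRAMBLE(pReader);
}
```

Because the macros derive the size from the argument's type, changing a variable from a structure to a pointer (or the reverse) cannot silently leave part of the object untouched.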
2006-10-25
21:00
[f3d0b37ba8] part of check-in [af5bfb986e] Replace the DocList and DocListReader structures. The new structures distinguish reading from a static buffer from writing to a dynamic buffer. This allows n-way doclist merging, and in-place merging of segment leaf nodes, which together cut segment merge times in half. (CVS 3486) (check-in: [af5bfb986e] user: shess branch: trunk, size: 158065)
05:21
[8f5e5fccec] part of check-in [fed79beec7] Don't store empty segments. When inserting empty strings, the code was writing out a segment made up of a single leaf node containing the \0 header. LeafReader assumed that leaf nodes always contained at least one term, so assertions would fail.

While it would be possible to support reading and merging empty segments, there's no reason to do so. While this change could have been done in writeZeroSegment(), I put it in leafWriterFlush() so that it would work right if segmentMerge() created an empty segment, which could happen with future changes to how deleted documents are handled. (CVS 3484) (check-in: [fed79beec7] user: shess branch: trunk, size: 143422)

2006-10-12
23:15
[ddfca6aecb] part of check-in [85272b2f53] Convert fts2 to store data in a way which allows for much faster updates. Groups of documents form segments which are encoded in a btree layered over a table of blocks, with various tricks to make merges fast. This performs 20x-25x faster than fts1 when loading the Enron corpus, and is only slightly slower for queries. (CVS 3474) (check-in: [85272b2f53] user: shess branch: trunk, size: 143308)
2006-10-10
23:22
[ba2b9e96fe] part of check-in [5e8bbb85c1] Fix leaky symbols. With this change, fts1 and fts2 can both be statically linked. (CVS 3472) (check-in: [5e8bbb85c1] user: shess branch: trunk, size: 98613)
17:37
Added: [1052f03493] part of check-in [d0d1e7cdcc] Copy fts1/ to fts2/, changing reference from fts1 to fts2. For future reference, the source versions copied were:

README.txt r1.1 fts1.c r1.37 fts1.h r1.2 fts1_hash.c r1.1 fts1_hash.h r1.1 fts1_porter.c r1.1 fts1_tokenizer.h r1.4 fts1_tokenizer1.c r1.6 (CVS 3471) (check-in: [d0d1e7cdcc] user: shess branch: trunk, size: 98536)