Dense vector search in SQLite?
(1) By Simon Willison (simonw) on 2022-09-05 18:12:46 [link] [source]
I just saw this new feature that's now available in Solr:
"Solr’s Dense Vector Search adds support for indexing and searching dense numerical vectors."
Made me think: are there existing tricks for doing something like this using SQLite? Is it something that an extension could implement?
It feels like SQLite could offer a really good platform to implement this kind of feature. I'm interested in hearing if there's existing work in this space already - or if it's the kind of thing that SQLite itself might consider implementing as an extension, similar to the existing R*Tree module.
(2) By Gunter Hick (gunter_hick) on 2022-09-06 06:24:40 in reply to 1 [link] [source]
SQLite's virtual table mechanism already allows the implementation of custom indexing methods. I have personally used it to implement bitmap indexing. As long as it is possible to estimate the effort required for a specific set of constraints - a constraint being a tuple (field, relational-op) - reasonably well, the query planner can find a fast query by itself. Even if you can only determine which constraints can be handled by your indexing method, you can use CROSS JOIN to force a hand-picked plan.
(3) By Simon Willison (simonw) on 2022-09-06 14:33:18 in reply to 2 [link] [source]
"Even if you can only determine which constraints can be handled by your indexing method, you can use CROSS JOIN to force a hand-picked plan."
That sounds VERY useful - could you expand that a little with an example? I can't quite picture what that would look like.
(4) By Gunter Hick (gunter_hick) on 2022-09-06 15:23:44 in reply to 3 [source]
An SQLite virtual table is implemented by a set of functions (methods) described here https://sqlite.org/vtab.html The CROSS JOIN is explained in section 7.3 of https://sqlite.org/optoverview.html Lets assume you have a table of text documents: CREATE TABLE docs (Id INTEGER PRIMARY KEY, content TEXT); You then need a virtual table module to generate a vector from a text. CREATE VIRTUAL TABLE text2vec4 USING vector(4); which pretends to be a table declared as (v1 float, v2 float, v3 float, v4 float, content text hidden). You can then SELECT * from text2vec4 where content = 'sample text'; to return a vector representation of the content. Syntactic sugar allows you to SELECT text2vec4('sample text'); instead Then another virtual table to store and retrieve vectors. CREATE VIRTUAL TABLE docs_index USING dense(4); which pretends to be a table declared as (v1 float hidden, v2 float hidden, v3 float hidden, v4 float hidden, docid integer); You can then load your data using INSERT INTO docs_index(docid,v1,v2,v3,v4) SELECT Id,text2vec4(content) FROM docs; And you can retrieve documents matching a search phrase using SELECT content FROM docs_index CROSS JOIN docs ON docs_index.docid = docs.Id WHERE (v1,v2,v3,v4) = text2vec4('search phrase'); or shorter SELECT content FROM docs WHERE Id = docs_index(text2vec4('search phrase'));
(5) By Simon Willison (simonw) on 2022-09-07 00:14:14 in reply to 4 [link] [source]
Thanks very much, I completely understand what you are proposing now. I'll keep this in mind for future projects!
(6) By Simon Willison (simonw) on 2022-09-07 00:49:54 in reply to 5 [link] [source]
I found this explanation of CROSS JOIN as an optimization trick really useful: https://sqlite.org/queryplanner-ng.html#_fixing_the_problem