Dense vector search in SQLite?
(1) By Simon Willison (simonw) on 2022-09-05 18:12:46 [link] [source]
I just saw this new feature that's now available in Solr:
https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html
"Solr’s Dense Vector Search adds support for indexing and searching dense numerical vectors."
Made me think: are there existing tricks for doing something like this using SQLite? Is it something that an extension could implement?
It feels like SQLite could offer a really good platform to implement this kind of feature. I'm interested in hearing if there's existing work in this space already - or if it's the kind of thing that SQLite itself might consider implementing as an extension, similar to the existing R*Tree module.
(2) By Gunter Hick (gunter_hick) on 2022-09-06 06:24:40 in reply to 1 [link] [source]
SQLite's virtual table mechanism already allows the implementation of custom indexing methods. I have personally used it to implement bitmap indexing. As long as it is possible to estimate the effort required for a specific set of constraints - a constraint being a tuple (field, relational-op) - reasonably well, the query planner can find a fast query by itself. Even if you can only determine which constraints can be handled by your indexing method, you can use CROSS JOIN to force a hand-picked plan.
(3) By Simon Willison (simonw) on 2022-09-06 14:33:18 in reply to 2 [link] [source]
"Even if you can only determine which constraints can be handled by your indexing method, you can use CROSS JOIN to force a hand-picked plan."
That sounds VERY useful - could you expand that a little with an example? I can't quite picture what that would look like.
(4) By Gunter Hick (gunter_hick) on 2022-09-06 15:23:44 in reply to 3 [source]
An SQLite virtual table is implemented by a set of functions (methods) described here https://sqlite.org/vtab.html The CROSS JOIN is explained in section 7.3 of https://sqlite.org/optoverview.html Lets assume you have a table of text documents: CREATE TABLE docs (Id INTEGER PRIMARY KEY, content TEXT); You then need a virtual table module to generate a vector from a text. CREATE VIRTUAL TABLE text2vec4 USING vector(4); which pretends to be a table declared as (v1 float, v2 float, v3 float, v4 float, content text hidden). You can then SELECT * from text2vec4 where content = 'sample text'; to return a vector representation of the content. Syntactic sugar allows you to SELECT text2vec4('sample text'); instead Then another virtual table to store and retrieve vectors. CREATE VIRTUAL TABLE docs_index USING dense(4); which pretends to be a table declared as (v1 float hidden, v2 float hidden, v3 float hidden, v4 float hidden, docid integer); You can then load your data using INSERT INTO docs_index(docid,v1,v2,v3,v4) SELECT Id,text2vec4(content) FROM docs; And you can retrieve documents matching a search phrase using SELECT content FROM docs_index CROSS JOIN docs ON docs_index.docid = docs.Id WHERE (v1,v2,v3,v4) = text2vec4('search phrase'); or shorter SELECT content FROM docs WHERE Id = docs_index(text2vec4('search phrase'));
(5) By Simon Willison (simonw) on 2022-09-07 00:14:14 in reply to 4 [link] [source]
Thanks very much, I completely understand what you are proposing now. I'll keep this in mind for future projects!
(6) By Simon Willison (simonw) on 2022-09-07 00:49:54 in reply to 5 [link] [source]
I found this explanation of CROSS JOIN as an optimization trick really useful: https://sqlite.org/queryplanner-ng.html#_fixing_the_problem
(7) By Bas Buller (BasBuller) on 2022-12-18 09:44:54 in reply to 5 [link] [source]
Out of curiosity, any chance you have started work on this idea?
(8) By anonymous on 2023-02-13 06:52:46 in reply to 1 [link] [source]
Just came across this in a google search, unsure if they solve your specific needs, but wanted to include them here in case they are useful:
- https://observablehq.com/@asg017/introducing-sqlite-vss
- > sqlite-vss is a new SQLite extension that adds vector search capabilties to SQLite, based on Faiss. It's several steps above storing embeddings as JSON or pickle, but doesn't scale as well as heavier vector databases like Pinecone/Qdrant/Weaviate. Great for local-first applications and no-nonsense deployments!
- https://github.com/asg017/sqlite-vss
- "I needed SQLITE but for vectors so I wrote it myself. Now it's on PyPI - introducing VDBLITE": https://www.youtube.com/watch?v=gHrIrWyUq8U