SQLite User Forum

Performance Analysis
Login

Performance Analysis

(1) By anonymous on 2022-03-16 11:28:33 [source]

Hi,

I am trying to use SQLite FTS support for one of our products.

I am curious to know a few things, can this forum help me?

  1. What is the indexing time in general to index ~100MB text files in SQLLite? Is there a detailed performance matrix present somewhere?

  2. What is the query performance time in general for same index 100*100MB files content.?

  3. Does anyone has any comparison done with SQLite FTS and maybe like Lucene?

  4. Can we consider SQLite FTS as an alternative to indexing products considering we need to index only text content?

(2) By Simon Slavin (slavin) on 2022-03-17 06:52:45 in reply to 1 [link] [source]

Unfortunately it's difficult to answer your questions as asked. So I'm just going to type some stuff and if you have further questions please ask.

SQLite is written in C, and a lot of effort has gone into keeping it simple and elegant. As a consequence it processes very quickly. But as with anything handling 100MB (or 100 x 100MB) of data, most of its time is taken by storage access, either reading or writing. Faster storage (fast SSD), or a motherboard and OS with a fatter data-path can reduce time taken by a factor of ten. Having data storage set to sever mode, where ACID rules are observed, can slot it down by a factor of five.

So SQLite's speed is hardware-bound. As such, there is no point in benchmarking SQLite indexing or searching on a 'typical' setup. The speed on any real setup would be completely different to the benchmark.

SQLite does not index text files. If you want fast search of text you have to put it into a SQLite database. Which means that SQLite would end up with one file with all the searchable texts in it. SQLite can theoretically handle a database of 281 terabytes so it has no problem with holding a lot of data in one file, but your setup may not be suited for doing this. Especially considering how it backs up.

If you want to index text files and leave them as separate text files, you may be better off with some other solution. I'll point out that current versions of macOS automatically index any text files for searching, even without you consciously turning anything on. Perhaps current versions of Windows do this too.

(4) By Dan Kennedy (dan) on 2022-03-18 06:51:26 in reply to 2 [link] [source]

SQLite does not index text files. If you want fast search of text you have to put it into a SQLite database. Which means that SQLite would end up with one file with all the searchable texts in it.

You can also create a contentless table:

https://sqlite.org/fts5.html#contentless_tables

Dan.

(5) By anonymous on 2022-03-21 13:30:03 in reply to 2 [link] [source]

Thanks for all information. I will start on writing a perf test case.

If I understood correctly, SQLite is not designed as a "Search Engine" and cannot be compared with the Lucene or similar search engine. Is there any documentation/diagram which has SQLite FTS internal architecture?

I have seen a few threads where the industry is comparing the SQLite FTS with search products?

As it's one single file, isn't this performance overhead as too many writes/reads cannot happen at the same time?

(3.1) By Dan Shearer (danshearer) on 2022-03-17 11:11:45 edited from 3.0 in reply to 1 [link] [source]

anonymous on 2022-03-16 11:28:33:

What is the indexing time in general to index ~100MB text files in SQLLite? Is there a detailed performance matrix present somewhere?

You might want to try running the SQLite benchmarking suite.

Background Info

It is a generalised benchmarking system for SQLite or anything compatible with SQLite. We have some preliminary results from the first 10,000 benchmark runs and are now confident we have a reliable system that anyone can use to fairly exchange and compare results. We have got some experimental R graphs and yesterday a statistician started helping us with experimental design and our data model. All of this works with arbitrary versions and commit ids of SQLite from the master repository, building any one or all versions and storing the results in an sqlite file, with an SHA3 run ID.

We are probably a few weeks away from publishing really solid results, but the tools are tested and you can use them to get your own results today.

Answering your specific FTS performance question

We don't have a FTS test yet, but you can add your own with a few lines of Tcl modeled on this basic insert test.

This would allow you to:

  • Compare different SQLite FTS configurations. It appears from the docs that there are at least six table setup parameters. Some of them might be amazingly faster than others.
  • Compare FTS against non-FTS for similar tasks, for all of the above FTS setup configurations. This is where the multiway capabilities of the benchmarking tool really shine, because it is so annoying to do that otherwise.
  • To compare different use cases of FTS, eg perhaps you are interested in certain UTF-8 charsets, or strings longer than 5000 characters or something.
  • To see how FTS varies across different SQLite releases. Don't assume that more recent SQLite releases are necessarily faster, although they seem to trend faster over time. But this will quickly tell you if the latest release has an FTS regression.

And please, we would love to see any FTS tests you write so we can include them in our benchmarking suite. Which would then mean you get to see the results of other people running your tests. Right now we have a donated fast cluster doing benchmark runs that take three days each, for example.

Why develop SQLite benchmarking?

LumoSQL exists to learn about SQLite performance and to experiment with adding features non-destructively, so we first needed to have a baseline measure. And to our surprise, there wasn't one. There are multiple papers on the topic, including one with the typical gloomy title The Dangers and Complexities of SQLite Benchmarking.

We feel we are getting close to having other people solve this problem for us, by running lots of benchmarks and submitting them to a common pool. We shall see.

Does that help?

Dan Shearer