Announce: Tool to compare SQLite versions and configurations
(1) By Dan Shearer (danshearer) on 2021-01-09 19:58:41 [source]
The "Thanks for SQLite 3.34.0" message was a cheerful way to end 2020, as Jose celebrated increased speed. SQLite often becomes faster with new versions, although, as shown in the LumoSQL README, it sometimes seems to get slower.
One of the problems with measuring SQLite is consistency: results sometimes mislead not because of a lack of benchmarking, but because of a lack of repeatability. For example, if a developer upgrades their computer, a new software release may seem faster even when it is not. The benchmark results look as though they have been repeated, but they have not been, because the hardware changed.
Even when we are sure the hardware is the same, there is the question of choosing which tests to run, and making sure other people run the same tests. Given that the SQLite source tree has 9 tools named *speed*, many of them either tunable or very specifically pre-tuned, that becomes a hard problem. How can we consistently run these tools, and compare and share the results? The paper Dangers and Complexity of SQLite3 Benchmarking even sounds a little apocalyptic:
...changing just one parameter in SQLite can change the performance by 11.8X... up to 28X difference in performance
If you find these problems interesting, then you might want to give LumoSQL 0.4 benchmarking a try. We had fun getting to this point, and now we're working on how to amalgamate test runs, present them graphically and so on. Oh and we discovered that Tcl is much better than the general opinion of it seems to be.
Introducing LumoSQL 0.4 Benchmarking
The benchmark tool released with LumoSQL 0.4 is intended to be a way of consistently measuring SQLite. Tests are run in as controlled a way as possible, each test run is assigned a SHA3 hash along with its identifying features, and the results are stored in an SQLite database file. The LumoSQL README shows how to get started quickly with this tool.
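The idea of identifying each run can be sketched as follows. This is a minimal illustration, not LumoSQL's actual code or schema: the feature names and the `run_data` table here are invented for the example; only the general technique (hash the identifying features, store results keyed by that hash in SQLite) reflects the description above.

```python
import hashlib
import json
import sqlite3

def run_id(identifying_features: dict) -> str:
    """Derive a stable SHA3-256 identifier from a run's identifying features."""
    # Canonical JSON (sorted keys) so the same features always hash the same way.
    canonical = json.dumps(identifying_features, sort_keys=True)
    return hashlib.sha3_256(canonical.encode("utf-8")).hexdigest()

# Hypothetical identifying features for one benchmark run.
features = {
    "sqlite_version": "3.34.0",
    "backend": "btree",
    "options": ["SQLITE_DEBUG"],
    "datasize": 1000,
}

conn = sqlite3.connect(":memory:")  # a results database file on disk in practice
conn.execute("""CREATE TABLE IF NOT EXISTS run_data
                (run_id TEXT, key TEXT, value TEXT)""")
rid = run_id(features)
conn.execute("INSERT INTO run_data VALUES (?, ?, ?)",
             (rid, "elapsed-seconds", "12.5"))
conn.commit()
print(rid[:16])
```

Because the identifier is derived from the run's features rather than assigned sequentially, two people who benchmark the same configuration produce rows that can be matched up when their results databases are shared.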
The following section is from the full documentation for LumoSQL benchmarking, which is much more detailed than the LumoSQL README.md.
Questions The Build and Benchmark System Answers
A single command can now give universal, repeatable, definitive answers to the following seemingly-simple questions:
- How can benchmarking runs be shared in a consistent manner between all users? (hint: use a standardised SQLite database)
- Does SQLite get faster with each version? (hint: not always)
- Which compile options make a given version of SQLite faster?
- How do different versions and compile options combine to change performance as data size gets larger?
- Does SQLITE_DEBUG really make SQLite run approximately three times slower?
- What happens when a given set of compile options, versions and data size are tested on faster and slower disks?
- Do I need to run hundreds of combinations to make decisions about SQLite versions/options/hardware? (hint: no, because you can now compare benchmarking results databases)
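As a sketch of what comparing two shared results databases can look like: SQLite's ATTACH lets one query span several database files at once. The file names and the `runs` schema below are invented for illustration; the real LumoSQL schema is described in its documentation.

```python
import os
import sqlite3
import tempfile

# Build two tiny stand-in results databases (invented schema for illustration).
def make_results(path, version, seconds):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE runs (sqlite_version TEXT, elapsed REAL)")
    conn.execute("INSERT INTO runs VALUES (?, ?)", (version, seconds))
    conn.commit()
    conn.close()

workdir = tempfile.mkdtemp()
mine = os.path.join(workdir, "mine.sqlite")
theirs = os.path.join(workdir, "theirs.sqlite")
make_results(mine, "3.34.0", 11.2)
make_results(theirs, "3.33.0", 13.9)

# ATTACH lets a single SQL query read both databases at once.
conn = sqlite3.connect(mine)
conn.execute("ATTACH DATABASE ? AS theirs", (theirs,))
rows = conn.execute("""
    SELECT m.sqlite_version, m.elapsed, t.sqlite_version, t.elapsed
    FROM runs AS m, theirs.runs AS t
""").fetchall()
for my_version, my_time, their_version, their_time in rows:
    print(f"{my_version}: {my_time}s  vs  {their_version}: {their_time}s")
```

The point is that a standardised results database makes this kind of cross-machine, cross-version comparison a plain SQL query rather than a manual exercise.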
Having addressed the above questions, the following seemingly more-difficult questions now become very similar to the previous ones:
- What happens to performance when LMDB is swapped in as a storage backend for SQLite? (hint: there is a strange performance curve with increasing LMDB versions)
- How does the Oracle-funded BDB backend compare with other backends, including the SQLite Btree? (hint: Oracle seems to have thought longer runtimes are better :-)
- How do all of the above compare with each other with different build options, versions and datasizes? (hint: now that benchmarking results can be shared, we can take advantage of thousands of CPU-hours from other people)
Go and try it (the instructions are intended to be simple).
Think about what tests you'd like to see added. Now that the infrastructure is done, it's easy to add any desired tests. We'd love to have input from others on this.
Tell us about any bugs you find. We've mostly tried to use other people's well-tested code, especially SQLite's.