SQLite Forum


28 forum posts by user danshearer

16:12 Reply: Compiling sqlite3 with readline support (artifact: 22381e80f8 user: danshearer)

Update: I tried again with a pristine install of NetBSD 9.1.

configure does not get dependencies wrong, that was another problem.

So the recipe is now simply:

  1. Pristine default install of NetBSD with pkg_src and pkgin
  2. pkgin install tcl
  3. ln -s /usr/include/readline editline
  4. ./configure ; make

This gives a correct binary (with a tlcstubs complaint on this platform, but that's a problem for another post. Some Linux distro maintainers asked me about that too.)

The editline options in configure need to be reviewed carefully to get around the readline/editline switcharoo, which as I said I think applies to several other platforms.

Dan Shearer

13:05 Reply: Compiling sqlite3 with readline support (artifact: 0e9668c612 user: danshearer)

Here is a report for building SQLite 3.35.0 on NetBSD. This may be relevant to MacOS and FreeBSD.

On NetBSD 9.1 ./configure incorrectly fails to detect some dependencies are missing, and then gets readline wrong in a similar way to the report in this thread for MacOS. Unlike the original poster I was just trying to compile SQLite at all, not to use a specific edit/readline, but I think it is nearly the same problem.

./configure does correctly find libedit:

checking for library containing readline... -ledit"

but when compiling, sqlite sources want to include <editline/readline.h> , which Uilbeheist tells me has never existed on NetBSD.

I suspect --enable-editline may be ineffective on all BSD-type platforms.

The simplest way I know to get SQLite sources to build on NetBSD is:

  1. pkgin sqlite3 # use official package to get deps
  2. ln -s /usr/include/readline editline # fake to fool sqlite

I have not yet built Fossil on NetBSD 9.1 (fossil contains SQLite 3.35.0 sources), but I would expect the autosetup Fossil uses to do a better job than the GNU autotools SQLite uses.

Dan Shearer

14:03 Reply: Thousands of SQLite Databases for each user (artifact: 33cd50514c user: danshearer)

On 2021-03-24 17:47:36 anonymous said:

SQLite is not designed for large multi-user distributed applications. But you have described a use case that might match SQLite's sweet spot.

The word "scaling" is often used lightly, like the word "terrabyte". The magical runeword "horizontal scaling" does not by itself make giant workloads function well. That's why you need to look very hard at what is really meant here, and if you in fact are ever likely to have a giant workload. Not many do.

I see Internet threads online advising against this because it makes it harder to horizontally scale.

The scaling for any one mostly read-only SQLite database is firstly vertical, and horizontal doesn't tend to arise for a long time if ever. Consider the number of concurrent reads you can get from an SQLite node before you even need to feed it more CPU and RAM, and then up until the maximum amount of RAM and CPU it is possible to give that node these days. It's such an enormous number of hits that if you still need to look at horizontal scaling then you probably have lots of ways of paying for it. Horizontal scaling certainly is possible, but by comparison I have never heard of a way of having a single Postgres node coming within one zero of the hitrate of SQLite, because Postgres does different things. From the limited information you provide, the use case is a good match for SQLite.

As a subset of this, it is conceivable that your problem with SQLite becomes that the relatively few writers briefly but noticeably block some of the vast number of readers even with WAL2. That is an area I'm looking at where SQLite is more simplistic than is probably necessary, but even with SQLite as-is there are things you can do to address this.

The requirements and design you have sketched out in the original post plus followups suggest that, even if load is not evenly distributed across each SQLite database, you can still serve billions of SQL queries in total per day (I get that guess by multiplying "thousands of databases" by a conservatively low number of requests per database.) There are assumptions in here such as no merging of data between nodes and no pathological queries etc.

I don't see why it would be hard to scale

From the information you gave, including that it works with plain files today, probably not. As food for thought, the limiting case of plain files and nearly 100% readers is email, where Maildir and similar systems will scale to very large numbers because there are so few moving parts (or, a short codepath.) I'm not saying that you are better off with plain files, but installing a networked SQL database may decrease your scaling by introducing so many more moving parts. SQLite has a short codepath, which is part of why it can serve so many requests per second.

as it is just singular files that can be easily replicated.

I suspect that may be an incorrect scaling dimension to be worried about from the information you have provided. "Copying single files" suggests that a given SQLite node is having too many reads to cope, which as I have said is a very high number. It also suggests you will need a loadbalancer/director to switch across nodes, which you may not need in the "thousands of SQLite databases" model although you haven't given much to work with. So I suggest that until you have nodes getting to that very high usage, you shouldn't need to be copying any files.

Dan Shearer

15:56 Reply: SQLite Version 3.35.0 - Beta 1 (artifact: 32ff2a003a user: danshearer)

With what I know now the comments in btree.h should probably say "MVCC-like" or "MVCC-style". I'm not sure that BTree was intended to be strictly MVCC, just to be a pragmatic solution that can deliver isolation, right?


15:53 Reply: SQLite Version 3.35.0 - Beta 1 (artifact: 82d9117ede user: danshearer)

Richard Hipp (drh) wrote on 2021-03-01 16:54:

... minor code formatting, or documentation ...

Can I merge this into trunk: https://sqlite.org/src/file?name=src/btree.h&ci=btree-code-documentation ?

(there has been just one change since then, the addition of "#define BTREE_PREFORMAT").

Dan Shearer

20:25 Reply: Tcl limit API (artifact: ceb87fce13 user: danshearer)

Hopefully the books I've ordered will arrive this week and I can spend the weekend in a deep dip.

I'm getting into Tcl too. Quite a few Tcl books are outdated, but I can recommend "The Tcl Programming Language" both Ashok P Nadkarani as being current up to 8.6, and a considerable amount of writing on the author's website about 8.7-related topics.

Dan Shearer

19:58 Post: Announce: Tool to compare SQLite versions and configurations (artifact: 9132d87c18 user: danshearer)

The thanks for SQLite 3.34.0 message was a cheerful way to end 2020, as Jose celebrated increased speed. SQLite often becomes faster with new versions, although as shown in the LumoSQL README it sometimes seems to get slower.

One of the problems with measuring SQLite is consistency, meaning that sometimes things get slower not because of a lack of benchmarking, but a lack of repeatability. For example if a developer upgrades their computer, a new software release may seem faster even when it is not. Even though benchmark results look like they have been repeated, they have not been because the hardware changed.

Even when we are sure the hardware is the same, there is the question of choosing which tests to run, and making sure other people run the same tests. Given that the SQLite source tree has 9 tools named *speed*, many of them either tunable or very specifically pre-tuned, that becomes a hard problem. How can we consistently run these tools, comparing and sharing results? The paper Dangers and Complexity of SQLite3 Benchmarking even sounds a little apocalyptic:

...changing just one parameter in SQLite can change the performance by 11.8X... up to 28X difference in performance

If you find these problems interesting, then you might want to give LumoSQL 0.4 benchmarking a try. We had fun getting to this point, and now we're working on how to amalgamate test runs, present them graphically and so on. Oh and we discovered that Tcl is much better than the general opinion of it seems to be.

Introducing LumoSQL 0.4 Benchmarking

The benchmark tool released with LumoSQL 0.4 is intended to be a way of consistently measuring SQLite. Tests are run in as controlled a way as possible, each test is assigned a SHA3 along with identifying features, and the results are stored in an SQLite database file. The LumoSQL README shows how to get started quickly with this tool.

The following section is from the full documentation for LumoSQL benchmarking, which is much more detailed than the LumoSQL README.md.

Questions The Build and Benchmark System Answers

A single command can now give universal, repeatable, definitive answers to the following seemingly-simple questions:

  • How can benchmarking runs be shared in a consistent manner between all users? (hint: use a standardised SQLite database)
  • Does SQLite get faster with each version? (hint: not always)
  • Which compile options make a given version of SQLite faster?
  • How do different versions and compile options combine to change performance as data size gets larger?
  • Does SQLITE_DEBUG really make SQLite run approximately three times slower?
  • What happens when a given set of compile options, versions and data size are tested on faster and slower disks?
  • Do I need to run hundreds of combinations to make decisions about SQLite versions/options/hardware? (hint: no, because you now can compare benchmarking results databases)

Having addressed the above questions, the following seemingly more-difficult questions now become very similar to the previous ones:

  • What happens to performance when LMDB is swapped in as a storage backend for SQLite? (hint: there is a strange performance curve with increasing LMDB versions)
  • How does the Oracle-funded BDB backend compare with other backends, including the SQLite Btree? (hint: Oracle seems to have thought longer runtimes are better :-)
  • How do all of the above compare with each other with different build options, versions and datasizes? (hint: now can share benchmarking results, we can take advantage of thousands of CPU-hours from other people)

What Next?

Go and try it (the instructions are intended to be simple.

Think about what tests you'd like to see added. Now the infrastructure is done, it's easy to add any desired tests. We'd love to have input from others on this.

Tell us about any bugs you find. We've mostly tried to use other people's well-tested code, especially SQLite's.


Dan Shearer

17:03 Reply: Native support for pluggable backends in SQLite (artifact: 944240610c user: danshearer)

Dan Shearer (danshearer) wrote on 2020-11-19 10:35:45:

(coming to not-forking soon is better dependency checking for things like Git and the Git Perl modules at runtime not build/install time. But that isn't going to fix your problem, only perhaps detect it better. Although Fossil support is also coming, and that might fix all your Git-related problems :-)

All done and committed to not-forking.

It's worth installing an updated not-forking and doing lumosql "make benchmark".

Dan Shearer

10:35 Reply: Native support for pluggable backends in SQLite (artifact: 07a0f62446 user: danshearer)

phpstatic wrote on 2020-11-19 02:52:45:

I already run this:

perl -MCPAN -e shell install Git::Repository

This suggests that any program that says "Use Git;" at the top will fail, so there appears to be something wrong with your Perl setup.

Did you perhaps choose to use CPAN directly, where your operating system has its own Perl package management? For example...

  • On Debian/Ubuntu: apt install libgit-wrapper-perl
  • On Centos/Red Hat: yum install perl-Git

(coming to not-forking soon is better dependency checking for things like Git and the Git Perl modules at runtime not build/install time. But that isn't going to fix your problem, only perhaps detect it better. Although Fossil support is also coming, and that might fix all your Git-related problems :-)

Dan Shearer

17:58 Reply: Native support for pluggable backends in SQLite (artifact: 30c9202a0a user: danshearer)

phpstatic on 2020-11-18 15:33:26 wrote:

I check your benchmark report from this link: https://lumosql.org/benchmarking/

Thanks, it's really helpful to have people look at what we've done so far.

The concept is that you can run the benchmarking tool for yourself. That is the tool that produced these example results on old hardware, so you will get much better numbers.

The reason for this design is that typically benchmarking is about one person running many tests and publishing their results, or, many people posting to a forum about the results they have. Results are very hard to replicate, and almost impossible to compare. Soon you will also be able to use this tool to upload your results database to create a public database of reproducible measurements for comparison.

In this case, while you have much better hardware and get much bigger numbers, the comparisons should be equally valid.

Please tell me if I have not been clear in this explanation so far :-)

The number seems not good(100TPS for sqlite, 300 TPS for sqlite+LMDB).

I recommend you look at the benchmark filter tool which illustrates how you can:

  1. Get your own numbers for 3.33.0 (or some other version you specify). I'm sure your numbers will be much higher than these example numbers.
  2. Compare on your exact machine a 3.33.0 vs 3.31.0. Or 3.31.0 vs 3.8.1+LMDB . Etc.

After that, when your benchmarking database is uploaded and combined with the example benchmarking database you saw at lumosql.org, then we can answer some more questions:

  1. Does SQLite version X vs SQLite version Y behave proportionately the same on your modern hardware to my old hardware? Perhaps it doesn't, because of memory pressures, or disk I/O bottlenecks. That's very useful information.
  2. Same question as (3), except for different backends. We do not necessarily expect uniform linear improvements when moving from old hardware to new hardware.

Is the test enable MMAP + WAL ?

That is an excellent point, because it is about the dimensions of benchmarking. Now that we have a working benchmarking tool that saves information to an SQLite database, we are improving the data it collects. The dimensions we have so far include:

  • SQLite version
  • Build options
  • Backend, and backend version
  • Environment major characteristics, eg Linux/Windows and their versions
  • Environment details - whatever we can get, in a portable way. Hardware details, OS detailed version numbers, disk media info.

While we are making the schema as future-proof as we can, we are also trying to make it comprehensive now. We are very aware that we haven't done any benchmarking on Windows, and never even run LumoSQL on Android once... but these are both important deployment targets. So the benchmarking has to try very hard to run on these systems and give results that can be meaningfully compared.

Here is what is still to come in benchmarking, very soon:

  • Capturing system hardware
  • Results upload system
  • Verification of results, by testing that benchmark results rows are internally consistent

This is not the case any more, use a VFS shim can handle the encryption and corruption detection. Use VFS no need call internal functions.

Unfortunately that is not quite the case. In the LumoSQL Corruption Design document the heading "Design of the SQLite Checksum VFS Loadable Extension" lists positives and negatives. The negatives listed include:

  • No information about the logical location of this error, eg what row(s) it affects. The application knows nothing about how rows map to pages. All the application knows is that SQLITE_IOERR_DATA was returned during a read operation. That's a lot better than silent corruption, but also not as helpful as it could be.
  • Brittle implementation due to requirements of the file format. The "bytes of reserved space on each page" value at offset 20 in the SQLite database header must be exactly 8. What if we want a better or different checksum?
  • No facility for isolation or recovery of data. If we know with certainty that only row number 1234 is corrupted, then the application can take some action that is a lot less drastic than suspecting the entire file. This comes down to the design goal: the checksum VFS is aimed at the "random storage bitflips" problem. This is a very real issue, but there are many other causes of corruption in a database, including application bugs, partial restore of a backup file, an SQLite crash, and more.
16:48 Reply: Native support for pluggable backends in SQLite (artifact: e80a4afe1b user: danshearer)

Dan Shearer (danshearer) on 2020-11-13 15:47:39:

I received a comment on the section I wrote:

... "What's Missing... Nobody Really Knows :-)"

In terms of performance, reliability and tradeoffs, nobody knows what is missing. That is because there is no consistent way of comparing one version of SQLite with a modified version, and certainly not for repeating these comparisons across many different users, systems and build environments. That's why we built benchmarking, which also works for vanilla SQLite-on-SQLite.

It would have been better to say "The LumoSQL project hasn't been able to find published benchmarking", because there may be unpublished benchmarking, and there are experts on this forum who have an excellent understanding on how to make SQLite run fast.

The LumoSQL Knowledgebase uses better words on this point under the heading "List of Relevant Benchmarking and Test Knowledge". The paper Dangers and complexity of sqlite3 benchmarking listed says:

...changing just one parameter in SQLite can change the performance by 11.8X... up to 28X difference in performance

That doesn't mean the benchmarking itself is so hard, but I'm not aware of anyone publishing like-for-like comparisons.

Dan Shearer

15:47 Reply: Native support for pluggable backends in SQLite (artifact: 69feb1a3af user: danshearer)

Ben, Gunter, others,

Ben's proposal is the sort of philosophy we're implementing in LumoSQL, although we are approaching it from the opposite direction. In LumoSQL we are trying to identify what is missing in SQLite, and then implement minimally-intrusive ways of addressing those problems. As of LumoSQL v0.3 we have working, benchmarked prototypes and are starting to close in on what a backend API might look like.

Gunter raised a point about the Virtual Table Interface which I respond to near the bottom.

I will leave a direct answer to Ben about header files and API details to another post. Because I feel we need to know why we are making changes, and then how we're going to do it, and finally what should be done. I have already discussed the general source architecture with Ben (and of course he was working in this area years ago) so I'll be glad to move the code architecture chat out in public.

We have done a great deal of work on the "what's missing?" problem, including extensively documenting existing knowledge. I recommend anyone interested in storage APIs for SQLite to look there for inspiration in code and papers related to SQLite, K-V stores, VFS, benchmarking, transaction theory and more. A lot of SQLite work has taken place in isolated hard forks, and we are trying to make it easier for these forks to feel less isolated.

None of this overlooks the value of the existing binary on-disk format of SQLite as relied on by billions of people daily. That's another part of the discussion. This discussion is about what actions will address the pressure for SQLite storage to do more and different things.

What Is Missing in SQLite Storage?

Nobody Really Knows :-)

In terms of performance, reliability and tradeoffs, nobody knows what is missing. That is because there is no consistent way of comparing one version of SQLite with a modified version, and certainly not for repeating these comparisons across many different users, systems and build environments. That's why we built benchmarking, which also works for vanilla SQLite-on-SQLite.

Ability to Try New or Different K-V Options

There is a small number of K-V stores that provide the facilities SQLite needs: MVCC, compatible transaction model, a Berkely Database-style API, and implemented in C. The main candidates are: LMDB, BDB (from when Oracle bought Sleepycat) and BDB (forked by Bloomberg when Oracle bought Sleepycat, and adding things like page-level locking.) Karl Malbrain's database is a candidate that implements the Serial Safety Net concurrency protocol, which rethinks the way concurrency is handled in a K-V store. This is all about looking at alternative futures for SQLite.

A Modern Alternative to 1990s-era WAL files

Every mainstream SQL database including SQLite uses 1990s-era batch design to achieve reliabile persistent page storage. This works, but it also ignores the last two decades of development in virtual memory management. Modern operating systems know how to save and retrieve pages from disk with very low corruption risk and little-to-no recovery requirements. SQLite has an option to try this out, because LMDB is ubiquitous in open source codebases and we know a lot about how it behaves. SQLite is the only mainstream database that has any chance of exploring 21st century page management. SQLite+LMDB has no WAL files, and greatly reduced pager code.

Better Corruption Detection

SQLite's main techniques for avoiding corruption seems to be to take extreme care in the WAL implementation, and to benefit from efficient code that limits the time window for errors - plus not needing to worry about consistency when doing network replication. The major databases (Pg, MS, Oracle, MySQL, MariaDB) all take the approach of having in-and-out of band corruption detection mechanisms. Being an embedded database, SQLite is often used in quite unforgiving hard-crash scenarios, and the question "who has regular difficulty with corrupted SQLite databases?" always results in a forest of raised hands at talks and conferences.

What's missing therefore is: * pre-emptive corruption detection (see next point) * option to use the operating system for this (see previous point)

Unlike the other main databases, SQLite has no pre-emptive corruption detection and only fairly basic on-demand detection.

Checksummed Corruption Detection

For reasons that are unclear, not one mainstream database has the facility to do fine-grained corruption detection that is directly visible to the user. Richard Hipp recently wrote the SQLite Checksum VFS which shows some of the possibilities, with many limitations.

LumoSQL has investigated this extensively and designed a robust system for corruption detection and magic . This needs to be part of the backend API abstraction so that it applies to all K-V stores.

Privacy Support is Missing

Encryption is not electronic privacy, but it is a pre-requistite for it. Since assorted new privacy laws mandate reliable encryption, and since non-crypto-capable SQLite is deployed pervasively, something needs to change. Exactly what or how is unclear because there are many existing options. LumoSQL has documented most of the crypto work that has been done relevant to SQLite, and looked at the new requirements that either exist or are coming soon, and is developing a design that works with multiple backends.


This isn't further discussed here, however several of the SQLite hard forks have added networking backends. LumoSQL is trying to design the backend storage API so that it is not incompatible with networking, which is not the same thing as supporting it.

Why Not Use an Existing SQLite API?

An alternative K-V stores can be implemented as a Virtual Table. That is what Gigimushroom's Database Backend Engine did in 2019. But that won't give access to the internal features of the VDBE-and-below layer that is the secret to performant storage in SQLite, which in turn is why so many SQLite forks have choses to access these internals directly.

It is interesting to read SQLCipher's comments on not using existing APIs for their encryption product. A storage engine isn't the same as an encryption layer, but there are design issues in common. SQLCipher says:

SQLCipher is an extension to SQLite, but it does not function as a loadable plugin for many reasons. Instead, SQLCipher modifies SQLite itself, and is maintained as a separate version of the source tree. SQLCipher releases are baselined against a specific source version of SQLite. However, the project minimizes alterations to core SQLite code to reduce the risk of breaking changes during upstream SQLite merges.

The reasons that SQLCipher is packaged this way, as opposed to a "plugin" or extension to the SQLite amalgamation, follow:

  • Enabling an SQLite codec requires the compile-time definition of SQLITE_HAS_CODEC, which is not present on standard, unmodified SQLite builds.
  • Even when enabled, SQLite isn't setup to load codecs as plugins. While SQLite does have a plugin function for loadable extensions, it does not extend access to any system internals (it mainly used to allow custom user functions).
  • SQLCipher makes calls to internal functions that are not part of the public SQLite API. Sometimes these APIs change, even in between minor SQLite versions. Thus, each update adn merge requires inspection, testing and verification. Making SQLCipher portable across multiple versions of SQLite would not be feasible, nor could it to use only the public API (for instance, even the first critical step of attaching the codec callback to the pager uses an internal API).
  • SQLCipher modifies supporting functions to introduce special pragmas, built in functions, etc (e.g. "PRAGMA cipher_*"). Injecting this functionality in a plugin architecture wouldn't be possible.
  • SQLCipher's test harness relies on support in testfixture to take advantage of the test API and various internal checks (memory reference counting, etc.)
  • Even if it were possible to use a loadable plugin, dynamic libraries aren't available on all supported platforms, for example iOS

For these reasons, and more, LumoSQL is implementing a backend storage API abstraction.

15:50 Post: Benchmarking SQLite versions against each other (artifact: e542545a47 user: danshearer)


At LumoSQL we have had interest in using our tools for benchmarking different versions of vanilla SQLite, the reason being our benchmark tool stores its results in a database and is designed for results to be easily aggregated across many unrelated systems and people. This is nothing to do with the well-known SQLite test suites; testing is not benchmarking. LumoSQL knows about VCSs and caches all source downloads, so its fairly efficient.

Doing A Benchmark Run

A benchmarking run looks like this:

     [install LumoSQL from https://lumosql.org/src/lumosql]
$ make benchmark TARGETS="3.14.15 3.33.0"
     [... output... ]
$ tclsh tool/benchmark-filter.tcl                          # list all runs in database 
$ tclsh tool/benchmark-filter.tcl RUN-ID-1 RUN-ID-2        # compare two runs
$ make benchmark TARGETS="3.20.1 3.25.2"                   # runs the benchmark on 2 more versions, adds results to same database
$ tclsh tool/benchmark-filter.tcl -target 3.20.1 -target 3.14.15 # show these 2 side by side
$ tclsh tool/benchmark-filter.tcl MANY MORE OPTIONS        # see tool's documentation

Caveats are mostly covered in the build documentation and tool documentation

Want to help? Good!

You might want to:

  • consider "versions of SQLite" to include different build-time flags. We do not currently collect the build-time options in the database.
  • help add to the system information collected in tool/benchmark.tcl . We intend the database schema to grow plenty.
  • help add to the system identification we collect. There are pros and cons to how much identification and what kind
  • try this on non-Unix. We haven't got there yet, although LumoSQL is intended to be as portable as SQLite and TCL. Things like CPU time in the TclX module are system-specific.
  • make suggestions for how best to add options for hot/cold runs

We are working on a tool to easily import databases from many people, check them, and aggregate into one single database file. We intend to put this tool online for public submissions as well as making it available for people to run on their own networks.

What's Coming Next

  • benchmark database aggregation at lumosql.org
  • checksums for each database run
  • whatever people have to contribute


The LumoSQL Authors

11:22 Reply: Announcing LumoSQL 0.3 (artifact: 9c977335bb user: danshearer)

Dan Kennedy (dan) on 2020-11-11 08:24:18:

I don't think the forum for lumosql is allowing anonymous access.

Should be fixed. And for not-forking, and LumoDoc.

Dan Shearer

15:50 Post: Announcing LumoSQL 0.3 (artifact: 376fde8826 user: danshearer)

The LumoSQL project provides answers to questions not many people are asking about SQLite. SQLite is easily both the most-used database and most-used K-V store in the world, and so these questions are relevant:

  • How much faster is SQLite improving, version on version, and with various switches selected?
  • What would SQLite look like if it used a more modern journaling model instead of WAL files?
  • Would it be nice for BDB, the ancient Berkeley Database, to magically appear as an SQLite storage backend?
  • Does SQLite work faster if it has LMDB as a storage backend?
  • How on earth do you benchmark SQLite anyway, and compare with itself and others?

LumoSQL is SQLite gently modified to have multiple different backends, with benchmarking and more. There is also the separate LumoDoc project, which isn't just documentation for LumoSQL but also the results of our research and testing, including a list of relevant codebases.

The last talk at the SQLite and Tcl Conference later on today will by me, speaking about LumoSQL. So we thought we should make the first public release... and here it is.


Long-time SQLite watchers may recall some prototype code by Howard Chu in 2013 called "sqlightning", which was btree.c modified to call LMDB internals. That code is what inspired LumoSQL. LumoSQL 0.3 has had significant code contributions from first Keith Maxwell and then Uilebheist in assisting the project founder, Dan Shearer. One of our main achievements is to demonstrate how much more code is needed, so patches are welcomed. Do please drop in on the LumoSQL forum or otherwise consider contributing. Many others have helped significantly with LumoSQL 0.3, including NLNet with sponsorship, besides a long list of essential cheerleaders, encouragers and practical supporters.

The Big Honking End Goal

The end goal is to develop a stable backend storage API in SQLite. This depends on many things, including being minimally invasive and maximally pleasing to drh :-) Even if it cannot be committed to SQLite for some good reason, we will be able to carry it in LumoSQL.

But before we can even think of a storage API we need to understand what different backends might be and what they need. Even key-value stores with quite similar APIs such as SQLite native, LMDB and BDB have different understandings of MVCC, key sizes, locking and more. The proposed API would need to abstract all of this. We've been studying the interactions between src/vdbe*, btree* and pager* as some may have noticed on the SQLite forum. There are not very many MVCC K-V stores suitable for linking to an embedded C library, but we want to collect all those that are.

Nope, Not a Fork

LumoSQL has avoided forking SQLite by developing the not-forking tool. This tool could be helpful for anyone trying to stay in synch with multiple upstreams. It knows how to fetch sources, parse version numbers and make non-controversial merges even in cases where a straight patch or fossil merge would fail. It can also replace entire files, and more.

LumoSQL Features as of version 0.3

$ make what
SQLITE_VERSION=3.33.0            # whatever the latest version is
LMDB_VERSIONS=0.9.9 0.9.16 0.9.27
    +bdb-18.1.32                 # the +means it is not yet a clean LumoSQL not-fork config

This builds the versions listed. With Not-forking we can walk up and down the version tree.

make benchmark

Will perform some fairly simple operations on all targets, storing results in a single SQLite 3.33.0 database. This database is intended to persist, and to be amalgamated with results from others. While some basic query and fsck-like tools are provided, LumoSQL hopes that people with skills in statistics and data presentation will work their magic with this interesting new dataset. The design is meant to encourage addition of new parameters and dimensions to the benchmarking.

make benchmark TARGETS=3.7.17+lmdb-0.9.26+datasize-10

Runs the same benchmarks, except that all of the operations have a zero added to them, so 25000 SELECTs becomes 250000. Other size factors can be chosen.

Results So Far

  • The not-forking approach works. Yes we are going to be able to "try before we buy" a storage API
  • Benchmarking works. It's got a long way to go, but even this much is a powerful new way of comparing SQLite versions
  • SQLite has improved considerably in performance in the last five years
  • SQLite+LMDB performs better than SQLite as dataset sizes increase (testing is incomplete though)
  • SQLite+MDB don't perform at all well... in fact, worse than a very much older vanilla SQLite. (Some would hesitate to use SQLite+MDB in production anyway given that MDB is under the AGPL licence, and SQLite is a library and thus anything using it would also be covered by the AGPL.)

There are some less-technical results too, like the fact that there are many developers around the world who have modified SQLite in interesting ways but there has been no effective way for their work to be compared or evaluated. Oh and, we're using Fossil, definitively so.

Where Next for LumoSQL?

  • Walk up the SQLite version tree to tip. We're all waiting to see what SQLite 3.33.0+LMDBv1.0rc1 will be like.
  • Complete our work moving LMDB to only use liblmdb as a fully-external library
  • Do some concurrency benchmarking, possibly addressing a potential concurrency problem with LMDB in the process. Concurrency is not SQLite's strong point, so this will be very interesting
  • Possibly increase the SQLite version supporting BDB. This is a very important use case because the BDB API is both classic and yet also not LMDB, meaning if we get it right then we get it right for other K-V stores
  • Produce lots more benchmarking data with our existing tools. That means lots of CPU time, and we'd love to have volunteers to help populate this
  • First draft backend API taking into account issues relating to keys, transactions, internal SQLite close couplings, etc.
  • Talk to the active SQLite forks ... if you're reading this, we'd love to hear from you :-)

And that is LumoSQL release 0.3. We look forward to seeing those who can make it to the SQLite and Tcl Conference later on today at https://sqlite.org/forum/forumpost/521ebc1239 , and to producing more releases in quick succession.

11:55 Post: Minor: SQLITE_IOERR_DATA missing from result codes (artifact: 389f1f529b user: danshearer)

src/file/ext/misc/cksumvfs.c uses SQLITE_IOERR_DATA

SQLITE_IOERR_DATA is defined in src/sqlite.h.in

It does not appear in https://www.sqlite.org/rescode.html

I suggest perhaps because its value is at the top of the 32-bit range and something in docsrc gets it wrong during generation.

Dan Shearer

18:32 Reply: Proposal and code: Superset of SQLite Debugging (artifact: 98107670e6 user: danshearer)
/* This demonstration code is part of the SQLite forum thread
 *  "Proposal and code: Superset of SQLite Debugging"
 * and contains the code for new compile-time debugging facilities, and 
 * some no-op illustrations.
 * Compile sqlitedebug (this file) with $CC parameters as follows, then
 * run it:
 * gcc -Wall -O2 sqlitedebug.c -o sqlitedebug -D [...]
 * where example -D options are:
 * There are many other combinations, but this gives the idea of how levels
 * and categories of debugging works, and that SQLITE_DEBUG works unchanged.
 * Dan Shearer
 * dan@shearer.org
 * 2020-09-22

#include <stdlib.h>
#include <stdio.h>

/* Levels of Debugging Info, currently 3 levels 1-3
 * Level 3 is the default, and applies if DEBUG_LEVEL is undefined
 * Level 1      status labels, such as "Flag status in Opcode XX is %s". Also
 *              useful for ad-hoc debugging 
 * Level 2      Level 1 plus flow labels, such as notification of 
                "Entering/Leaving Function X"
 * Level 3      Level 2 plus data structure dumps, and anything else

	#define SELECTIVE_DEBUG(l) DEBUG_ALL || (l)

	#if !defined(DEBUG_LEVEL) /* default to 3 */
	#define DEBUG_LEVEL 3

	#if (DEBUG_LEVEL < 1) || (DEBUG_LEVEL > 3)
	#error "DEBUG_LEVEL must be between 1 and 3. Default is 3"
	#define SELECTIVE_DEBUG(l) 0

	#if defined(DEBUG_ALL)
	#error "DEBUG_ALL specified without SQLITE_SELECTIVE_DEBUG"

	#if defined(DEBUG_LEVEL)
	#error "DEBUG_LEVEL specified without SQLITE_SELECTIVE_DEBUG"

#endif /* SELECTIVE_DEBUG */


	#define DEBUG_PRINT(level,x) \
		if (level <= DEBUG_LEVEL) { \
	       		printf("Level: %d file:%s line:%d %s\n",level,__FILE__,__LINE__,x); \
	#define DEBUG_PRINT(level,x) do {} while (0)

int main() {

printf("Normal SQLITE_DEBUG code goes here, unaffected by anything SQLITE_SELECTIVE_DEBUG does\n");
DEBUG_PRINT(3,"This won't be printed unless the SQLITE_SELECTIVE_DEBUG system is in use");

DEBUG_PRINT(1,"Some random debug observation in the code, does not need to be inside SELECTIVE_DEBUG()");
DEBUG_PRINT(2,"Some random level 2 debug observation in the code, also not inside SELECTIVE_DEBUG()");



18:30 Post: Proposal and code: Superset of SQLite Debugging (artifact: 42e647999d user: danshearer)


This posting and the C code following in the same thread represent an approach I am volunteering to implement in SQLite, if it is agreed to be a good approach.

Adding and using debugging code in SQLite [has some strange problems] which can be easily addressed with the changes proposed in that post, which I am also volunteering to make. But there is a much bigger scope to the problem, which I think can also be safely addressed as follows.

Goal of This Proposal

To add a minimal superset of compile-time debug options to SQLite in sqlInt.h, with the intention of:

  • not disturbing any existing debug code anywhere

  • allowing lightweight debugging such as printing only progress statements, without the execution-time overhead of the current DEBUG_SQLITE

  • allowing only certain classes of debug options to be compiled, which addresses the previous point but also allows for selecting just some of the heavyweight debugging options

  • allowing existing debug code to be safely and simply upgraded to the new discriminated debugging system, with no unexpected changes in functionality to users of existing debug facilities

Note: Debugging is not logging. This proposal only discusses debugging, which is enabled by developers at compiletime. SQLite has a logging system which is available at runtime and controlled by SQLite users and applications.


The compilation documentation says:

The SQLite source code contains literally thousands of assert() statements used to verify internal assumptions and subroutine preconditions and postconditions. These assert() statements are normally turned off (they generate no code) since turning them on makes SQLite run approximately three times slower. But for testing and analysis, it is useful to turn the assert() statements on.

Adding thousands of assert() statements and even more thousands of lines of non-assert debug code is not desirable when testing just one particular aspect of SQLite, or wanting to print just one debugging line in just one function. Sometimes this debug code can interfere with SQLite behaviour in addition to just making it run slowly. While it is important to be able to do global debugging, more often a developer is only working on one thing at a time. There is no need to have an entire test suite run much slower for the sake of observing a single print statement. On resource-constrained targets or where SQLite is part of a much larger and more complicated codebase the addition of debugging code can have unpredictable effects.

As an example of debugging code that doesn't make sense to be always-on even when debugging:

When compiled with SQLITE_DEBUG, SQLite includes routines that will print out various internal parse tree structures as ASCII-art graphs. This can be very useful in a debugging in order to understand the variables that SQLite is working with.

If we are not interested in these ASCII-art graphs, that's unhelpful extra code hanging around. On the other hand, if it is a selectable debug option, it might be reasonable to enhance that feature by adding even more code, perhaps by emitting Pikchr markup.

The goal of this proposal has already been identified as a SQLite need, as can be seen in the SQLIte debugging documentation where of the four existing debug macros, two discriminate based on debugging function:

The SQLITE_ENABLE_SELECTTRACE and SQLITE_ENABLE_WHERETRACE options are not documented in compile-time options document because they are not officially supported.


A common question is whether a project should implement debug levels or debug classes. This proposal addresses both at once.

Given that there is otherwise no debugging discrimination, we have the opportunity to assign comprehensive debug levels or classes and gradually implement them consistently, and leave room for additional classes and levels to be added in the future. Done well, this will improve the speed and quality of debugging cycles, and also make it easier to assist SQLite developers by asking domain-specific debugging questions. It will encourage better quality thinking about the debugging process, and be more compatible with the idea of SQLite as a small, efficient embedded library.

Potential Problems

  • shipping debug: A better debug system might tempt some developers to ship some degree of debugging enabled by default in production. This would break the idea of debugging as a developer safespace, and potentially expose end users to unexpected behaviour. New SQLite debugging features need to be strongly documented as "unsupported for production use in all contexts."

  • Booleans vs. bitmaps: This proposal uses boolean macros rather than bitmaps, except for DEBUG_LEVEL which is a decimal integer. Bitmaps would look like:

    #define DEBUG_COMMANDLINE 0x00000004
    #define DEBUG_PRAGMA 0x00000008
    etc. Bitmaps have the classical advantage of being able to be specify multiple debugging classes/levels in a single number provided at compile-time, however that can only guarantee 31 separate numbers as any more may break on 32-bit processors due to the sign bit. Furthermore there are four kinds of endianness and again this might break debugging on some architectures.

  • Bitmaps vs. booleans: Using boolean macros means that, say 4 debug classes plus the mandatory SQLITE_SELECTIVE_DEBUG and likely DEBUG_LEVEL, and possible SQLITE_DEBUG makes for an extremely long $CC invocation line. But this is much less likely to obscurely break the debug system than architecture/bitmap clashes. Even though we need lots more -D / #define statements.

How to Use the Following Code

compile sqlitedebug with $CC parameters as follows, then run it.

some combinations will halt compilation with an error, eg

    -D DEBUG_LEVEL=1                     
                        (no SQLITE_SELECTIVE_DEBUG)
                        (debug level out of range)
                        (no SQLITE_SELECTIVE_DEBUG)

Implementation of the debug helper functions include function name and line number.


Agree on a larger initial/better category list

C CODE FOLLOWS in next posting in this thread

06:48 Reply: Draft Doc on SQLite SAVEPOINT, and its internal implementation (artifact: de5a006a0d user: danshearer)

Wout Mertens (wmertens) on 2020-09-30 10:14:39:

Upon reading this, I'm missing an explicit pointing out of the surprising difference that is promised in the beginning, and the standard definition that other database engines are implementing instead.

I found it cool that SQLite users can decide never to use BEGIN/COMMIT|END again, unless they want IMMEDIATE (==EXCLUSIVE in WAL mode). I suppose if I had been a SAVEPOINT user for years I would have been less surprised, but my introduction to SAVEPOINT was reading the code in src/pager.c .

By allowing all transactions to be (a) named and (b) nestable without changing syntax depending on context, then SQLite is enabling an improvement in SQL code that does not exist elsewhere and I think that is good.

MS SQL Server does allow BEGIN/END to be nested (as SQLite easily could, but chooses not to). All other databases I tried or whose documentation I read implement nested transactions according to the SQL standard, which means only by means of SAVEPOINT within a BEGIN block. That means the SQL developer has a context-dependent change in syntax, and cannot name the outer transaction.

Dan Shearer

16:38 Reply: Results from callcatcher run on SQLite (artifact: f058216b2f user: danshearer)

I had overlooked this:


around sqlite3_global_recover(). So that presumably is the careful SQLite way for code to quietly grow old and die, just enclose it in OMIT_DEPRECATED and wait for a few years.


14:27 Post: Draft Doc on SQLite SAVEPOINT, and its internal implementation (artifact: 73c1a2b3de user: danshearer)

Savepoints in SQLite

This contains preliminary analysis and some surprising (but consistent, useful and mostly-documented) behaviour of SQLite relating to savepoints. When this document is more comprehensive and accurate it can be added to the tree somewhere. I have been working through the transaction code in SQLite with Uilbeheist, and all contributions welcomed.

SAVEPOINT statement in SQLite vs the standard

The SQL standard defines savepoints as named sub-transactions. sqlite3 extends both SQL standard transactions and savepoints to be a superset of both. An SQLite BEGIN/COMMIT transaction is a special un-named case of a savepoint, and a named saveppoint outside a BEGIN/COMMIT has an implied BEGIN. A savepoint cannot ever be followed by a BEGIN because there can only be one open main transaction at once, and a BEGIN always marks the start of a main transaction.

A savepoint is most often used as a subtransaction.

This above context helps understand the SQLite SAVEPOINT documentation which says:

SAVEPOINTs are a method of creating transactions, similar to BEGIN and COMMIT, except that the SAVEPOINT and RELEASE commands are named and may be nested.

Other implementations of SQL stick to the standard definition, with MariaDB, Postgresql, Microsoft SQL Server and Oracle Server seeming to be more or less identical.

MariaDB can seem as if it behaves like SQLite, but that is only due to it being silent rather than throwing an error when a savepoint is used outside BEGIN/COMMIT. From the MariaDB documentation: "if SAVEPOINT is issued and no transaction was started, no error is reported but no savepoint is created". In fact MariaDB behaves like the strict standard and the other implementations.

Savepoints in SQLite Code

Internal terminology: Where savepoints are not used within a standard transaction, source code comments call it a "transaction savepoint". Similarly an internal name for a standard BEGIN/COMMIT transaction is "anonymous savepoint" while a "non-transaction savepoint" is the usual kind that follows a BEGIN.

vdbe.c maintains the struct Savepoint declared in sqliteInt.h, while pager.c maintains an array of struct PagerSavepoint. These parallel structures all come down to the same objects on disk.


The opcode OP_Savepoint is the only relevant code in vdbe, which has some savepoint logic and calls btree.c/sqlite3BtreeSavepoint(). vdbe deals with the savepoints names and assigns each a sequence number.


btree.c implments sqlite3BtreeSavepoint() which uses sqlite3PagerOpenSavepoint() to do the work. There is not much savepoint logic in btree.c however it is btree.c that implements transactions and subtransactions. (Subtransactions map onto subjournals but btree.c doesn't know anything about them.)


Savepoint logic is mostly implemented in pager.c, by manipulating the objects in the Pager.aSavepoint[] array . pager.c has the complete implementation of sub-journals, which are maintained to match savepoint nesting. pager.c does not know about savepoint names, only the sequence numbers vdbe.c assigned. It is pager code that does the actual rollback to the correct savepoint, no other code is involved in this.

Note: Savepoint code in pager.c seems to be quite intertwined with journal states, but very little difference between using WALs or not. pagerUseWal() and the aWalData[] array seem hardly used suggesting that savepoint implications for WAL mode are little different from the others, which seems a little surprising.

09:32 Delete reply: Debugging opcodes in vdbe.c (artifact: a8b989305d user: danshearer)
09:32 Reply: Debugging opcodes in vdbe.c (artifact: cd05ed3683 user: danshearer)

Rowan Worth (sqweek) wrote on 2020-09-25 08:58:49:

The default behaviour depends on the tool, but for those which use glibc's stdio (FILE* and friends) it goes like this:

That's really helpful and I can see I'll be using stdbuf. I didn't know it existed, and I see it's in GNU coreutils and has been re-implemented for BSD so that's most of the platforms I use.

It isn't a solution for all the platforms SQLite supports though, and it does require some extra knowledge, which is why I feel that using sqlite3DebugPrintf() consistently throughout will improve SQLite.


09:28 Reply: Debugging opcodes in vdbe.c (artifact: 95b5e6bb1f user: danshearer)

Rowan Worth (sqweek) wrote on 2020-09-25 08:58:49:

The default behaviour depends on the tool, but for those which use glibc's stdio (FILE* and friends) it goes like this:

That's really helpful and I can see I'll be using stdbuf. I didn't know it existed, and I see it's in GNU coreutils and has been re-implemented for BSD so that's most of the platforms I use.

It isn't a solution for all the platforms SQLite supports though, and it does require some extra knowledge, which is why I feel that using sqlite3DebugPrintf() consistently throughout will improve SQLite.


09:20 Post: Results from callcatcher run on SQLite (artifact: 6d79786f42 user: danshearer)

While learning how debugging works under SQLite, I briefly wondered if there might be lots of unused code. There isn't, but to check I ran callcatcher on trunk, and that might be interesting to others.

callcatcher reports 4 unused functions, which although correct doesn't mean they can all be removed:

export CC="callcatcher gcc" export AR="callarchive ar" ./configure && make callanalyse ./sqlite3 sqlite3MemTraceDeactivate sqlite3_global_recover sqlite3_memory_alarm sqlite3_shutdown

Ignoring calls in test*c, look at the code I see:

ext/misc/memtrace.c:sqlite3MemTraceDeactivate() is not called regardless of SQLITE_DEBUG, although I don't yet understand memory tracing. So maybe it can be removed.

main.c/sqlite3_global_recover() is unused. loadext.c has a comment saying sqlite3_global_recover() is deprecated, but I see it isn't in the public SQLite interface. I think it can be removed.

malloc.c/sqlite3_memory_alarm() is unused, however it is an SQLITE_DEPRECATED function in the public interface in sqlite.h.in. This means it cannot be removed.

main.c/sqlite3_shutdown() is almost unused. It is discussed in comments, and if -DTCLSH is specified, it is called at the end of tclsqlite.c, but I can't see why it is specially needed there. So it can probably be removed?


08:10 Reply: Debugging opcodes in vdbe.c (artifact: 1772cb4a37 user: danshearer)

Dan Kennedy (dan) on 2020-09-22 13:40:24:

Probably just buffering. Try:

fprintf(stdout, "format", ...); fflush(stdout);

Yes you were right. The buffering has some very unhelpful effects during debugging, differing with the kinds of redirection and number of processes involved and giving the wrong impression with timing, as detailed at the end of this posting. I was additionally confused because plain unbuffered printfs exist in SQLite source code and I assumed that meant they would work, but they exhibit the same buffering weirdness.

That means the best advice currently for adding debugging statements to SQLite is:

  1. add fflush to any printf statements like you said, or
  2. switch on -DDEBUG_SQLITE and use printf.c/sqlite3DebugPrintf(), which has fflush and other good features already

There are a bit over 100 plain printfs in SQLite, most of them within #define DEBUG_SQLITE blocks. So presumably these should all be changed to sqlite3DebugPrintf(). That function doesn't just flush the buffer, it does other things including address a portability problem. Just generally it seems good for all debugging to go through a single choke point where it can be tee-d to a file or whatever else is needed during debugging.

It would also be helpful if sqlite3DebugPrintf() was available in non-SQLITE_DEBUG mode for quick and specific debugging. Would it cause much pain to remove the SQLITE_DEBUG around that function?

As to the confusing behaviour around buffered printfs (and thanks to Uilebheist for help with the testing here):

  • while stdout buffering can cause printfs to be emitted at unexpected places they will however definitely be printed. It's just easy to miss the odd line if you have switched on a large amount of output, especially in 'make test'.
  • because of buffering, the results will differ between runs using '>' shell redirection and with '| tee', which also differ between running 'make test' and in directly running the sqlite binary.
  • stdio is always flushed by the OS when a process exits. These different ways of running SQLite introduce different exit events, which changes when the printfs are displayed.
  • to add to the confusion, the 'script' command flushes the line buffer so that the printfs happen exactly when expected, so running the above tests under script can make the problem go away altogether.

Things that will not work to fix the problem:

  • Making stdout unbuffered with setbuf(stdout, NULL), because that has interactions with redirections too.
  • Using stderr instead of stdout. POSIX requires that stderr be unbuffered so that's great, but that doesn't change the fact that SQLite has unbuffered printfs, and it works differently in redirections.
  • suspecting 'make test' - no, TCL does not grab the terminal, make test is innocent :-)


18:42 Reply: Debugging opcodes in vdbe.c (artifact: 88558f6cfe user: danshearer)

In reply to Dan Kennedy (dan) on 2020-09-22 13:40:24:

That does make a difference, and I'm re-running my tests to be sure.

Assuming it does, there is a handful of printfs in vdbe*c that would seem to be similarly vulnerable to OS and other local variability.



12:06 Post: Debugging opcodes in vdbe.c (artifact: 801a4ed773 user: danshearer)

I'd like to develop a process for examining what functions call and are called by a particular OpCode. To do this I need a reliable way of adding debug statements, and I have found some surprises in that.

The example I picked is the opcode IsSmaller.

From code inspection using grep and less, IsSmaller appears to only be triggered by "PRAGMA optimize", which in turn is somewhat exercised by the script test/busy.test. I used various debugging techniques to learn about how IsSmaller is used:

  1. I compiled with and without -DSQLITE_DEBUG
  2. I added some printfs to the implementation of IsSmaller in vdbe.c
  3. I made these printfs inside and outside #ifdef SQLITE_DEBUG
  4. I added PRAGMA vdbe_trace=ON/OFF at the beginning and end of busy.test
  5. I tried turning on debug mode for PRAGMA optimize with mask=0x0001
  6. Instead of printfs in vdbe.c, I appended debugging statements to a file
  7. I tried running the sqlite commandline as well as make test to trigger my debug statements inside IsSmaller

Note: When editing vdbe.c, it is important to strictly respect formatting because the whole C file including comments is parsed to generate code before the C compiler sees it.

Fundamental question: why does a printf statement inside an opcode seem to only sometimes work? Appending to a file always works, which is how I can compare. It is possible there are so many moving parts that I just got confused.

A subquestion: does the make test infrastructure capture and process stdout before resending it to stdout?

Opening and appending to a file does work reliably, which is how I know there are exactly 8 invocations of IsSmaller caused by operations in make test. Sometimes the printfs agree with this, and sometimes they don't appear at all. At least once only a few of the printfs are displayed.

I thought maybe vdbe_trace was the answer, but fifty thousand debug lines later, I began to see that ad-hoc printf's have some advantages. On the hand, when using vdbe_trace, printfs always seem to work, whether inside make test or not.

(As expected, adding r in front of the n at the end of printf statements makes no difference but I did try.)

I don't think I'm the first to have noticed strange behaviour of this kind. I'd like to work out what it is and document it.