SQLite Forum

Feedback on writing a SQLite VFS for a distributed object store
Login
Hello,

I've recently completed work on a new SQLite VFS [1,2] for the Ceph distributed storage system. I thought I'd share some feedback on my experience writing the VFS:

- The read API is unfortunately synchronous: every read requires us to wait for the read result from an object storage device (OSD). It would be better if SQLite could initiate every or as-many-as-can-fit-in-page-cache reads it requires to execute a query before using the results. This would also avoid the potential inefficiencies of bolting on readahead to improve matters at the VFS layer.

- Loading a VFS from a static/compile-time-linked-dynamic library requires awkward API gymnastics [3]. Surely this could be simplified?

- Some VFS members do not seem to be required but this is not documented. For example, xRandomness/xGetLastError appear to not be required. On the other hand, xCurrentTime is required (SIGSEGV if not present). I don't really see a reason for SQLite to require every VFS to supply xCurrentTime if a SQLite-chosen default is acceptable.

- xFullPathname could be improved to note it is called once for each attach of a database. Its purpose seems to be the canonicalization of the database file name?

- Temporary tables lack any "context" information about the database they are associated with. There was a pragma "temp_store_directory" which could have been used but that is deprecated. For a distributed storage solution, we need some kind of context about where to store transient tables (which storage pool, namely). Right now, we force the user to put temporary tables in memory [4].

- The xSectorSize appears to have no effect on the size of reads/writes performed on the database. We have been advising the use of page_cache pragmas instead [5].

- Journal persistence (PERSIST) is a great optimization for this VFS. It would be useful if we could hint this as a new default without requiring it as a pragma for every db connection.
- It would be fantastic to run some of SQLite's tests in our own upstream CI but using an alternate VFS appears to be a non-trivial change for all open source tests I examined.

Except for these issues, the new VFS works pretty well. Thank you for your work on SQLite!

[1] https://ceph.io/community/new-in-pacific-sql-on-ceph/
[2] https://docs.ceph.com/en/pacific/rados/api/libcephsqlite/
[3] https://github.com/ceph/ceph/blob/36ba4d71d04493c7742d5f6cf38e0d17dcb5a9e1/src/include/libcephsqlite.h#L33-L56
[4] https://docs.ceph.com/en/pacific/rados/api/libcephsqlite/#temporary-tables
[5] https://docs.ceph.com/en/pacific/rados/api/libcephsqlite/#page-size