SQLite User Forum

Handling of empty, zero-length files
Login

Handling of empty, zero-length files

(1) By Ulrich Telle (utelle) on 2024-06-01 13:13:10 [source]

If sqlite3_open is called with the name of a not-existing file, SQLite will create a new file with the given name and with a file length of 0. If now sqlite3_close is called without doing anything else with the database, SQLite leaves an empty, zero-length file behind.

The problem is that this file can't be identified as a SQLite database file, because it has no SQLite header. Actually, it could be anything. It could be a file a user had created on purpose for a non-SQLite task.

Opening the file again, but this time executing some DDL SQL statements would let SQLite happily create a database file from the existing empty file, although this may not be what a user expects or wants.

Creating a new database file, if there wasn't a file with the given name, is ok, but changing an existing empty file into a database file may not be ok. Due to the missing database header SQLite can't check whether the file is a valid SQLite database file or not.

Is there a reason, why SQLite behaves like that? Is there an easy way to force SQLite to initialize at least the database header on disk?

(2) By Bo Lindbergh (_blgl_) on 2024-06-01 15:00:36 in reply to 1 [link] [source]

Is there an easy way to force SQLite to initialize at least the database header on disk?

Use the application_id pragma.

(3) By Roger Binns (rogerbinns) on 2024-06-01 16:01:02 in reply to 1 [link] [source]

Is there a reason, why SQLite behaves like that?

There are many configurations for the database file like encodings, page sizes, encryption keys, version support, which will affect anyone else opening the file. Those configurations are typically set by pragmas but you can only issue pragmas on an open database.

So SQLite creates the empty file, lets you issue zero or more pragmas, and then on the first write to the database creates the header with the desired configuration.

Is there an easy way to force SQLite to initialize at least the database header on disk?

As Bo said you can use a pragma, and as a bonus mark the database as for your application.

I like to start a transaction, read the current user_version, then set the user_version to that same value, and then commit the transaction.

This double checks I can both read and write the database, and if the file was zero length will write out the header.

(4) By Ulrich Telle (utelle) on 2024-06-01 18:45:38 in reply to 3 [link] [source]

There are many configurations for the database file like encodings, page sizes, encryption keys, version support, which will affect anyone else opening the file. Those configurations are typically set by pragmas but you can only issue pragmas on an open database.

This explains, why SQLite starts with an empty file. And I'm totally fine with that. However, it does not explain, why SQLite leaves an empty file behind on closing the connection. If no pragmas are used, before closing the connection, SQLite could use its default settings and create a valid database file (with the database header and an empty schema) on closing the connection.

IMHO an empty file is not really a valid SQLite database file.

Using pragma application_id or pragma user_version is just a workaround for SQLite failing to create a valid default database on closing a connection.

BTW, in a library function one would always use the current value of the application_id or the user_version.

If I understand it correctly, an application id should be a (globally) unique integer. That is, it has to be registered somewhere by someone to avoid clashes with ids already in use elsewhere.

(5) By Spindrift (spindrift) on 2024-06-01 19:05:32 in reply to 4 [link] [source]

I also find it odd that, on closing, either the empty stub file is deleted or a verifiable header is written to it, leaving an identifiable, if empty, sqlite database.

Having said that, presumably this falls into the "someone, somewhere may be relying on this behaviour rather than calling 'touch'" backwards compatibility argument.

(8) By Code Hz (codehz) on 2024-06-02 04:08:34 in reply to 5 [link] [source]

Deleting the file after closing is probably not a good option considering potential race conditions: Two processes open a file with the same name at the same time. One process exits without doing anything, while the other writes something into it. If you try to delete this empty file (on close), you may accidentally delete the file that contains data.

(9) By Spindrift (spindrift) on 2024-06-02 07:21:31 in reply to 8 [link] [source]

Yes. I imagine there are any number of potential issues.

(10) By Ulrich Telle (utelle) on 2024-06-02 18:53:17 in reply to 8 [link] [source]

Deleting the file after closing is probably not a good option considering potential race conditions: Two processes open a file with the same name at the same time. One process exits without doing anything, while the other writes something into it. If you try to delete this empty file (on close), you may accidentally delete the file that contains data.

That is one more reason why I would prefer that SQLite would initialize the newly created file with a valid SQLite database header, instead of leaving behind an empty file.

Spindrift: [...] presumably this falls into the "someone, somewhere may be relying on this behaviour rather than calling 'touch'" backwards compatibility argument.

Therefore it is unlikely that SQLite will change its behaviour, unfortunately.

(12) By Simon Slavin (slavin) on 2024-06-03 14:54:26 in reply to 10 [link] [source]

Sorry to waste space with a "me too" post but … me too.

I don't like the idea of leaving a zero-length file around, for the above reasons, mostly that file can't identify a filetype. Imagine someone abusing SQLite to make it create files with rude names all over your directory structure, and the files are blank so there's no clue it was done using SQLite.

I would accept either alternative: either delete the empty file when the last connection to it is closed, or give it the standard SQLite header block plus whatever else is needed to make it pass PRAGMA integrity_check; for a database with no content.

It seems that deleting the file is difficult, for reasons nicely-explained above.

(6) By Roger Binns (rogerbinns) on 2024-06-02 00:56:11 in reply to 4 [link] [source]

If I understand it correctly, an application id should be a (globally) unique integer. That is, it has to be registered somewhere by someone to avoid clashes with ids already in use elsewhere.

It doesn't have to be globally unique and require a registration authority. There is no harm if two different applications happen to pick the same number. All you know is that if the application id does not match then it is definitely not yours, and that if they do match it is not a guarantee that the database has the correct schema and contents for your application.

Various operating systems in the past used 4 byte codes to associate type information with files such as MacOS and RiscOS, and it worked out just fine.

That said there is a magic file in the SQLite source where application ids of some SQLite consortium members are listed, and it is no surprise that most are also in the text range.

(11) By Bo Lindbergh (_blgl_) on 2024-06-02 20:45:36 in reply to 4 [link] [source]

Try thinking of an empty file as a valid database whose text encoding hasn't been frozen yet.

(7) By cj (sqlitening) on 2024-06-02 01:33:31 in reply to 1 [link] [source]

A user can be prevented from creating a file/database. 
A zero-length file created by an admin allows user to open an empty database.