SQLite Forum

Verifying schema for application file format

Verifying schema for application file format

(1) By anonymous on 2021-03-09 15:35:45 [link]

I'm interested in using SQLite as the data store for a desktop application. When sharing application files, in this case SQLite DBs, how can I verify the schema of an untrusted DB matches what I expect from the schema version?

Placing constraints on columns is great both to catch application bugs and document the specifics of the format, but I can't rely on them, if an untrusted DB could exclude them from its schema. Even if I implement all the same constraints in application code, other operations could surely fail if the schema is not as expected. It'd be great to be able to fail with "invalid file" when the DB is first loaded if the schema is wrong.

The most obvious option would seem to be the sqlite_schema table - is this the definitive source of the schema, or could it be subverted by a malicious DB? Would subversion be caught by an integrity_check? Alternatively, is there some way I could verify the schemas of two DBs - an untrusted one and a temporary :memory: DB created as a template - are identical?

I hope my reasoning makes sense, thanks in advance for any assistance.

(2) By anonymous on 2021-03-09 16:30:18 in reply to 1

If you have the *good* schema on hand, you could leverage [SQLite's diff tool](1) perhaps? And its `--schema` option?

[1]: https://sqlite.org/sqldiff.html

(3) By Richard Hipp (drh) on 2021-03-09 17:18:36 in reply to 1 [link]

The [Fossil version control system][1] (used by SQLite itself) uses SQLite
as a file format, and has done so for over 13 years now, and the issue of
a valid schema has never come up.  So, unless you have an unusual situation,
I think you are worrying about a non-issue.  Yes, somebody could go in a
mess with the schema of your file format, which could cause grief for your
application.  But anybody with permission to do that likely already has
full access to your machine, so they do not have any new capabilities for

[1]: https://fossil-scm.org/home/doc/trunk/www/index.wiki

Perhaps you are worried about receiving a file over the internet from
an unknown or untrusted source and submitting that file to your application?
In this case you should take precautions as outlined in the
[Defense Against Dark Arts][2] document.

[2]: https://www.sqlite.org/security.html

If you really, really want to check the schema, you can do so by
examining the sqlite_schema table.  The entire database schema is
contained there.  When SQLite is starting up and needs to read in the
database schema, it just runs "SELECT * FROM sqlite_schema".  So if you
do the same in your application, you are seeing exactly what SQLite sees.

(4) By anonymous on 2021-03-09 17:47:15 in reply to 3 [link]

Thanks for the quick response.

I take your point about it maybe being a non-issue, I wondered as much myself.

The possible scenarios I'm considering here are:

1. Untrusted file from third-party lacks constraints that are normally expected, this file is now more vulnerable to application bugs until a potential future migration happens to fix it up, and might even have rows breaking those expected constraints.
2. Untrusted file from third-party lacks tables/columns that are normally expected, at some point these might be needed and will error.

To mitigate the former I can reimplement all the constraints in the application, but I suppose the ideal case for both is to detect something is amiss sooner rather than later, before the user has built more work on top of the untrusted file. The schema table looks like the best way of making absolutely sure.

Thanks for your time.

(5) By Richard Damon (RichardDamon) on 2021-03-09 18:43:01 in reply to 4 [link]

And one question comes up as to what is the threat model that such a vulnerability is an issue. So they provide your program with a bogus file and you corrupt it, is that really a problem.

Now, if you merge that data with some other dataset, maybe you have more of an issue, but is that really worse than just merging untrusted data.

Now, if the bad file can actually crash a system that is also at the same time processing good data, then that might be more of an issue.

If the concern is the user wasting time with a bad file, how apt are users to get such a file? Maybe the simple test is to check the schema and warn them if it doesn't match a known schema that the application should have been able to generate.

(6) By Keith Medcalf (kmedcalf) on 2021-03-09 19:04:57 in reply to 4 [link]

If you are dealing with "your application" and databases which should have a "known schema", the easiest way to guarantee integrity would be to retrieve the schema in some predictable order and compute a checksum/CRC/hash over it and see if that is what you expect.  For example:

select aggsha3_512(sql) 
  from (
          select sql 
            from sqlite_master 
        order by sql

where aggsha3_512 is a custom aggregate function that (in my case) computes the sha3 512 bit hash over the input.  If the result is what you expect, then the schema contains what you expect since the input order is deterministic.  If it does not match what should be there, then the schema has been tampered with.

Note that since you are doing a sort, changes in the order of the schema table entries will be irrelevant.

(7) By ddevienne on 2021-03-10 08:15:11 in reply to 6 [link]

Also look into [SQlite's own DB hash utility][1], to check the whole DB.

[1]: https://sqlite.org/dbhash.html

(8) By Simon Slavin (slavin) on 2021-03-10 13:35:26 in reply to 4 [link]

SQLite allows considerable freedom in statements.  Your <code>CREATE TABLE</code> statements can contain different spacing characters and even comments.  So your answer depends what you want to do.

Checking that the schema (the <code>CREATE TABLE</code> statements used) are *exactly* what you used is easy.  You execute the

<code>SELECT sql FROM sqlite_master ORDER BY sql</code>

statement given earlier, and check to see that it returns exactly what you expect.  On the other hand, checking that the database has *compatible* table definitions, which will not return errors when you use them can be done only extensively: run a test sequence of example SQL commands and check to see it doesn't return any errors.