SQLite Forum

Blob vs. Text for primary keys
Login

Blob vs. Text for primary keys

(1) By Michael Hofmann (mthbgs) on 2021-03-04 15:02:36 [link] [source]

Hello,

I am working on an application that needs to store a large number of binary blobs, each one indexed by its SHA256 hash. The basic idea is to store blobs like this:

CREATE TABLE blob (
  hash BLOB PRIMARY KEY,
  content BLOB NOT NULL
);

Here, each entry in the "hash" column would entail 32 bytes containing the raw SHA256 hash. Alternatively, I've been thinking about storing a hex representation of the hash:

CREATE TABLE blob (
  hash TEXT PRIMARY KEY,
  content BLOB NOT NULL
);

I have experimented with both variants, and so far it seems the only difference is, as expected, a somewhat larger database size for the second version. What I would like to know: Is there is any functional difference, in particular with regard to performance, between BLOB and TEXT as a primary key?

(2) By Richard Hipp (drh) on 2021-03-04 16:00:22 in reply to 1 [source]

Both approaches should work fine. Storing the hash as a BLOB might be very slightly faster, since (as you observe) there is less content, hence less file I/O.

The Fossil version control system does something very much like this. But it stores the hash as text rather than as a blob. Performance is not an issue, and text is easier for developers to deal with when debugging.

(3) By Michael Hofmann (mthbgs) on 2021-03-04 18:45:33 in reply to 2 [link] [source]

Thank you for the clarification, that was very helpful.

And thank you for the link to the technical overview. My own use case is quite different from a VCS, but its a fascinating read nevertheless.