sqlite3_deserialize documentation does not clearly state buffer lifetime requirements
(1) By Austin (austin_launchbadge) on 2023-09-27 20:09:28 [source]
While reviewing a PR to add sqlite3_deserialize()
support to SQLx, I realized that the deserialize functionality assumes the pData
pointer remains valid for the lifetime of the connection, but this is not clearly stated in the documentation: https://github.com/launchbadge/sqlx/pull/2759#discussion_r1333725840
To be fair, it is pretty heavily implied, but it's clearly pretty easy to miss, especially if someone is not used to thinking about memory safety. This implication is also in the context of when the SQLITE_DESERIALIZE_READONLY bit is not set, which could lead the reader to wrongly assume that the buffer lifetime requirements do not apply to read-only databases.
It couldn't hurt to be especially clear about this, e.g.:
The buffer P will be used as the backing memory-store for the lifetime of the connection. It should not be invalidated, e.g. by reuse of the buffer or by an external call to free() or realloc(), before the connection is closed, which could otherwise lead to a segmentation fault, memory safety vulnerabilities or other undesirable behavior.
The design of this API also seems to imply that the contents of the buffer when the connection is closed will be equivalent to the result of calling sqlite3_serialize() on the connection, but that is not explicitly stated, which I feel undersells its utility. If that's the intended functionality, it's worth calling it out.
(2) By Larry Brasfield (larrybr) on 2023-09-27 23:32:25 in reply to 1 [link] [source]
You raise an interesting question, or set of questions. I may see the issues differently though, so I will state them clearly.
As I read the sqlite3_serialize() API doc, I see nothing explicit about the validity timespan of the pointer it returns. And I agree that more should be said in the doc on this.
However, I find your proposed statement unlikely to be true, and I see no mention of the effect of the SQLITE_SERIALIZE_NOCOPY flag.
If that flag is set (in the mFlags argument in the API call), then the returned buffer remains valid until the user passes its address to sqlite3_free(). Whether that timespan exceeds the connection timespan is irrelevant.
If that flag is clear (in mFlags), all that is guaranteed, if a non-NULL pointer is returned, is that it is valid upon the API return. The doc says no more, and I expect this is by design.
A user might well wonder, "How much longer after the call is that pointer pointing to anything useful and related to the *piSize value also output from the API?"
That question should be answered, if for no other reason than to head off assumptions such as you seem to have made. (I do not mean to be unduly critical here; it is natural to assume the API was meant to be useful, and that there must be some useful pointer lifetime.)
The answer, which will go into the API doc soon (unless Dan or Richard surprise me by saying otherwise), is that the pointer is valid just until that sqlite3 object is passed, directly or indirectly (via its sub-object(s)), into some other SQLite API.
(3) By Austin (austin_launchbadge) on 2023-09-27 23:58:40 in reply to 2 [link] [source]
I'm not talking about using the pointer returned by sqlite3_serialize(), I'm talking about the potential issues with passing a user-managed buffer to sqlite3_deserialize().
In the PR I linked, the user wanted to use a previously serialized database that gets embedded into their application binary at build time. The API design they came up with, however, allows passing a slice (a Rust primitive for pointer+length) with any lifetime, which could be pointing to any dynamically allocated buffer or even one existing on the stack.
Since the design of sqlite3_deserialize() appears to assume that the buffer pointer will be valid for the lifetime of the connection, this could easily lead to, e.g. a use-after-free scenario.
The documentation of sqlite3_deserialize() should make it clear that, no matter the source of the pointer, it must remain valid for the lifetime of the connection.
(4) By Larry Brasfield (larrybr) on 2023-09-28 00:19:35 in reply to 3 [link] [source]
The documentation of sqlite3_deserialize() should make it clear that, no matter the source of the pointer, it must remain valid for the lifetime of the connection.
Yes. I would go further. Once passed into sqlite3_deserialize(), the buffer should be modified only by SQLite until the connection is closed.
(5) By Larry Brasfield (larrybr) on 2023-09-28 15:21:58 in reply to 1 [link] [source]
FYI: Your comments have instigated some doc changes. Further comments are welcome.
The design of this API also seems to imply that the contents of the buffer when the connection is closed will be equivalent to the result of calling sqlite3_serialize() on the connection, but that is not explicitly stated, which I feel undersells its utility. If that's the intended functionality, it's worth calling it out.
A call to sqlite3_serialize can be effected very inexpensively with the SQLITE_SERIALIZE_NOCOPY flag set when that utility is available. I think using the API is to be preferred over just reusing the deserialize buffer based on peering into the (likely) implementation.