Session extension issues, quirks, and doc

(1) By Roger Binns (rogerbinns) on 2025-05-12 18:48:23 [source]

This is my feedback about the session extension in the context of wrapping it for Python users.

General

Wrapping the extension was straight forward and the session API worked really well for that. The streaming API functions were especially nice, and I didn't have to repeat code.

As with other SQLite APIs, problems arise when I need callback into Python code, and that Python could error, but the SQLite API provides no way to report the error. The only two APIs this happens for is the table filter of the session and table filter of apply. In both these cases I have to convert the SQLite supplied strings into Python objects which could fail, and the callback code could error, but I can't report an error.

Issues

Empty data provided to conflict handler

In that same issue, behaviour is a little confusing. Returning SQLITE_CHANGESET_REPLACE results in the apply returning SQLITE_CONSTRAINT and no changes. Returning SQLITE_CHANGESET_OMIT results in the row being deleted! I would expect "The change that caused the conflict is not applied".

The internal method sessionGenerateChangeset makes a savepoint before getting the database mutex. It should be the other way around. All the other mutex calls are correct in my review.

Quirks

Streaming output doesn't obey SQLITE_SESSION_CONFIG_STRMSIZE (default 1KB). Affected rows are output an entire row at a time, so if a row had a 1GB value, you'll get a GB of data at once.

sqlite3session_table_filter and sqlite3session_attach are a bit messy. For example the former accepts a NULL pointer but sets the internal bAutoAttach flag to true. That results in all tables being accepted, when the expected intent is to remove the filter..

Pathological case: If using SQLITE_SESSION_OBJCONFIG_ROWID and the table has a column named _rowid_ then generated changesets will contain the wrong operations and data. Perhaps reject those tables?

There are many places where SQLITE_ERROR, SQLITE_MISUSE, or SQLITE_SCHEMA can be returned, but there is no further way to diagnose what happened. sqlite3_log is used 3 times in the body of apply, and would be a good choice for all the other locations,

The session table filter is called for tables that won't be recorded because they have no primary key. It gets frustrating because your prints show you said yes, but the session ignored the table anyway! Another sqlite3_log opportunity.

The rebaser has been available for 6 years, and been unchanged in all that time as far as I can tell. Even though it is marked as experimental, I chose to wrap it. Perhaps it is time to remove the experimental label?

Inverting a patchset gives SQLITE_CORRUPT (which is documented) but not everyone has memorized every sentence. There is no documented way to tell if something is a changeset or a patchset, although I think examining the first byte will do. Another sqlite3_log opportunity.

My fault and a misreading of the code: The changegroup doc mentions various reasons for how it can end in an undefined state (eg adding a corrupt changeset, run out of memory). The first member of the C structure is rc so I sort of deduced you continue get that error code for all successive calls (except close). ie once a changegroup enters the undefined state it remains that way. That isn't the case - it will always be coherent, but will only contain parts of the data provided as the error occurred. It would be nice if the error state was persistent. Without that it is a burden on all wrappers to separately track that undefined/error/state, and is best done once in the session code.

Documentation

Please add a toc.db like there is for the main documentation. I had to hand craft my own. I use the content to provide links from my documentation to the corresponding SQLite documentation, wrap all of the defined constants, and similar.

Some of the page titles end in a trailing full stop, and some don't.

In a few places it is called sessions (plural) while most locations are singular.

The extension assumes that any prior hook returned by sqlite3_preupdate_hook belongs to another session instance, and will corrupt/crash the process if that is not the case. The documentation says this is "not possible" and "undefined" but should use stronger text. The preupdate page could also warn about this interaction with session.

Foreign keys can be confusing, especially as the extension doesn't check changes happened anyway when doing an apply. For example ON DELETE CASCADE can result in a deletion that the changeset also contains. Perhaps a "Best Practises" section of documentation advising what to do, and not to do in terms of structuring schema for most effective use of session?

sqlite3changegroup_schema is not mentioned on the changegroup page

sqlite3changegroup_schema doc doesn't mention that you can get SQLITE_MISUSE. Its implementation does mention in a comment when that happens.

start_v2 flags page has wrong name in bottom bit.

For the conflict handler callback, it would be really nice if there was a table of the conflict flag and allowed return value. Currently if you return a wrong value such as SQLITE_CHANGESET_REPLACE for SQLITE_CHANGESET_NOTFOUND, the apply fails with SQLITE_MISUSE and the user has to guess what they did wrong. If such a table existed I could generate a friendly message instead from the callback return.

There is no documentation on what a rebase as provided by apply_v2 actually is, other than a bag of bytes. For example can it be loaded into a different process, portable between processor architectures, etc? I found out that it is just a changeset, so perhaps document it as that?