sqlite3_value_encoding usage

(1) By anonymous on 2022-11-17 13:34:20 [source]

Hi,

SQLite Release 3.40.0 On 2022-11-16

5. Add the sqlite3_value_encoding() interface.

Could you please clarify how to use it properly?

I have a db that contains both UTF8 and UTF16 strings for historical reasons and an app that normally works with the db through the UTF8 interface, but can also consume UTF16 strings natively. The new interface looks like a low-hanging optimization opportunity, e.g.

static_assert(std::endian::native == std::endian::little)

if (sqlite3_value_encoding(sqlite3_column_value(stmt, col)) == SQLITE_UTF16LE)
    // use sqlite3_column_text16
else
    // use sqlite3_column_text

However, the object returned by sqlite3_column_value() is an unprotected sqlite3_value object that may only be used as an argument to sqlite3_result_value(), sqlite3_bind_value(), and sqlite3_value_dup().

So, is it safe to pass it to sqlite3_value_encoding or do I need to lock the mutex first or do something entirely different?

Thanks

(2) By Richard Hipp (drh) on 2022-11-17 13:46:27 in reply to 1 [link] [source]

Could you please clarify how to use it properly?

The proper usage for this routine is to not use it at all. I remember having written text to explain this in the documentation, but apparently I never committed that change, as it is not in the current documentation. I will strive to correct this oversight for the next release.

The new API was added to facilitate testing and debugging. If you have been able to get by without this routine before, it provides you with no new capabilities.

All strings in a single SQLite database are stored using one of three encodings: UTF8, UTF16LE, and UTF16BE. All strings in the same database file use the same encoding. No exceptions. You do not have a (well-formed) SQLite database file that contains strings of different encodings.

That said, perhaps you are storing your strings as BLOB values. In that case, SQLite treats them purely as binary and never worries about the encoding so this is a non-issue.

Or, perhaps you are inserting strings using various encodings. In that case, SQLite will automatically convert the strings to the database encoding as they are inserted.

When you want to extract a string, you specify the encoding you want by the choice of access routine: sqlite3_column_text() or sqlite3_column_text16(). SQLite will automatically make any necessary conversions.

The sqlite3_value_encoding() interface is a way for testing code to determine the (normally hidden) internal state of an sqlite_value object. If you are not writing testing and debugging logic for SQLite, then you don't need to know this and you should not use the interface.

(3) By Richard Hipp (drh) on 2022-11-17 14:00:47 in reply to 2 [link] [source]

The documentation for the sqlite3_value_encoding() function is extracted from comments in the source code. So I cannot update the documentation without doing a whole new release. The comments have now been changed, but unfortunately the on-line documentation cannot be updated until the next release.

(4) By anonymous on 2022-11-17 14:20:25 in reply to 2 [link] [source]

All strings in the same database file use the same encoding. No exceptions. You do not have a (well-formed) SQLite database file that contains strings of different encodings.

Sorry, I mean various instances of the database could be stored as UTF16 or UTF8 of course.

Thanks for the explanation. Indeed, if the interface should not be used under normal circumstances, it would be nice to document that explicitly.

(5) By anonymous on 2022-11-17 20:35:11 in reply to 2 [link] [source]

Extension functions can distinguish between UTF-8 and UTF-16 by registering themselves twice. Why should other code not be allowed to make the same distinction?