SQLite Forum

null character sorts greater than 0x7f and less than 0x80
Login
> If, for internal storage, TCL converts strings with embedded nulls into Modified UTF-8, so it can use text with embedded nulls without needing to do the needed workaround to actually force SQLite to handle the embedded nul in the string, that is TCL's business, and it is TCL's responsibility to convert it on reading.

I agree mostly, but you are only restating what I said earlier except for the "force SQLite" part.  That is where things become interesting and less clear.

> If SQLite doesn't make the change going in, it shouldn't make the change coming out.

That is either assertion of the obvious ("round-tripping should preserve data"), or begging the question.  As I see it, the issue (that we are trying to get at) is whether SQLite (or any other Tcl library which is implemented such that it gets Tcl string objects in Tcl's internal form) should convert that internal form into the external (standard UTF-8) form on output and convert such external form back into the internal form on input.  I think the answer is "clearly yes." 

> It doesn't matter if SQLite has a relationship with TCL, it is not dedicated to TCL, so should not be doing TCLs job for it, or it might break other applications' use of this same encoding trick. Why should SQLite be bent to meet some other language's design intent?

This gets to the crux of the matter, I think.  I am not saying what SQLite should do generally except insofar as the principles [a] extend to other environments. The question here is what should the Tcl library known as sqlite3 do? For efficiency reasons, Tcl nowadays has internal forms rather than treating everything as strings. And also for efficiency reasons, SQLite for Tcl implements interfaces that operate below the ostensible "everything is a string" level, and so naturally those interfaces traffic in the internal forms. I believe that cross-language database compatibility considerations work to make it desirable for the Tcl SQLite library to perform internal/external form conversions.  This has nothing to do with the origin of SQLite as a Tcl library and everything to do with its Tcl library build being a useful and unsurprising Tcl library.


[a. I do think that a SQLite library for a language where strings were represented as length-prefixed UTF-16 **should** translate those to/from UTF-8 where that is the chosen DB text encoding. ]

For backward compatibility reasons, if this issue is remedied in (the Tcl library form of) SQLite, it will probably require an option of some kind for the UTF-8 collation.  Perhaps it should be named "Want_Standard_UTF-8".