SQLite Forum

null character sorts greater than 0x7f and less than 0x80
If I have interpreted the cited Wikipedia article assertions correctly, Tcl uses the *overlong* null character representation internally but the canonical, single-byte representation for external data.

> Maybe you could say that TCL is using "Modified CESU-8".

I might say that, if it were true. But it is not.

At least on Windows, Tcl 8.6 uses the overlong null representation in string variables for the `\x00` character but writes it out (via `puts`) using the single-byte representation, which conforms to **unmodified** CESU-8. I have demonstrated this with [Magicsplat Tcl/Tk for Windows](https://www.magicsplat.com/tcl-installer/index.html), version 8.6.10, with stdout redirected to a file to avoid console weirdness.
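To make the two byte forms concrete, here is a minimal Python sketch (not Tcl or SQLite code). The helper `to_modified_utf8` is a hypothetical name of mine, and it models only the null-character aspect of Modified UTF-8; it leaves supplementary characters in standard UTF-8 form. Comparing the resulting bytes reproduces the ordering in the thread title.

```python
# The two byte forms of U+0000 discussed above.
CANONICAL_NUL = b"\x00"      # standard UTF-8 and CESU-8
OVERLONG_NUL = b"\xc0\x80"   # Modified UTF-8 (overlong two-byte form)

def to_modified_utf8(text: str) -> bytes:
    """Hypothetical helper: encode as UTF-8, then use the overlong NUL.

    In UTF-8 the byte 0x00 only ever encodes U+0000, so a plain byte
    replacement is safe here.
    """
    return text.encode("utf-8").replace(CANONICAL_NUL, OVERLONG_NUL)

chars = ["\x80", "\x00", "\x7f"]

# Byte-wise comparison of the internal (overlong) form:
# 7F < C0 80 < C2 80, so NUL lands between U+007F and U+0080.
by_internal = sorted(chars, key=to_modified_utf8)

# Byte-wise comparison of the canonical form: 00 < 7F < C2 80.
by_external = sorted(chars, key=lambda c: c.encode("utf-8"))

print([hex(ord(c)) for c in by_internal])  # ['0x7f', '0x0', '0x80']
print([hex(ord(c)) for c in by_external])  # ['0x0', '0x7f', '0x80']
```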

I maintain that, as an alternative way of storing or retrieving "external data", SQLite should do the same translation: Modified UTF-8 for internal data, and for external data either plain CESU-8 or standard UTF-8, both of which use the short, canonical null representation. [a]
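A rough sketch of the translation I have in mind, in Python for illustration only. The names `internal_to_external` and `external_to_internal` are hypothetical and are not part of any SQLite or Tcl API; the sketch assumes well-formed input and touches nothing but the null character.

```python
def internal_to_external(data: bytes) -> bytes:
    """Replace the overlong NUL (C0 80) with the canonical single byte.

    Assumes well-formed Modified UTF-8: 0xC0 is never a continuation
    byte, so the pair C0 80 can only be the overlong NUL.
    """
    return data.replace(b"\xc0\x80", b"\x00")

def external_to_internal(data: bytes) -> bytes:
    """Replace the canonical NUL byte with the overlong form (C0 80)."""
    return data.replace(b"\x00", b"\xc0\x80")

internal = b"ab\xc0\x80cd"               # as held in a string variable
external = internal_to_external(internal)
print(external)                          # b'ab\x00cd'

# The translation is lossless in both directions for well-formed input.
assert external_to_internal(external) == internal
```

With a translation like this at the boundary, the overlong form stays an internal detail and stored values compare the way canonical UTF-8 does.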

[a. The difference between CESU-8 and UTF-8 lies in how supplementary code points (those above U+FFFF) are represented; both encode U+0000 through U+007F as single bytes, so the difference is immaterial to this discussion. ]

This would, in my opinion, make SQLite adhere more closely to Tcl's design intention. I suspect that the overlong representation is used because it simplifies Tcl's internal operations. If the overlong form never escapes Tcl's internals (except by purposeful, debug-like commands), then it is purely an implementation detail. I think the OP was rightfully surprised.