SQLite Forum

null character sorts greater than 0x7f and less than 0x80
Login
Personally, I sort of feel the opposite. CESU-8 exists because some platforms adopted UCS-2 when it was a thing, and when Unicode expanded past 16 bits, and UCS-2 sort of morphed into UTF-16, applications that still were thinking UcS-2 would generate CESU-8 for non-BMP characters, seeing them as a funny thing built out of two of the things it thought of as characters. Use of CESU-8 is less likely to cause a security issue for an application that generates CESU-8, as such an application will likely be processing internally as UTF-16, and the sort of issues with CESU-8 (that code points have multiple spellings) would tend not to do much processing on the stream at the CESU-8/UTF-8 encoded phase. Applications that process as UTF-8, would be advised to either reject CESU-8 or clean it up very early in processing.

Modified UTF-8 on the other hand, comes about from wanting to have embedded nuls in strings that are defined as nul terminated. The problem here is that ay step that converts the format to UCS-32 or normalizes the string, will suddenly find the string shortened. Any string that is capable of holding a nul character should not be stored as a nul terminated string but as a counted string. Within an application, if you have defined that you are going to be using the technique is one thing, but then using this method on an external data that isn't defined to do so is asking for problems.

It should be noted, that in this case, SQLite is designed to be able to handle strings with embedded nuls, in them, you just need to be providing it with the full length of the string, This means that SQLite doesn't need this sort of trickery if the applications that use it support strings with embedded nuls, and use the right API for them, and it also supports the use of Modified UTF-8 for those applications that want to handle the issue that way, as SQLite internally processes strings as having a specified length, it just allows applications to pass strings to it that are nul terminated and ask SQLite to figure out their length.