SQLite Forum

null character sorts greater than 0x7f and less than 0x80
I had thought that's no longer true:

> Before the Unicode Standard, Version 3.1, the problematic "non-shortest form" byte sequences in UTF-8 were those where BMP characters could be represented in more than one way. These sequences are ill-formed, because they are not allowed by Table 3-7.

The example in the spec specifically calls out "C0" as an invalid first byte in a sequence:

> The byte sequence C0 AF is ill-formed, because C0 is not well-formed in the "First Byte" column. 

Or is C0 80, as a replacement for 00, a special case?
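
For what it's worth, here is a quick way to check how a strict decoder treats these bytes. This is only a sketch: it uses Python's built-in UTF-8 codec as a stand-in for a conforming decoder, and `is_well_formed_utf8` is a helper name I made up for illustration. It says nothing about what SQLite itself does with these bytes.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Sequences discussed above, plus the shortest-form encodings for comparison.
samples = {
    "C0 AF (overlong '/')": b"\xc0\xaf",
    "C0 80 (overlong NUL)": b"\xc0\x80",
    "00 (shortest-form NUL)": b"\x00",
    "2F (shortest-form '/')": b"\x2f",
}

for label, raw in samples.items():
    print(f"{label}: well-formed = {is_well_formed_utf8(raw)}")

# Assumption: a plain bytewise comparison. Under that comparison the
# overlong NUL lands between 7F and C2 80 (the encoding of U+0080).
ordered = sorted([b"\xc2\x80", b"\xc0\x80", b"\x7f"])
print([b.hex(" ") for b in ordered])
```

A strict decoder rejects both C0 AF and C0 80, so under the standard's rules C0 80 is not a special case. Some systems do use C0 80 for U+0000 on purpose (Java's "modified UTF-8" is one), but that is a separate convention, not well-formed UTF-8.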