I had thought that's no longer true:

> Before the Unicode Standard, Version 3.1, the problematic "non-shortest form" byte sequences in UTF-8 were those where BMP characters could be represented in more than one way. These sequences are ill-formed, because they are not allowed by Table 3-7.

The example in the spec specifically calls out "C0" as an invalid first byte in a sequence:

> The byte sequence C0 AF is ill-formed, because C0 is not well-formed in the "First Byte" column.

Or, is C0 80 as a replacement for 00 a special case?
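
For what it's worth, here is a quick check against one strict decoder. This is only a sketch: it assumes Python 3's built-in `utf-8` codec as a stand-in for "a conformant decoder", and the helper name `is_well_formed_utf8` is mine, not anything from the spec.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 codec accepts the bytes.

    Assumption: the built-in codec is used here as a proxy for the
    well-formedness rules; it is not the spec's table itself.
    """
    try:
        data.decode("utf-8")  # default error handler is "strict"
        return True
    except UnicodeDecodeError:
        return False


# C0 AF: the overlong encoding of "/" quoted from the spec
print(is_well_formed_utf8(b"\xc0\xaf"))  # False
# C0 80: the overlong encoding of U+0000
print(is_well_formed_utf8(b"\xc0\x80"))  # False
# A plain 00 byte is the shortest form of U+0000
print(is_well_formed_utf8(b"\x00"))      # True
```

So at least this decoder treats C0 80 the same as C0 AF, with no special case for NUL.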