SQLite Forum

TEXT values can contain arbitrary binary data
Login
It appears possible to craft text values (values where typeof() returns 'text') that contain arbitrary binary data.

First, the encoding:

```
sqlite> PRAGMA encoding;
encoding
--------
UTF-8   
```

Let's create a dummy table:

```
sqlite> CREATE TABLE test_table (
   ...>   some_text text NOT NULL,
   ...>   CHECK(typeof(some_text) == 'text')
   ...> );
```

The `CHECK` constraint isn't necessary for the behavior I'm about to show, but I put it there to convince you that the table won't store binary data.

You can see that SQLite honors the `CHECK` constraint:

```
sqlite> INSERT INTO test_table VALUES (x'ff');
Error: CHECK constraint failed: typeof(some_text) == 'text'
```

Now for the funny part:

```
sqlite> INSERT INTO test_table VALUES (x'ff' || 'bork');
sqlite> 
```

That works. But what got inserted? (Since, after all, the blob on the LHS of that `||` could have been coerced.)

```
sqlite> SELECT typeof(some_text), some_text FROM test_table;
typeof(some_text)  some_text
-----------------  ---------
text               �bork    
```

That's not a literal replacement character, either; that's my terminal replacing the (malformed, to the terminal) raw bytes that made it all the way to it. If I inspect this value in a higher-level language,

```
row.get_raw(0) = Text([255, 104, 101, 108, 108, 111, 44, 32, 98, 48, 114, 107])
row.get::<_, String>(0) = Err(FromSqlConversionFailure(0, Text, Utf8Error { valid_up_to: 0, error_len: Some(1) }))
```
(Note: here I inserted a different string, so the ASCII part differs.) We can see that we do get a `Text` back (which is what the CLI says, too) but that it isn't UTF-8: it starts with a literal `0xff` byte.

First, https://www.sqlite.org/datatype3.html says,

> TEXT. The value is a text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16LE).

Which I read to mean that text must be text, specifically, it must be valid Unicode, since it will be stored with one of the three listed encoding.

https://sqlite.org/lang_expr.html states,

> The `||` operator is "concatenate" - it joins together the two strings of its operands.

I think the operative bit here is the "strings of its operands", meaning it coverts its operands to strings. So the question, then, is how does BLOB get converted into a string? §4.2 of [Datatypes](https://www.sqlite.org/datatype3.html) seems to answer that:

> BLOB values are converted into TEXT values by simply interpreting the binary BLOB context as a text string in the current database encoding. 

While that sort of answers it, it's both very surprising behavior (text values aren't necessarily text!) and it seems to conflict with the earlier bit from the same page: how is a text value containing a `0xff` bytes stored in UTF-8? Is this intentional? (And if it is, I think the part that defines the `TEXT` type should be more explicit about it not actually representing text.)

(I stumbled on this mostly by accident, trying to coerce a `text` into a `blob`, to make an `INSERT` easier.)