SQLite User Forum

The replace(X,Y,Z) function behaves inconsistently when the argument Y contains the NUL character
Login

The replace(X,Y,Z) function behaves inconsistently when the argument Y contains the NUL character

(1.2) By sycxyc on 2022-10-03 11:13:54 edited from 1.1 [link] [source]

Test SQL:

select
hex(replace(x,'0'||x'00',z))=hex('_1'),
hex(replace(x,x'00',z))=hex('0_1'),
hex(replace(x,x'00'||'1',z))=hex('0_')
from
(select '0'||x'00'||'1' as x,'_' as z) t;

-- Or cast all arguments to BLOB.
select
hex(replace(x,cast('0'||x'00' as BLOB),z))=hex('_1'),
hex(replace(x,cast(x'00' as BLOB),z))=hex('0_1'),
hex(replace(x,cast(x'00'||'1' as BLOB),z))=hex('0_')
from
(select cast('0'||x'00'||'1' as BLOB) as x, cast('_' as BLOB) as z) t;

Expected result:

1|1|1

Actual result:

1|0|0

SQLite version:

3.39.4

Additional information:

The problem may be caused by this line func.c#L1270.

This patch fixes this issue:

diff --git a/src/func.c b/src/func.c
index ec50a17a3..3c50c9055 100644
--- a/src/func.c
+++ b/src/func.c
@@ -1267,12 +1267,12 @@ static void replaceFunc(
             || sqlite3_context_db_handle(context)->mallocFailed );
     return;
   }
-  if( zPattern[0]==0 ){
+  nPattern = sqlite3_value_bytes(argv[1]);
+  if( nPattern==0 ){
     assert( sqlite3_value_type(argv[1])!=SQLITE_NULL );
     sqlite3_result_value(context, argv[0]);
     return;
   }
-  nPattern = sqlite3_value_bytes(argv[1]);
   assert( zPattern==sqlite3_value_text(argv[1]) );  /* No encoding change */
   zRep = sqlite3_value_text(argv[2]);
   if( zRep==0 ) return;

(2) By anonymous on 2022-10-01 14:25:43 in reply to 1.0 [link] [source]

Suggested documentation improvement: NUL Characters in Strings should mention that several other functions also treat a NUL as the end of the string.

(3) By Keith Medcalf (kmedcalf) on 2022-10-01 15:06:06 in reply to 2 [link] [source]

Seems somewhat interesting, does it not, that the definition of a NULL terminated string is a sequence of non-NULL characters followed by a NULL character (where the length of a character is arbitrary).

Given that definition of a NUL terminated string, by what defect of congnition do you believe that you can put a NULL character inside a NULL terminated string, without teminating the string?

(4) By SeverKetor on 2022-10-01 15:22:45 in reply to 3 [link] [source]

Local man discovers not everyone assumes strings are null-terminated.

Not every language does that, but they still have SQLite support. People not used to dealing with null-terminated strings are going to exist, like it or not.

(7.1) By Keith Medcalf (kmedcalf) on 2022-10-02 14:29:55 edited from 7.0 in reply to 4 [link] [source]

The very definition of a C-String is that it is a "sequence of non-zero characters followed by a zero character".

A Pascal string, on the other hand, consists of an arbitrary bag-of-bytes of a specified length.

You simply need to be clear whether you are dealing with C Strings or Pascal Strings.

SQLite3 uses C Strings (so sayeth the documentation). This necessarily implies that an embedded null is not embedded (since that is impossible by defintion) and is therefore the string terminator.

One may decide that "inside" the library all strings are Pascal strings, and work with them that way, or, one may decide that "inside" the library all strings are C strings, and work with them that way.

SQLite3 uses NULL terminated strings internally. Since every string in SQLite3 has a specified length (stored string that is, not ephemeral strings), then it could implement Pascal string handling semantics internally. However, this would have to be done consistently and everywhere. If there is any one "spot" left-over that does not process everything using Pascal-style strings, then the whole re-writing process would be a massive waste of time.

(5.1) By sycxyc on 2022-10-02 05:23:25 edited from 5.0 in reply to 2 [link] [source]

Improving that documentation does not solve the problem of inconsistent behavior.

In fact, it works fine as long as the first character of the Y argument is not NUL, as the initial test case should have shown.

According to the Built-In Scalar SQL Functions documentation, the replace(X,Y,Z) function actually uses memcmp(), which can handle binary bytes. I speculate that when the Y argument starts with NUL, it is incorrectly treated as an empty string.

(6) By Sunny Saini (SunnySaini_com) on 2022-10-02 10:58:07 in reply to 5.1 [link] [source]

Interesting

(8) By Gunter Hick (gunter_hick) on 2022-10-03 06:16:50 in reply to 1.1 [link] [source]

Consider: 

"... the result of expressions operating on strings with embedded NULs is undefined."

See the sqlite3_result_text() function description in  https://sqlite.org/c3ref/result_blob.html

(9) By sycxyc on 2022-10-03 11:11:12 in reply to 8 [link] [source]

The branch of logic causing the problem actually uses sqlite3_result_value() without going through sqlite3_result_text().

select typeof(replace(1,x'00','')),typeof(replace('1',x'00',''));

-- integer|text

(10) By Gunter Hick (gunter_hick) on 2022-10-03 11:30:39 in reply to 9 [source]

That is beacause the problematic value is being passed to sqlite3_bind_text(). The description thereof has the same warning, see https://sqlite.org/c3ref/bind_blob.html

"If any NUL characters occurs at byte offsets less than the value of the fourth parameter then the resulting string value will contain embedded NULs. The result of expressions involving strings with embedded NULs is undefined."

(11) By sycxyc on 2022-10-03 12:19:01 in reply to 10 [link] [source]

Sorry I don't know much about SQLite C Interface, but this document seems to be for "Binding Values To Prepared Statements", if I use the prepared statement, I might replace x'00' with char(0). I think the problem is not here.

(12) By Gunter Hick (gunter_hick) on 2022-10-04 06:14:47 in reply to 11 [link] [source]

The problem is that you are creating strings with embedded NUL characters that are documented to deliver undefined results in string operations. Note that "undefined results" means that ANY result is the correct result, even if that result changes between releases or between executions.

SQLite is designed to make it easy to create functions (or even replace built in functions). If your application requires embedded-NUL safe string functions, you can always write them yourself.

(13) By sycxyc on 2022-10-05 14:13:04 in reply to 12 [link] [source]

My previous plan was to use only sqlcipher CLI to handle strings with NUL mixed in, I think Removing NUL Characters From A Text Field replace(b,char(0),'') is more useful than substr(b,1,instr(b,char(0))), but this problem just makes it almost impossible to be simple.

I think the robustness of the replace() function deserves improvement, and the patch should be relatively simple.

(14) By Gunter Hick (gunter_hick) on 2022-10-05 15:10:35 in reply to 13 [link] [source]

Sanitizing input data should occur before it hits the database. In this case, it seems strings with embedded NULs were missed.

SQL is not particularly well suited for performing string operations. The suggested query will truncate strings at the first embedded NUL, which conforms to expected behaviour of NUL terminated strings.

Keeping the tail of the string is more complicated, and may fail if the string contains more than one NUL character, because - drum roll - operations on strings with embedded NUL characters are undefined.

If you really need to sanitize strings with embedded NULs, you are better off writing your own custom function to do so.