SQLite Forum

Bug Report and Patch: LIKE operator mishandles U+0080
Login

Bug Report and Patch: LIKE operator mishandles U+0080

(1) By Richard Li (chys00) on 2022-10-14 12:00:54 [source]

Hello, it appears I have found a bug with LIKE operator.

HOW TO REPROUCE

  • SQL: select cast(X'aa80' as text) like ('%' || cast(X'c280' as text));
  • Actual result: 1
  • Expected result: 0

WHY

The LIKE operator is implemeneted in function patternCompare in func.c.

After decoding X'c280' to U+0080, patternCompare mistakenly believes U+0080 encodes to a single byte in UTF-8, and simply searches for the byte '\x80' in the target string. As a result, '\xc2\x80' in the pattern (if it immediately follows a %) matches any multi-byte character ending with byte '\x80' in the target string.

PROPOSED PATCH

I propose the following patch:

diff --git a/src/func.c b/src/func.c
index ec50a17a3..06581cbe3 100644
--- a/src/func.c
+++ b/src/func.c
@@ -740,7 +740,7 @@ static int patternCompare(
       ** c but in the other case and search the input string for either
       ** c or cx.
       */
-      if( c<=0x80 ){
+      if( c<0x80 ){
         char zStop[3];
         int bMatch;
         if( noCase ){

Thanks.

(2) By Dan Kennedy (dan) on 2022-10-14 15:12:33 in reply to 1 [link] [source]

Thanks for debugging and reporting this. Should now be fixed here:

https://sqlite.org/src/info/2da677c45b643482

Dan.