Bug Report and Patch: LIKE operator mishandles U+0080
(1) By Richard Li (chys00) on 2022-10-14 12:00:54 [source]
Hello, it appears I have found a bug with LIKE operator.
HOW TO REPROUCE
- SQL:
select cast(X'aa80' as text) like ('%' || cast(X'c280' as text));
- Actual result:
1
- Expected result:
0
WHY
The LIKE operator is implemeneted in function patternCompare
in func.c
.
After decoding X'c280' to U+0080, patternCompare
mistakenly believes U+0080 encodes to a single byte in UTF-8, and simply searches for the byte '\x80'
in the target string. As a result, '\xc2\x80'
in the pattern (if it immediately follows a %
) matches any multi-byte character ending with byte '\x80'
in the target string.
PROPOSED PATCH
I propose the following patch:
diff --git a/src/func.c b/src/func.c
index ec50a17a3..06581cbe3 100644
--- a/src/func.c
+++ b/src/func.c
@@ -740,7 +740,7 @@ static int patternCompare(
** c but in the other case and search the input string for either
** c or cx.
*/
- if( c<=0x80 ){
+ if( c<0x80 ){
char zStop[3];
int bMatch;
if( noCase ){
Thanks.
(2) By Dan Kennedy (dan) on 2022-10-14 15:12:33 in reply to 1 [link] [source]
Thanks for debugging and reporting this. Should now be fixed here:
https://sqlite.org/src/info/2da677c45b643482
Dan.