SQLite Forum

Hidden noCase bug in ext/misc/regexp.c in pRe->zInit prefix optimization

Hidden noCase bug in ext/misc/regexp.c in pRe->zInit prefix optimization

(1.2) By andse-t610 on 2021-06-03 12:16:20 edited from 1.1 [link]

Hello! I'm trying to modify `regexp.c` extension for my application. So I added 3-args `regexp(re, text, noCase)` function.

I'm noticed, that queries like  

> select regexp('abc', 'ABC', 1);

> select regexp('ABC', 'ABC', 1)

doesn't match.

But `select regexp('ABC', 'abc', 1)` matches.

I found out that it is because of the prefix optimiation in re_compile in lines [668-691](https://www.sqlite.org/cgi/src/file?ci=trunk&name=ext/misc/regexp.c&ln=668-692). 

Upper, when calling [re_subcompile_re at 651](https://www.sqlite.org/cgi/src/file?ci=trunk&name=ext/misc/regexp.c&ln=651), the text of the regular expression is written in lowercase, so the prefix is stored in lowercase.

Then in re_match at line [217](https://www.sqlite.org/cgi/src/file?ci=trunk&name=ext/misc/regexp.c&ln=217) the prefix is compared with text *case-sensitive*.

The simplest way to fix it - disable prefix optimization in noCase mode - modify line [676](https://www.sqlite.org/cgi/src/file?ci=trunk&name=ext/misc/regexp.c&ln=676) from

> if( pRe->aOp[0]==RE_OP_ANYSTAR ){


> if( !noCase && pRe->aOp[0]==RE_OP_ANYSTAR ){


(2) By andse-t610 on 2021-06-03 12:15:53 in reply to 1.1 [link]

Another approach - modify lines [216-217](https://www.sqlite.org/cgi/src/file?ci=trunk&name=ext/misc/regexp.c&ln=216-217) in `re_match`.

But there are three problems:

* first char optimization `zIn[in.i]!=x` is case sensitive always. Simplest- disable this optimization in _noCase mode_
* `strcncmp` is case sensitive - we can use `sqlite3_strnicmp` in _noCase mode_
* (!) how to determine that current regexp compiled with noCase flag?

(3) By Richard Hipp (drh) on 2021-06-03 13:58:34 in reply to 1.2 [link]

This bug, plus one other, now fixed on [trunk][1].  I also added a new
SQL function "regexpi(PATTERN,STRING)" that does case-independent REGEXP
matching.  And the regexp extension has been added to the [CLI][2].

[1]: https://www.sqlite.org/src/timeline?c=5d4535bfb603d7c8
[2]: https://www.sqlite.org/cli.html

(4) By andse-t610 on 2021-06-04 10:39:08 in reply to 3 [link]

Thank you for your lightning response! With such developers, sqlite is not in danger)

A question along the way - is there any reason not to use flag `SQLITE_DETERMINISTIC` when declaring functions `regexp` and `regexpi`?

what bad (or good or nothing) things can happen if I use it?

I have found that without this flag i can't use `regexp` in the partial index (as  the [docs](https://www.sqlite.org/c3ref/c_deterministic.html#sqlitedeterministic) says).

(5) By andse-t610 on 2021-06-04 11:06:22 in reply to 3

The function `regexpi` should be mentioned in the docs in lines [10-11](https://sqlite.org/src/file?name=ext/misc/regexp.c&ln=17-18)

(6) By Richard Hipp (drh) on 2021-06-04 11:46:51 in reply to 4 [link]

A problem with using the SQLITE_DETERMINISTIC flag on REGEXP and thus permitting
its use in the schema, for example in the WHERE clause of a partial index, is
that different installations might easily use different definitions of REGEXP.
The commonly used ICU extension defines a slightly different REGEXP operator,
for example.  And I think versions of REGEXP based on Perl regular expressions
are in wide circulation.  If you create an index using the built-in REGEXP and
then some other application tries to use it using a different definition of
REGEXP, then the index might not work right.

(7) By andse-t610 on 2021-06-04 12:49:55 in reply to 6 [link]

Clearly. I think we'll get the same problems if we change the collation logic on the existing database (like [here](https://sqlite.org/forum/forumpost/d1db925a1b?t=h))

All clear. Thanks a lot!