ESCAPE '%'
(1) By anonymous on 2020-03-18 19:22:25 [link] [source]
Hi,
sqlite> CREATE TEMP TABLE Test (text TEXT);
sqlite> INSERT INTO Test VALUES ('100%');
sqlite> INSERT INTO Test VALUES ('100 percent');
sqlite> SELECT text FROM Test WHERE text LIKE '100%';
100%
100 percent
sqlite> SELECT text FROM Test WHERE text LIKE '100x%' ESCAPE 'x';
100%
So far, so good.
sqlite> SELECT text FROM Test WHERE text LIKE '100%%' ESCAPE '%';
100%
100 percent
Is this expected behavior? The docs say:
If the optional ESCAPE clause is present, then the expression following the ESCAPE keyword must evaluate to a string consisting of a single character. This character may be used in the LIKE pattern to include literal percent or underscore characters. The escape character followed by a percent symbol (%), underscore (_), or a second instance of the escape character itself matches a literal percent symbol, underscore, or a single escape character, respectively.
Can a percent character not be used as an escape character?
Thanks, Hamish
(2) By Simon Slavin (slavin) on 2020-03-18 22:49:56 in reply to 1 [link] [source]
Sorry, no. Both the percent symbol and the underscore have special meanings in this context. From the paragraph before the one you quoted:
A percent symbol ("%") in the LIKE pattern matches any sequence of zero or more characters in the string. An underscore ("_") in the LIKE pattern matches any single character in the string.
They cannot be used for something else. I hope you can use some other symbol for your escape character.
(3) By Simon Slavin (slavin) on 2020-03-18 22:55:20 in reply to 1 [link] [source]
Hmm. If you really need to do it with a percent character you could use REGEXP instead of LIKE and write your own REGEXP parser …
The REGEXP operator is a special syntax for the regexp() user function. No regexp() user function is defined by default and so use of the REGEXP operator will normally result in an error message. If an application-defined SQL function named "regexp" is added at run-time, then the "X REGEXP Y" operator will be implemented as a call to "regexp(Y,X)".
… but that's a lot of work and I would probably choose a different solution.
(4) By Keith Medcalf (kmedcalf) on 2020-03-19 01:39:28 in reply to 3 [link] [source]
There is a regexp extension in ext/misc/regexp.c at https://www.sqlite.org/src/artifact/246244c714267f30 so no need to write, merely compile, load, and go ...
(5.1) By ddevienne on 2020-03-19 11:19:53 edited from 5.0 in reply to 1 [link] [source]
I'd consider that a bug myself, unlike Simon and Keith, because SQLite strives for PostgreSQL compatibility, and PostgreSQL gets this right:
[ddevienne@nemo ~]$ psql
Password for user ddevienne:
psql (11.2, server 12.0)
WARNING: psql major version 11, server major version 12.
Some psql features might not work.
Type "help" for help.
ddevienne=> create table t (v text);
CREATE TABLE
ddevienne=> insert into t values ('100%'), ('100 percent');
INSERT 0 2
ddevienne=> select v from t;
v
-------------
100%
100 percent
(2 rows)
ddevienne=> select v from t where v like '100%%' escape '%';
v
------
100%
(1 row)
I had a quick look at SQLite's impl (which as a reminder, just in case,
is nicely exposed as a public API, sqlite3_strlike()
), and this is because
that impl attempts to first compress the like-pattern to consecutive
matchAll
characters ('%'
here), before testing the matchOther
escape-character:
static int patternCompare(
const u8 *zPattern, /* The glob pattern */
const u8 *zString, /* The string to compare against the glob */
const struct compareInfo *pInfo, /* Information about how to do the compare */
u32 matchOther /* The escape char (LIKE) or '[' (GLOB) */
){
u32 c, c2; /* Next pattern and input string chars */
u32 matchOne = pInfo->matchOne; /* "?" or "_" */
u32 matchAll = pInfo->matchAll; /* "*" or "%" */
u8 noCase = pInfo->noCase; /* True if uppercase==lowercase */
const u8 *zEscaped = 0; /* One past the last escaped input char */
while( (c = Utf8Read(zPattern))!=0 ){
if( c==matchAll ){ /* Match "*" */
/* Skip over multiple "*" characters in the pattern. If there
** are also "?" characters, skip those as well, but consume a
** single character of the input string for each "?" skipped */
while( (c=Utf8Read(zPattern)) == matchAll || c == matchOne ){
The fact the doc never mentions the escape character cannot be '%'
puts this in the fuzzy unspecified category, but again, Richard usually
considers PostgreSQL as the reference, so I hope he'll fix this. --DD
(6) By Hamish Allan (hatfinch) on 2020-03-19 11:48:53 in reply to 2 [link] [source]
I’m just trying to construct a prefix query. It’s useful to be able to use the percent character as the escape character because then I can just replace %
with %%
and _
with %_
in the string I’m substituting, and simply add a %
at the end.
I can use a different character, but then I also have to escape that character as well as the percent and underscore.
No big deal, just struck me as odd because the latter paragraph is addressing how to escape the characters denoted as special by the former paragraph, but doesn’t mention any constraints on the escape character. I would read that as allowing what I’m trying to do, but as ddevienne says, it’s underspecified.
(7) By Richard Damon (RichardDamon) on 2020-03-19 13:18:15 in reply to 6 [link] [source]
The issue is that if % is the escape character, then there is no way to put an effective % in the string (except MAYBE the end), as %% is the literal (non-effective) %, and % any-other-character is that any-other-character. By many definitions of the escape character, escape end-of-string is an undefined sequence, as escape should be followed by the character it is escaping. We could perhaps define it to become an effective version of the character, but that still doesn't allow us to use the % as a match elsewhere.
(8) By Dan Kennedy (dan) on 2020-03-19 13:18:58 in reply to 5.1 [link] [source]
Interesting. I assumed it was disallowed because:
? LIKE '%%%' ESCAPE '%'
is ambiguous.
(9) By Richard Hipp (drh) on 2020-03-19 18:27:10 in reply to 1 [link] [source]
Behavior changed to match PostgreSQL on trunk.
(10) By Stephan Beal (stephan) on 2020-03-19 18:58:16 in reply to 9 [link] [source]
Behavior changed to match PostgreSQL on trunk.
A pedantic micro-optimization:
if( escape==pInfo->matchAll || escape==pInfo->matchOne ){
memcpy(&backupInfo, pInfo, sizeof(backupInfo));
pInfo = &backupInfo;
if( escape==pInfo->matchAll ) pInfo->matchAll = 0;
if( escape==pInfo->matchOne ) pInfo->matchOne = 0;
}
Presumably matchAll
and matchOne
may never be the same character, in which case those adjacent inner if
's could be changed to an if/else if
.
(11.1) Originally by doug (doug9forester) with edits by Richard Hipp (drh) on 2020-03-19 19:33:55 from 11.0 in reply to 9 [link] [source]
I don't know how do_execsql_test works. Is the {4} at the end the number of row returned? From Richard's update to the test suite: ``` # 2020-03-19 # The ESCAPE clause on LIKE takes precedence over wildcards # do_execsql_test idu-6.0 { DROP TABLE IF EXISTS t1; CREATE TABLE t1(id INTEGER PRIMARY KEY, x TEXT); INSERT INTO t1 VALUES (1,'abcde'), (2,'abc_'), (3,'abc__'), (4,'abc%'), (5,'abc%%'); SELECT id FROM t1 WHERE x LIKE 'abc%%' ESCAPE '%'; } {4} do_execsql_test icu-6.1 { SELECT id FROM t1 WHERE x LIKE 'abc__' ESCAPE '_'; } {2} ``` If {4} and {2} are the number of rows returned, shouldn't they both be {2}?
(12) By Keith Medcalf (kmedcalf) on 2020-03-19 20:53:19 in reply to 11.1 [link] [source]
The SQL statement is "SELECT id ..." so the {4} and {2} are the lists of id returned.
And there is (and should only be) one row returned by each query.
Why do you think the number of rows returned for each query should be two?
(13) By doug (doug9forester) on 2020-03-19 21:13:20 in reply to 12 [link] [source]
I assumed a wildcard was zero or more like the "*" wildcard in regular expressions.
(14) By TripeHound on 2020-03-20 01:38:13 in reply to 13 [link] [source]
But there aren't any wildcards... an escaped percent sign should just match the single character %
, and an escaped underscore should just match the single character _
.
(15) By doug (doug9forester) on 2020-03-20 02:03:35 in reply to 14 [link] [source]
What is the difference between these:
SELECT id FROM t1 WHERE x LIKE 'abc%%' ESCAPE '%';
and
SELECT id FROM t1 WHERE x LIKE 'abc%%';
and
SELECT id FROM t1 WHERE x LIKE 'abc%';
I'm pretty sure the last one is a wildcard search, that (I assume) matches anything like the regular expression "abc.*".
What match string expressed as a regular expression do the first two selects find?
(16) By Keith Medcalf (kmedcalf) on 2020-03-20 03:17:51 in reply to 15 [source]
LIKE 'abc%%' ESCAPE '%' -> REGEXP 'abc%'
LIKE 'abc%%' -> REGEXP 'abc.*.*'
LIKE 'abc%' -> REGEXP 'abc.*'
That is, the % wildcard is a regex .*
the _ wildcard is .
ESCAPE allows you to set the an alternate (escape) character -- the default is to have no escape character.
So the sequence '%%' means '.*.*'
unless the escape character is '%' in which case is is simply a match for the single character %
(17) By doug (doug9forester) on 2020-03-20 05:27:47 in reply to 16 [link] [source]
Thanks. That clears it up nicely.