SQLite User Forum

How do I use sqlite to use index’s when the query has several AND OR operations or CASE tests?
Login

How do I use sqlite to use index's when the query has several AND OR operations or CASE tests?

(1) By anonymous on 2024-11-19 04:24:23 [link] [source]

I am trying to use sqlpage to make little front end search engine for a 1tb+ database I have. see https://github.com/sqlpage/SQLPage/discussions/699 for more info

I am pretty new to sql and sqlite so appologies if I miss anything obvious. but say I have db with following schema:

CREATE TABLE Table1(field1 TEXT, field2 TEXT, field3 TEXT);
CREATE INDEX "testindexf1" ON "Table1" (field1);
CREATE INDEX "testindexf2" ON "Table1" (field2);
CREATE INDEX "testindexf3" ON "Table1" (field3);
CREATE INDEX "testindexf123" ON "Table1" (field1,field2,field3);

How do I setup the sql query such that, if I use sqlpage (or a bash script or something) such that the variable $field1 has a value, but feild2 and field3 are empty, then the query will be:

SELECT * FROM Table1 WHERE field1 = $field1;

and if field 2 and 3 is filled out and feild 1 is empty then the query is akin to:

SELECT * FROM Table1 WHERE field2 = $field2 AND field3 = $field3;

and so on for every possible combination of the form being filled out.

I've tried using case tests and and or operations, but they result in queries that dont use the index's I've created, which they need to do for the search to be usefull.

Some examples of queries I have tried in a bash script to check if the indexs are being used:

field1='1'
field2='2'
field3='3'

sqlite3  tst.db<<EOS
EXPLAIN QUERY PLAN SELECT COUNT(*) from table1
WHERE
	CASE WHEN COALESCE("$field1",'') <> '' THEN "field1" = "$field1" END AND
	CASE WHEN COALESCE("$field2",'') <> '' THEN "field2" = "$field2" END AND
	CASE WHEN COALESCE("$field3",'') <> '' THEN "field3" = "$field3" END;

EXPLAIN QUERY PLAN SELECT COUNT(*) from table1
WHERE
	(field1 GLOB '*' OR field1 = "$field1") AND
	(field2 GLOB '*' OR field2 = "$field2") AND
	(field3 GLOB '*' OR field3 = "$field3");
	
EXPLAIN QUERY PLAN SELECT COUNT(*) from table1
WHERE
	($field1 = '' OR field1 = "$field1") AND
	($field2 = '' OR field2 = "$field2") AND
	($field3 = '' OR field3 = "$field3");

EOS


they all result in --SCAN TABLE table1 instead of using the indexs.

The search page will have 6 feilds over all and will probably have more added, I would like to avoid hard coding every possible permutation of the query based on which feilds are filled out and which are not.

(2) By Gunter Hick (gunter_hick) on 2024-11-19 06:52:54 in reply to 1 [link] [source]

It is probably faster to build your query external to SQLite, based on the entered input fields of the form.

Also, only the AND equality conditions that are a prefix of the index expression can ever be considered by the NGQP. None of your queries conform.

Going from most to least fields, the constraints and required indices might be:

f1=v1 AND f2=v2 AND f3=v3 --> (f1,f2,f3)
f1=v1 AND f2=v2           --> (f1,f2,f3)
f1=v1           AND f3=v3 --> (f1,f3)
          f2=v2 AND f3=v3 --> (f2,f3)
f1=v1                     --> (f1,f2,f3) or (f1,f3)
          f2=v3           --> (f2,f3)
                    f3=v3 --> (f3)

Alternatively you could consider covering indices:

(f1,f2,f3)
(f2,f3,f1)
(f3,f1,f2)

so that combinations of fields not matching an index expression still do not require going to the main table.

May ba also try just plain EXPLAIN to display the generated bytecode

(5) By anonymous on 2024-11-19 15:16:36 in reply to 2 [link] [source]

Most of this is flying over my head (I think I know what you mean by "index expression" but whats NGQP? and what do you mean by covering indices?),

I dont think I have the time or inclination to dig to deep into the subtleties of all this, I'm just going to wright out all 64 possible queries and then a test that assigns a value 1-64 based on fields filled out. its taken me 30 mintues to get more then halfway done with writing out the queires by hand and ive spent 10s of hours tinkering and reading documentation just to get to those CASE tests and AND OR statments.

I appreciate the responce though, I've saved it to my notes for future reference if I decide to try and setup something to build the query externally or.. sopmething.

(6) By J. King (jking) on 2024-11-19 15:35:17 in reply to 5 [link] [source]

I think I know what you mean by "index expression" but whats NGQP? and what do you mean by covering indices?

"NGQP" here is jargon for "next-generation query planner". Said planner has been a part of SQLite for over a decade now, so it's really just the query planner, the algorithm which decides (really, makes an educated guess) about the fastest way to find the information you asked for. High-level documentation about how it works is available.

Said documentation also describes covering indices, which are indices which contains the information sought by a query. This avoids a table lookup altogether.

(10) By Gunter Hick (gunter_hick) on 2024-11-19 23:04:51 in reply to 5 [link] [source]

There is no need to store 64 queries. Just either add "<field> = <value>" to the where clause if the field is entered. And remember to use WHERE the first time and AND for subsequent expressions. That only requires simple string operations.

(16) By Trudge on 2024-11-20 16:47:25 in reply to 10 [link] [source]

I was in a similar situation and found this excellent solution here https://sqlite.org/forum/forumpost/90ec759008

select title from books join "Keywords" as kw1 using(authorid,titleid) join "Keywords" as kw2 using(authorid,titleid) join "Keywords" as kw3 using(authorid,titleid) join "Keywords" as kw4 using (authorid,titleid) join "Keywords" as kw5 using (authorid,titleid) join "Keywords" as kw6 using (authorid,titleid) join "Keywords" as kw7 using (authorid,titleid) join "Keywords" as kw8 using (authorid,titleid) where kw1.keyword='Philosophy' and kw2.keyword='Art' and kw3.keyword='History' and kw4.keyword='Politics' and kw5.keyword='Psychology' and kw7.keyword='Culture' and kw8.keyword='Climate Change' and kw6.keyword='Science';

title

Stranger Than We Can Imagine

(7) By punkish on 2024-11-19 16:43:14 in reply to 2 [link] [source]

I think Gunter's post is really wonderful. I face a similar problem as the OP in this thread -- I have about a dozen tables with about a dozen columns each, and I'd like to be able to (well, I'd like the user to be able to) query any combination of these columns and tables. I have written an elaborate program (in nodejs/JavaScript) to take the URL params (user queries come in as URL query params) and convert them into SQL. I've been using covering indexes in some cases. Right now my program only supports AND queries. At some point, I'd like to expand it to enable OR queries as well. Deconstructing the entire thought process like Gunter has done above will be really useful.

Many thanks for this.

(8) By lovasoa on 2024-11-19 22:35:49 in reply to 2 [link] [source]

Hello everyone,

This is Ophir from SQLPage. I believe the issue originally raised can be simplified and illustrated more clearly with the following SQL:

sqlite3 test.db
SQLite version 3.46.1 2024-08-13 09:16:08
Enter ".help" for usage hints.
sqlite> .schema
CREATE TABLE test_table (field3 TEXT);
CREATE INDEX idx_field3 ON test_table (field3);

sqlite> explain query plan select * from test_table where field3 = 'aaaaaaaaaa';
QUERY PLAN
`--SEARCH test_table USING INDEX idx_field3 (field3=?)

sqlite> explain query plan select * from test_table where '' = 'aaaaaaaaaa' OR field3 = 'aaaaaaaaaa';
QUERY PLAN
`--SCAN test_table

The problem seems to be that the SQLite query planner does not evaluate the constant expression '' = 'aaaaaaaaaa' first, which would allow for the use of the index on field3. PostgreSQL handles this more effectively by evaluating constant expressions first, thus avoiding unnecessary table scans.

I believe a potential solution could involve enhancing the query planner to move constant evaluations out of the main execution loop.

Here’s how the execution bytecode looks for this query:

sqlite> explain select * from test_table where '' = 'aaaaaaaaaa' OR field3 = 'aaaaaaaaaa';
addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     12    0                    0   Start at 12
1     OpenRead       0     2     0     3              0   root=2 iDb=0; test_table
2     Rewind         0     11    0                    0   
3       Eq             2     6     1                    64  if r[1]==r[2] goto 6
4       Column         0     2     3                    0   r[3]= cursor 0 column 2
5       Ne             2     10    3     BINARY-8       82  if r[3]!=r[2] goto 10
6       Column         0     0     4                    0   r[4]= cursor 0 column 0
7       Column         0     1     5                    0   r[5]= cursor 0 column 1
8       Column         0     2     6                    0   r[6]= cursor 0 column 2
9       ResultRow      4     3     0                    0   output=r[4..6]
10    Next           0     3     0                    1   
11    Halt           0     0     0                    0   
12    Transaction    0     0     4     0              1   usesStmtJournal=0
13    String8        0     1     0                    0   r[1]=''
14    String8        0     2     0     aaaaaaaaaa     0   r[2]='aaaaaaaaaa'
15    Goto           0     1     0                    0   

Is this behavior well-known? If not, what’s the recommended way to report it for further review?

(9) By Gunter Hick (gunter_hick) on 2024-11-19 22:59:04 in reply to 8 [link] [source]

The Lite in SQLite prevents implementing complex algebra solving routines that would be able to determine that const1 == const2 is (in most cases) FALSE and that FALSE OR <expr> is equivalent to <expr>, whereas TRUE OR <expr> is equivalent to TRUE, and thus be able to eliminate either the LHS or the RHS expression. There being no server, such a determination would have to be made each and every time the query is compiled, with no way to amortise the cost thereof.

(12) By lovasoa on 2024-11-20 09:17:35 in reply to 9 [link] [source]

I don't think there are many cases in which the increased query compilation cost would be larger than the reduction in query execution time, except maybe for huge queries on tiny datasets.

The problem here is not really to "determine that const1 = const2 is false". In the original poster's example, the expression to analyze is "const1 = StatementParameter". What the optimizer could do is to compile it to an execution plan that first checks the expression's value, then runs either a table scan or an index search.

Even though this would almost always lead to performance improvements, even for one-off queries, I do understand that this is clearly not trivial to implement. I would still love to get the opinion of one of the SQLite developers, though.

(3) By Adrian Ho (lexfiend) on 2024-11-19 07:27:51 in reply to 1 [link] [source]

The search page will have 6 feilds over all and will probably have more added

All the more reason to generate a custom query with only the search conditions specified. That's far easier to maintain and debug than a generic monster query referencing every column.

If sqlpage doesn't let you do the sensible thing in this situation, it's not the right tool for your job.

(4) By anonymous on 2024-11-19 15:09:39 in reply to 3 [link] [source]

The dev of sqlpage has said that:

SQLPage itself currently doesn't have an option to run arbitrary sql defined by the end user. I am reluctant to adding something like that, since this feature has a great potential for misuse and would probably most of the time result in vulnerabilities in the application.

I am not entirely sure if that applies to my case applies (IE, I dont think what I want is to run "arbitrary" sql defined by end user) but since there is a very large number of possible combinations I'm not sure what else to call it.

tbh a generic monster query would be better then a hoard of 64+ individual queries and a series of tests to determin which to run, but I think I've reached the point where I've realised that just writing a test that gives me a number between 1-64 and then runs one of the 64 individual queries depending on the result of the test. Its annoying but I've gotten 40 written in less time then it took to write these two help requests, never mind all the head banging it took to get there. Writing all that stuff out manually will be faster then trying to learn whatever it is I would need to learn to make it smaller and easier to maintain. I will take a beast or a hoard that at the end of the day does what I want over a perfect machine that does nothing.

(11) By Adrian Ho (lexfiend) on 2024-11-20 01:28:41 in reply to 4 [link] [source]

I dont think what I want is to run "arbitrary" sql defined by end user

Why not? It's a pretty easy process in any backend language. Here's a simple example in bash, of all things:

#!/usr/bin/env bash
wheres=() params=()
[[ -n $1 ]] && wheres+=("field1 = ?") && params+=("$1")
[[ -n $2 ]] && wheres+=("field2 = ?") && params+=("$2")
[[ -n $3 ]] && wheres+=("field3 = ?") && params+=("$3")
[[ -n $4 ]] && wheres+=("field4 = ?") && params+=("$4")
[[ -n $5 ]] && wheres+=("field5 = ?") && params+=("$5")
[[ -n $6 ]] && wheres+=("field6 = ?") && params+=("$6")
IFS="#"; echo "Query: SELECT * FROM mytable WHERE ${wheres[*]}" | sed 's/#/ AND /g'
echo "Parameters to bind: $(printf "'%q' " ${params[@]})"

$ bash test-query.sh "" This is "" test
Query: SELECT * FROM mytable WHERE field2 = ? AND field3 = ? AND field5 = ?
Parameters to bind: 'This' 'is' 'test'

The key to doing this securely is to use the parameter substitution feature in your chosen SQLite language binding.

(13) By lovasoa on 2024-11-20 09:34:23 in reply to 1 [source]

For anyone who would stumble on this and just look for a quick solution (that does not involve concatenating strings of SQL code, and compiling 2^n variants of the same query).

A workaround for the default slow execution of the query is to add a condition to the WHERE clause, that will trigger the query planner's MULTI-INDEX OR optimization:

SELECT * FROM test_table
WHERE
  (f1=$f1 OR f2=$f2 OR f3=$f3) AND
  ($f1 = '' OR f1=$f1) AND
  ($f2 = '' OR f2=$f2) AND
  ($f3 = '' OR f3=$f3);

The two queries are equivalent if we do not have $f1 = $f2 = $f3 = '', which seems to be the intent in the original query (since the table seems to be too large to realistically return all rows anyways).

(14) By Gunter Hick (gunter_hick) on 2024-11-20 14:09:30 in reply to 13 [link] [source]

Since you are using query paramters, I suggest using $f<n> IS NULL, as unbound parameters are NULL by default and can easily be reset by calling sqlite3_clear_bindings() on the prepared statement.

(15) By lovasoa on 2024-11-20 16:40:32 in reply to 14 [link] [source]

Yes, that would be best in the general case !

In the original question, though, the bound values come from form fields filled by an user, which are empty strings when the user did not enter anything.