SQLite Forum

Query Optimizer Changes Meaning of Query
Login
The query

~~~~
   SELECT x FROM natural_numbers ORDER BY random() LIMIT 50;
~~~~

Returns a table of 50 distinct random numbers.  The question then is,
if that query is really a subquery in a larger SQL statement, and the
result set is used multiple times within the outer query, is the
SQL engine compelled to rerun the subquery, or is it allowed to cache
the results of the first run and reuse them.

I contend that the query planner is free to do it either way.  It can
either cache the results of the first run and reuse them.  Or it can
rerun the query.  Or (if it wanted to) it could cache the results and
use them two or three times and then rerun the query and use those
results two or three times, then run the subquery again, and so forth.

In other words, the programmer cannot make any assumptions about whether
or not the results of a (non-correlated) subquery are cached and reused
or if the subquery is run multiple times.  The SQL engine is free to do
whatever it wants.  Any query that depends on whether or not subquery results
are cached returns unpredictable output.  It is akin to running
a query without an ORDER BY clause - the engine is free to return the
results in any order it wants.  Just because it returns the results in
the order you want today does not mean it will continue to do so
tomorrow.

Constructing a query like this that depends on whether or not subquery
results are cached is like writing a multiple-threaded program that
depends on the order in which the threads are executed.  The output can
vary from one run to the next.  Don't do that.

I checked with the PostgreSQL developers and am told that PostgreSQL
works the same way.  The equivalent query in PostgreSQL might cache
and reuse the subquery results, or it might rerun the subquery.  You
never know.

As currently implemented, SQLite only caches the subquery results if it
wants to build an automatic index.  But that decision might change tomorrow,
so you cannot depend on it.