SQLite Forum

Reusing the result of user-defined function within the same query
Login

Reusing the result of user-defined function within the same query

(1) By anonymous on 2022-09-29 17:57:46 [link] [source]

So, I have registered my first user-defined function myfunc, and am able to use it in statements like so:

SELECT id FROM table WHERE myfunc(id) > 10 ORDER BY myfunc(id);

Works fine, but the problem is that this function is called more often than needed (N times for N rows in the WHERE clause + M times in the ORDER BY clause for M rows which passed the WHERE filter; whereas I would like it to be called only N times). Since this function might potentially be time-consuming, I've decided to make little optimization experiment by counting how many times the function was called and then rewriting the query using aliases:

SELECT id, myfunc(id) AS x FROM table WHERE x > 10 ORDER BY x;
To my surprise, the number of function calls didn't change, so I conclude that column aliase serves as an alias for a function itself, not for its return value.

Could there possibly be a workaround?

(2) By David Raymond (dvdraymond) on 2022-09-29 18:10:51 in reply to 1 [source]

Not an expert, but my immediate question is: Is your user defined function deterministic, and appropriately flagged as such?

(3) By SeverKetor on 2022-09-29 18:50:06 in reply to 1 [link] [source]

I thought this would be quite simple, but you're right. It really wants to recompute the function call for both the column and the WHERE clause, even when set to deterministic (though in that case, it only gets called twice for all the rows). However, I did find one workaround:

WITH cte AS MATERIALIZED (SELECT id, myfunc(id) AS x FROM table) SELECT id, x FROM cte WHERE x > 10 ORDER BY x

(4.1) By Mark Lawrence (mark) on 2022-09-30 07:21:54 edited from 4.0 in reply to 1 [link] [source]

If your function is deterministic you might consider creating an expression index:

CREATE INDEX idx ON table(myfunc(id));

(5) By anonymous on 2022-09-30 13:26:21 in reply to 1 [link] [source]

Thank you all for the replies. I didn't knew about deterministic functions, they are barely mentioned in the section on application-defined functions. My function myfunc is deterministic indeed, so I hoped that just marking it as such with SQLITE_DETERMINISTIC when registering could solve the issue, but it turns out it does not - the results from my queries are the same as when the function is not marked deterministic (then I don't quite really understand which optimizations this flag might affect).

@SeverKetor, thanks, this trick with CTE does seem to do what I need. I'll have to test it for large tables, and see if it may affect performance in some other way.

@mark, that also could be a solution, thanks. But in fact my function is more complex, as it accepts few more arguments which are non-column constant values (I have simplified here to get a minimal working example). So this will probably require creating indexes for all possible compibations of arguments.

(6) By Stephan Beal (stephan) on 2022-09-30 13:30:35 in reply to 5 [link] [source]

(then I don't quite really understand which optimizations this flag might affect)

The operative word there is "MIGHT". The docs never claim that marking it as deterministic will cause it to be called only once per statement for any given set of arguments. Client code must never rely on specific optimizations being made or not, as those may change with any given checkin made to sqlite.

The fact is that your function is called X times in SQL, so sqlite is free to call it up to X times. It's also free, via optimizations, to call it fewer than X times, but never more than X.

(7) By Keith Medcalf (kmedcalf) on 2022-09-30 13:54:54 in reply to 6 [link] [source]

A function which is deterministic may be moved out of a loop since the result is constant. In ordinary programming this means that:

b = 0
for i = 1 to 15
   b += function(45)
print(b)

becomes:

b = 0
_temp_ = function(45)
for i = 1 to 15
  b += _temp_

which can become:

b = function(45) * 15

Simlarly, a function declared as "deterministic" where the argument is a constant, produces a constant result. Hence the function need only be called once and the same result applies for each candidate row -- the computation can be moved out of the loop and into the prologue. SQLite does this. Or it did. Now it uses ONCE constructs because when the "move constant deterministic expressions into the prologue" was initially designed, functions where not allowed to fail. However, since function calls are now permitted to fail this optimization no longer works as intended.

However, where the function takes an argument that is an element of the candidate row, then the fuction must be called for each candidate (obviously).

Note that this also means that if the caldidate is selected, then the function must be called to generate the projection (what appears in the SELECT ... part). And if you are ordering by the function result (rather than a projected column) then the function must be executed yet again to generate that value.

Knowing that, you can construct your query in such a fashion as to reduce the number of times the function "guts" are executed.

SQLite3 does not perform this optimization at present. That is, if the expression function(x) appears in your source statement multiple times, it will be executed multiple times, despite the fact that x comes from the same underlying source (that is, that all the function(x) executions will return the same result).

The code generator does not track that it has already computed function(x) and can merely re-use the existing result. It only tracks this fact if x is a constant.