SQLite Forum

FILTER clause with window functions
Login

FILTER clause with window functions

(1.1) By MBL (RoboManni) on 2021-06-21 07:36:40 edited from 1.0 [link] [source]

I formed an sql query where I want to calculate the difference between two time stamps of two different events and for that I need to use the filter clause for this one and last_value().

I request improvements for the window (aggregate?) function first_value() and the relaxed usage of the FILTER clause.

What needs to be done to make these two window functions become window aggregate functions, which can be used together with the filter clause? They are not listed as window scalar functions but also do not work like described as window aggregate functions.

The following column definition works fine and is doing what I expect it to do.

Working:

,case when Sent
      then csv(  group_concat(TStamp,'|') filter(where Client or QueueIndex) over ( rows between 500 preceding and 1 preceding) ,-1,'|')
      end as FromTime1

(My extension function csv(TEXT,-1) takes the last field from the concatenated string. group_concat() as a window aggregate function is working fine.

But the following should do the same much easier without having to create a long string.

Not working:

,case when Sent
      then last_value(TStamp) filter(where Client or QueueIndex) over ( rows between 500 preceding and 1 preceding )
      end as FromTime2

But I get following error:

1:FILTER clause may only be used with aggregate window functions

Questions:

  • Why are last_value() and first_value() not also window aggregate functions?
  • Why can they not be used together with the filter clause ?
  • Can the filter clause not be enabled for any window function?

To give some more background:

  • Within a with clause framed query I read a log file in text format line by line from one of my virtual table extensions.
  • Field extractions are done by use of some of my scalar extension functions.
  • Then the calculation happens on that table (MATERIALIZED or not does not make any difference in results)
  • I often do this type of text file analysis but it is the first time I try to do the calculatinos with the WINDOW clauses.

If the first_value() would accept the FILTER clause and if I search through the frame spec descendent then my most recent previous timestamp would be found fastest. The Window frame scan could be finished with the first occurance of a non-Null value which goes into the first_value() aggregate function instead of having to scan the whole frame, which in my case could be up to 500 rows earlier but is usually just 5 rows before the current row.

My next approach should be something like the following, where I also try to influence the window scan direction from CURRENT ROW backwards to the frame start and stop early each window frame scan with the first non-NULL filter match:

,case when Sent
      then TStamp - first_value(TStamp) filter(where Client or QueueIndex) over ( order by Line desc rows between 1 preceding and 500 preceding )
      end as Duration

Result example:

sqlite> select Line,Time,TStamp,Client,QueueIndex,Sent,Duration from ViewMessageSequence limit 45 offset 100;
┌──────┬─────────────────────┬────────────────┬───────────┬────────────┬─────────┬──────────┐
│ Line │        Time         │     TStamp     │  Client   │ QueueIndex │  Sent   │ Duration │
├──────┼─────────────────────┼────────────────┼───────────┼────────────┼─────────┼──────────┤
│ 719  │ 20210618.153515.666 │ 1624037715.666 │           │            │         │          │
│ 720  │ 20210618.153515.668 │ 1624037715.668 │           │            │         │          │
│ 723  │ 20210618.153515.668 │ 1624037715.668 │           │            │         │          │
│ 724  │ 20210618.153515.670 │ 1624037715.67  │           │            │         │          │
│ 725  │ 20210618.153515.670 │ 1624037715.67  │           │            │ 211/211 │ 0.004    │
│ 727  │ 20210618.153517.675 │ 1624037717.675 │ 127.0.0.1 │            │         │          │
│ 757  │ 20210618.153517.675 │ 1624037717.675 │           │            │         │          │
│ 758  │ 20210618.153517.676 │ 1624037717.676 │           │            │         │          │
│ 759  │ 20210618.153517.676 │ 1624037717.676 │           │            │         │          │
│ 760  │ 20210618.153517.676 │ 1624037717.676 │           │            │         │          │
│ 761  │ 20210618.153517.677 │ 1624037717.677 │           │            │         │          │
│ 765  │ 20210618.153517.677 │ 1624037717.677 │           │            │         │          │
│ 766  │ 20210618.153517.679 │ 1624037717.679 │           │            │         │          │
│ 767  │ 20210618.153517.679 │ 1624037717.679 │           │            │ 279/279 │ 0.004    │
│ 769  │ 20210618.153517.881 │ 1624037717.881 │ 127.0.0.1 │            │         │          │
│ 783  │ 20210618.153517.881 │ 1624037717.881 │           │            │         │          │
│ 784  │ 20210618.153517.881 │ 1624037717.881 │           │            │         │          │
│ 785  │ 20210618.153517.883 │ 1624037717.883 │           │            │         │          │
│ 787  │ 20210618.153517.883 │ 1624037717.883 │           │            │         │          │
│ 788  │ 20210618.153517.885 │ 1624037717.885 │           │            │         │          │
│ 789  │ 20210618.153517.885 │ 1624037717.885 │           │            │ 144/144 │ 0.004    │
│ 793  │ 20210618.153519.590 │ 1624037719.59  │           │ 125        │         │          │
│ 794  │ 20210618.153519.590 │ 1624037719.59  │           │            │         │          │
│ 800  │ 20210618.153519.590 │ 1624037719.59  │           │            │         │          │
│ 801  │ 20210618.153519.593 │ 1624037719.593 │           │            │         │          │
│ 802  │ 20210618.153519.593 │ 1624037719.593 │           │            │ 500/500 │ 0.003    │
│ 803  │ 20210618.153519.593 │ 1624037719.593 │           │ 126        │         │          │
│ 804  │ 20210618.153519.593 │ 1624037719.593 │           │            │         │          │
│ 806  │ 20210618.153519.593 │ 1624037719.593 │           │            │         │          │
│ 807  │ 20210618.153519.595 │ 1624037719.595 │           │            │         │          │
│ 808  │ 20210618.153519.595 │ 1624037719.595 │           │            │ 222/222 │ 0.002    │
│ 811  │ 20210618.153519.696 │ 1624037719.696 │           │ 127        │         │          │
│ 812  │ 20210618.153519.696 │ 1624037719.696 │           │            │         │          │
│ 818  │ 20210618.153519.696 │ 1624037719.696 │           │            │         │          │
│ 819  │ 20210618.153519.743 │ 1624037719.743 │           │            │         │          │
│ 820  │ 20210618.153519.744 │ 1624037719.744 │           │            │ 501/501 │ 0.048    │
│ 822  │ 20210618.153519.944 │ 1624037719.944 │ 127.0.0.1 │            │         │          │
│ 852  │ 20210618.153519.944 │ 1624037719.944 │           │            │         │          │
│ 853  │ 20210618.153519.944 │ 1624037719.944 │           │            │         │          │
│ 854  │ 20210618.153519.944 │ 1624037719.944 │           │            │         │          │
│ 855  │ 20210618.153519.944 │ 1624037719.944 │           │            │         │          │
│ 856  │ 20210618.153519.947 │ 1624037719.947 │           │            │         │          │
│ 860  │ 20210618.153519.947 │ 1624037719.947 │           │            │         │          │
│ 861  │ 20210618.153520.047 │ 1624037720.047 │           │            │         │          │
│ 862  │ 20210618.153520.047 │ 1624037720.047 │           │            │ 295/295 │ 0.103    │
└──────┴─────────────────────┴────────────────┴───────────┴────────────┴─────────┴──────────┘
sqlite>

(2) By Dan Kennedy (dan) on 2021-06-21 13:45:10 in reply to 1.1 [link] [source]

They're described here:

https://www.sqlite.org/windowfunctions.html#built_in_window_functions

Like the other "built-in" window functions, they don't support the FILTER clause. I think it's a standards thing. Postgres has the same restriction.

You could use the sqlite3_create_window_function() API to create versions of first_value() and last_value() that worked like regular aggregate window functions:

https://sqlite.org/c3ref/create_function.html

last_value() is quite trivial, and first_value() wouldn't be all that complicated either. A user implemented first_value() would use a little extra memory I suppose, but that might not be a problem in most cases.

Dan.

(3) By anonymous on 2021-07-15 13:46:27 in reply to 1.1 [source]

I'd like to add my support for RoboManni's request!

Whether this conforms to standards or not, I often find myself working with sparse data and wanting "the most recent non-null value from column X" (for tracking running subtotals, calculating differences to reference values etc.).

On oracle DB's I have used last_value( col ) over ( partition ... order ... rows between ... ignore nulls ) to good effect - which at least feels like filter ( where X is not null ) should be able to do the same.

Personally, I use SQLite as it is embedded in other tools - so I don't consider writing an extension a viable option.

Some time ago, Keith Medcalf replied to my query about this on the mailing lists with a solution along the lines of:

select
   this.tick,
   ( select prev.col from tbl prev where prev.tick <= this.tick and prev.col is not null order by timestamp desc ) as last_value
from tbl this
where ...

Kudos to Keith for finding a solution, but my experience wth this technique has been "not great", it's slow and I find it quickly leads to unintelligible queries, especially when multiple columns are involved :-(

Cheers and thanks to all the SQLite dev's - an absolutely awesome tool!

(4) By anonymous on 2023-02-14 16:48:54 in reply to 1.1 [link] [source]

I'd like to see the same feature. I'm trying to take the first non-null value of several columns in a window and first_value supporting a FILTER clause would be a massive help.

(6) By anonymous on 2023-03-20 08:59:29 in reply to 4 [link] [source]

Use array_agg, it really works! Then you can choose the position you want

(5) By anonymous on 2023-02-14 17:23:06 in reply to 1.1 [link] [source]

I (and possibly others) have also wanted it before, to make first_value, last_value, and nth_value as aggregate functions, so that a FILTER clause is possible and that they can be used as non-window aggregate functions.

However, additionally than that, I (and, again, also others) also wanted that it should allow ORDER BY in non-window aggregate functions too (using the syntax like PostgreSQL does), which allows these functions to be useful in such a case, too (since otherwise the order is unspecified, which makes these functions not very useful).