max() function broken

(1) By Sunny Saini (SunnySaini_com) on 2022-07-16 13:21:02 [source]

For table

CREATE TABLE Dalstock
( id integer primary key
, IST text
, Name text
, stock real
);

and values:

1, '2022-07-04', 'Arhar', 5
2, '2022-07-04', 'Arhar', 7
3, '2022-07-05', 'Arhar', 8

What is the result of query:

select max(b.id) id, b.ist, b.name, b.stock
from
dalstock a left join dalstock b
on a.name = b.name
  and a.ist > b.ist
group by a.name
having max(b.ist)
;

why?


Answer:

2, '2022-07-04', 'Arhar', 5

This may be due to SQLite bug. It selects id and other parts from different rows.


SQLite here join 2 rows to give incorrect information.

SQLite here should throw error "misuse of aggregate function".



WORKAROUND:

Never use max() / min() functions in main Select when group by having clause uses it.

Correction:

select b.*
from
dalstock a left join dalstock b
on a.name = b.name
  and a.ist > b.ist
group by a.name
having max(b.ist) and max(b.id)
;

Result:

2, '2022-07-04', 'Arhar', 7


Tested OK in SQLite v3.22.0 (M30s phone)
2022-07-04 16:59:37
++++

Bug also exist as Tested in SQLite v3.31.0 (Database Tools Ahmad for Android)
2022-07-08 08:34:02
++++

Tested OK in SQLite v3.35.4 for Windows. Bug exists.

(2) By Kees Nuyt (knu) on 2022-07-16 14:35:01 in reply to 1 [link] [source]

This ambiguity is documented.

The ability to include bare columns in a query is an SQLite-specific extension that many users asked for; a feature for the simplest use cases, frowned upon by many others, and discussed to death multiple times. It will not change.

(5) By David Raymond (dvdraymond) on 2022-07-18 12:21:08 in reply to 2 [link] [source]

The problem is that the result is not following that.

When the min() or max() aggregate functions are used in an aggregate query, all bare columns in the result set take values from the input row which also contains the minimum or maximum.

If you look at the query, these are the rows you get before the grouping. a.* and b.*

sqlite> select * from dalstock as a left outer join dalstock as b on a.name = b.name and a.ist > b.ist;
id  IST         Name   stock  id  IST         Name   stock
--  ----------  -----  -----  --  ----------  -----  -----
1   2022-07-04  Arhar  5.0
2   2022-07-04  Arhar  7.0
3   2022-07-05  Arhar  8.0    1   2022-07-04  Arhar  5.0
3   2022-07-05  Arhar  8.0    2   2022-07-04  Arhar  7.0

a.name is the same, so all 4 rows go into 1 group.

having max(b.ist) is non-zero, so the group of these 4 rows gets processed.

There is one single aggregate, a max() in the select fields

select max(b.id) as id, b.ist, b.name, b.stock from...

So you would naturally expect the bare fields to come from the record with the maximum b.id... which is the final row shown with b.stock equal to 7.

But when you run the query, it's getting the bare rows from the row with stock = 5. Which is NOT the row that satisfies max(b.id)

It's "probably" getting them from the row used for max(b.ist) in the HAVING clause. But that situation isn't mentioned or covered in the documentation for that quirk. Everything in there suggests that, well, I've got a single max() in my select, so the bare columns will come from there.

(6) By Sunny Saini (SunnySaini_com) on 2022-07-21 11:08:07 in reply to 5 [link] [source]

You are right. That's why I want SQLite to not to process such queries and display error, instead.

(7) By Ryan Smith (cuz) on 2022-07-21 11:32:44 in reply to 6 [link] [source]

Why? It's not an error, there are millions of queries around the World where people are perfectly happy with the same output, it's just not the output you want in your case, but it most certainly is not an error.

Make the query with another SQL engine (postgres/MSSQL/MariaDB) that gives the result you want for the query you propose, post it here. We will show the inerrant way of achieving the same using SQLite.

(8) By Sunny Saini (SunnySaini_com) on 2022-07-22 12:01:17 in reply to 7 [link] [source]

It appears that you haven't gone through all posts in this thread.

Generally max()/ min() aggregate function choses one of the rows as described in the documentation. But, here it is creating incorrect result by selecting different columns from two different rows.

Refer to explanation given by [ (5) By David Raymond (dvdraymond) on 2022-07-18 12:21:08 .... ] thoroughly and you will understand what I mean.

(9) By Richard Hipp (drh) on 2022-07-22 12:27:24 in reply to 8 [link] [source]

Your query contains 2 max() functions:

max(b.id)
max(b.ist)

The bare columns in the result set take on values where one or the other of these two functions are at their maximum. You don't know which one. As it happens, SQLite is selecting max(b.ist), this time. That might change tomorrow - it is not something you can count on. SQLite is free to choose whichever one it wants.

The bare output of

SELECT *
  FROM dalstock a left join dalstock b ON( a.name = b.name AND a.ist > b.ist );

┌────┬────────────┬───────┬───────┬────┬────────────┬───────┬───────┐
│ id │    IST     │ Name  │ stock │ id │    IST     │ Name  │ stock │
├────┼────────────┼───────┼───────┼────┼────────────┼───────┼───────┤
│ 1  │ 2022-07-04 │ Arhar │ 5.0   │    │            │       │       │
│ 2  │ 2022-07-04 │ Arhar │ 7.0   │    │            │       │       │
│ 3  │ 2022-07-05 │ Arhar │ 8.0   │ 1  │ 2022-07-04 │ Arhar │ 5.0   │
│ 3  │ 2022-07-05 │ Arhar │ 8.0   │ 2  │ 2022-07-04 │ Arhar │ 7.0   │
└────┴────────────┴───────┴───────┴────┴────────────┴───────┴───────┘

The maximum value of b.ist is '2022-07-04'. But there are two rows that contain that maximum value. The b.stock value might come from either row. This time it selected a b.stock of 5.0. Tomorrow it might choose 7.0.

(11) By Ryan Smith (cuz) on 2022-07-22 12:46:14 in reply to 9 [link] [source]

While I was writing my reply, I tried to find in the documentation regarding the bare-column choosing on max()/min(), some mention of the fact that once you have more than one of these aggregate queries (in the top SELECT or elsewhere) that the row chosen becomes indeterminate.

I know it seems obvious, but as this thread showed, perhaps it is worth mentioning in the docs.

(12) By Richard Hipp (drh) on 2022-07-22 12:58:45 in reply to 11 [link] [source]

At https://sqlite.org/lang_select.html#bare_columns_in_an_aggregate_query, 4th paragraph:

"There is still an ambiguity if ... the query contains more than one min() and/or max() aggregate function."

I'll see if I can reword that to be clearer.

(13) By Ryan Smith (cuz) on 2022-07-22 13:30:56 in reply to 12 [link] [source]

If I may offer a possible wording to replace the section quoted:

"This works when a clear single min()/max() calculation is present and becomes indeterminate when specifying multiple min()/max() aggregates, anywhere within the SELECT."

Something to that effect should suffice.

(14) By Richard Hipp (drh) on 2022-07-22 14:00:41 in reply to 13 [link] [source]

Updated documentation is now at https://sqlite.org/lang_select.html#bare_columns_in_an_aggregate_query.

Please report typos or suggestions for making the text clearer.

(15) By David Raymond (dvdraymond) on 2022-07-22 14:42:15 in reply to 14 [link] [source]

My only comment is that when I read it (both before and now) my brain interprets it as talking about the select clause (the result-column bit) and not the whole select statement.

I acknowledge that's more an issue with my brain than with the documentation, but maybe something like...

If there are two or more min() or max() aggregates anywhere in the query (not just in the select clause/result columns) then bare column values will be taken from...

Though I'm not good at knowing if that clears it up, or just makes it more confusing for others.

(16) By Sunny Saini (SunnySaini_com) on 2022-07-23 00:41:10 in reply to 14 [link] [source]

Dear Author, thanks for making documentation clear and intuitive, now.

(17.2) By Sunny Saini (SunnySaini_com) on 2022-08-29 16:27:20 edited from 17.1 in reply to 9 [link] [source]

Dear Richard,

There is still confusion. Every thing is going above my head.

Every thing appears normal in this query with result as desired.

here just changing the position of aggregate function in having clause gives unexpected result. Instead of chosing max(b.id) = 2, it is chosing min(b.id)!

Since, aggregate function of columns b.id and b.ist so bare columns are Name and Stock, then why b.id is 1.

Kindly, clarify

[Query #3 & #4 removed due to typing errors]

(18) By David Raymond (dvdraymond) on 2022-08-29 12:55:49 in reply to 17.0 [link] [source]

The query you pasted in for #3 is missing something that's in the others.

In #3 you have "...and b.ist group by..."

whereas in all the others you have "...and a.ist > b.ist group by..."

(19) By Sunny Saini (SunnySaini_com) on 2022-08-29 13:30:43 in reply to 18 [link] [source]

Thank you for pointing out my typing mistake. id = 3 is due to the result of the incomplete query.

The correct query is:

Query 3.
Select b.* from dalstock a left join dalstock b on a.name = b.name and a.ist > b.ist group by b.name having max(b.id) and max(b.ist);

and the result is same as for Query 2.

Could you please explain why in Query 2, min(b.id) is being chosen by SQLite. I am not able to understand it.

(20) By Keith Medcalf (kmedcalf) on 2022-08-29 16:37:09 in reply to 17.2 [link] [source]

Do you think that postpending HAVING MAX(x) will cause the selection of the row with the MAX value of X?

If so, then that is the root of your misunderstanding. All the clause HAVING MAX(x) means is that if and only if the maximum value of x for the candidate group is non-zero, then compute the projection for that group.

It has nothing whatsoever to do with selecting the row that contains the maximum value.

(21) By Sunny Saini (SunnySaini_com) on 2022-08-30 02:09:02 in reply to 20 [link] [source]

Kindly refer to the Documentation page

Each expression in the result-set is then evaluated once for each group of rows. If the expression is an aggregate expression, it is evaluated across all rows in the group. Otherwise, it is evaluated against a single arbitrarily chosen row from within the group.

So, I think it should choose a maximum value of id and not whether id is zero or not.

(22) By SeverKetor on 2022-08-30 02:29:06 in reply to 21 [link] [source]

I don't think they'll make a change that affects the SQL logic solely based on your (I'd argue incorrect) assumption of how it should work.

The quoted section says it's for the result-set. Why would a description of the logic for the result-set offer a guarantee for the HAVING clause?

Also, how would you handle something like HAVING MAX(Expr) > 3? Or how about HAVING MAX(Expr) OR MIN(Expr)? Do you want the code to become more complicated to deal with all the various edge-cases your change would undoubtedly cases, or simplified and probably full of landmines, or would you prefer to just leave it as is and change your own SQL, and assumptions, instead?

(23) By Sunny Saini (SunnySaini_com) on 2022-08-30 02:57:22 in reply to 22 [link] [source]

I am not saying to change the code. Only thing is I am not able to understand the logic.

Please see the upper text of the quoted section. It mentions about the result set of "having" clause with aggregate function.

Also, could you please give some examples of "Expr" in

HAVING MAX(Expr) > 3

I want to test that.

(24) By David Empson (dempson) on 2022-08-30 06:54:22 in reply to 21 [link] [source]

Again referencing the documentation page

The paragraph you quote (starting "Each expression in the result-set") is explaining what happens after the HAVING clause has already decided whether or not a group of rows should be discarded. The reference to aggregate functions in that paragraph explains how they work in the list of result columns, not in the HAVING clause.

The preceding paragraph is the critical one for understanding what your HAVING clause does. Trimming to just the first two sentences:

If a HAVING clause is specified, it is evaluated once for each group of rows as a boolean expression. If the result of evaluating the HAVING clause is false, the group is discarded. If the HAVING clause is an aggregate expression, it is evaluated across all rows in the group.

Based on that paragraph, your "HAVING max(b.ist) and max(b.id)" is doing the following:

For each group of rows, find max(b.ist) and max(b.id).
The AND means that max(b.ist) and max(b.id) are treated as boolean: true if max() returns a nonzero value, false if max() returns a zero value. Since IST is a string, max(IST) is converted to a number to evaluate as a boolean, therefore anything which looks like an ISO date such as '2022-07-05' will be interpreted as the year (stopping at the first hyphen), which is nonzero therefore true.
Those two booleans are then ANDed together to form a single boolean to be tested by HAVING.

In other words, a group of rows will be included in the results only if both max(b.ist) and max(b.id) are nonzero, with each max() function evaluated across all rows in the group.

This has nothing to do with selecting a single row, as that is not something which aggregate functions do in the context of the HAVING clause, only in the result columns.

Assuming your IST are all dates and your id column is always nonzero (and not NULL, and at least one row exists), those two max functions will always evaluate as true, therefore the HAVING clause is always true, and all groups are included in the result.

Therefore this HAVING clause will not affect the result at all.

The "GROUP BY a.name" clause then collapses each group to a single row per name in the result. If you didn't use aggregate functions in the result columns then SQLite picks an arbitrary (and potentially inconsistent) row from the group.

It would be helpful if you could explain what you are actually trying to achieve.

My interpretation is:

You have a table of something like stock prices which may have multiple entries on the same date (IST) for the same name, ordered by id, i.e. multiple updates of the stock price per day and the entry with the greatest id on that day is the important one (last trade).
You want to produce a list of all rows in the table where an earlier entry (in IST order) exists for the same name; if there are multiple matches, pick the one with the greatest IST, and if that has multiple matches, pick the one among those which has the greatest id.

If I have that right, then something like the following compound SELECT statement should do the trick. Note that you can't do this in a single SELECT because max(IST) and max(id) might produce different rows, depending on the order in which rows were inserted.

SELECT a.*, b.id, max(b.IST), b.stock FROM
    dalstock a
JOIN
    (SELECT max(id) id, IST, Name, stock FROM dalstock GROUP BY Name, IST) b
ON a.Name = b.Name AND a.IST > b.IST
GROUP BY a.Name;

The inner SELECT uses GROUP BY and max(id) to filter multiple rows with the same (Name,IST), leaving just the one with the greatest ID. These are the candidates for "earlier IST with the same name" (and is also a list of the last trades each day for each stock).

The outer SELECT uses GROUP BY and max(IST) to filter multiple rows from the inner SELECT which have the same Name but different IST, leaving just the one with the greatest IST which is less than a.IST.

(25) By Sunny Saini (SunnySaini_com) on 2022-08-30 08:49:20 in reply to 24 [link] [source]

David Empson,

The AND means that max(b.ist) and max(b.id) are treated as boolean: true if max() returns a nonzero value, false if max() returns a zero value. Since IST is a string, max(IST) is converted to a number to evaluate as a boolean, therefore anything which looks like an ISO date such as '2022-07-05' will ...

Thank you for this explanation. Probably I would never understand the matter without this explanation.

My interpretation is:
You have a table of something like stock prices which may have ...

Your interpretation is absolutely correct. This is what I want to achieve.

Thank you so much for giving the Compound query. The query is the exact solution to what I want.

Thanks once again for spending your precious time for the explanation.

(26) By Sunny Saini (SunnySaini_com) on 2022-08-30 13:44:49 in reply to 24 [link] [source]

David Empson,

Please spend some more time as there is still some confusion for:

The AND means that max(b.ist) and max(b.id) are treated as boolean: true if max() returns a nonzero value, false if max() returns ...

If the "AND" keyword forces "having" clause to treat aggregate function as boolean expression then why is the aggregate function working correctly for the dataset group in the following queries:

#5
Select * from DalStock group by name
having max(ist) AND 1;
3|2022-07-05|Arhar|8

#6
Select * from DalStock group by name
having min(ist) AND 1;
1|2022-07-04|Arhar|5

Kindly figure it out.

(27) By Keith Medcalf (kmedcalf) on 2022-08-30 17:16:53 in reply to 26 [link] [source]

They are always always working corectly. Why do you think they are not?

SQL processes all the tables and where conditions in order to create a bunch of "candidate rows". If there is a group by clause, then the candidates are divided into groups and each group is processed separately (a HAVING clause or using an aggregate function without a GROUP BY implies that all the candidate rows form a single group).

If there is a HAVING clause, it is then evaluated for the entire group of candidate rows. If the result of evaluating the HAVING expression is True (value as numeric is non-zero), then the projection (SELECT) processing is performed for the entire group and one row is output representing the entire group.

Lather, rinse, repeat.

The expression(s) which appears in the HAVING clause is/are part of the projection for the group (that is, as if it appears in the SELECT clause) though it may be an invisible column that is not output, it is there nonetheless whether you like it or not.

(28) By Sunny Saini (SunnySaini_com) on 2022-08-30 17:44:21 in reply to 27 [link] [source]

If there is a HAVING clause, it is then evaluated for the entire group of candidate rows. If the result of evaluating the HAVING expression is True (value as numeric is non-zero), then the projection (SELECT) processing is performed for the entire group and one row is output representing the entire group.

and one row is output representing the entire group.

How is this one row decided for two aggregate max() functions as part of the "Having" clause as in my post 17.2 query #2?

(29) By Keith Medcalf (kmedcalf) on 2022-08-30 17:49:56 in reply to 28 [link] [source]

How is this one row decided for two aggregate max() functions as part of the "Having" clause as in my post 17.2 query #2?

If you do not know the answer to this already then prehaps the underlying cause of your difficulty has been found. You are using constructs using the plug and pray methodology. This has never worked.

You should know what you are doing BEFORE you do it, not after.

(30) By Keith Medcalf (kmedcalf) on 2022-08-30 17:54:59 in reply to 29 [link] [source]

That is to say more clearly that you are starting with the feces and trying too determine the brand of hay that was fed the horse.

You should start with knowing what it is that you wish to achieve (the problem to be solved) and then solve that problem.

Creating a nonsence SQL statement for no purpose is rather foolish.

It is much easier to explain the flaws in your reasoning if you explain your reasoning, starting with a comprehensive problem statement. Someone will then be able to explain to you why your "solution" is incorrect.

(31) By Sunny Saini (SunnySaini_com) on 2022-08-30 18:28:03 in reply to 30 [link] [source]

Refer to your post No. 20

Do you think that postpending HAVING MAX(x) will cause the selection of the row with the MAX value of X?

Now again refer to the Documentation page

If a HAVING clause is specified, it is evaluated once for each group of rows as a boolean expression. If the result of evaluating the HAVING clause is false, the group is discarded. If the HAVING clause is an aggregate expression, it is evaluated across all rows in the group. If a HAVING clause is a non-aggregate expression, it is evaluated with respect to an arbitrarily selected row from the group. The HAVING expression may refer to values, even aggregate functions, that are not in the result.

If the HAVING clause is an aggregate expression, it is evaluated across all rows in the group.

Don't you think these two are contradictory? How do you explain this?

(32) By Keith Medcalf (kmedcalf) on 2022-08-30 19:02:39 in reply to 31 [link] [source]

No, they are not contradictory.

(33.1) By Sunny Saini (SunnySaini_com) on 2022-08-30 19:42:20 edited from 33.0 in reply to 32 [link] [source]

In your post No. 20 you further wrote:

If so, then that is the root of your misunderstanding. All the clause HAVING MAX(x) means is that if and only if the maximum value of x for the candidate group is non-zero, then compute the projection for that group.

Then for max(X) function in "having" clause of "group by" why max of X is always chosen by SQLite? And similarly, for min(X), why minimum of X is chosen? why isn't same row is projected for max(X) as well as for min(X) (when one of these two functions is used at a time)? Because here, boolean for both of these is "true"?

(34) By Keith Medcalf (kmedcalf) on 2022-08-30 19:19:56 in reply to 31 [link] [source]

I think I see the root cause of your misunderstanding.

You believe that the projection (the SELECT c1,c2,c3,...cn) part of the statement is "separate" from the GROUPING and HAVING processing. In other words, it is your belief (wish) that the following steps were performed independently:

Generate the candidate rows
Divide the candidate rows into groups
For each group evaluate the having clause and discard the group if false.
Perform the projection against the surviving group.
Output the projection for each surviving group.

This is not, however, how any implementation of SQL ever worked. Rather, they (all) work thusly:

Generate the candidate rows.
Process the candidate rows to project a GROUP result (the HAVING clause expression is calculated as part of the projection process).
If the group is complete then check if the having expression is true. If so, output the projection.
Lather, Rinse, Repeat.

This is because, back in the before times, the HAVING clause (like the ORDER BY clause and GROUP BY clause) could only refer to elements of the projection clause. This constraint was removed. However, the underlying mechanism of processing (that is, the underlying relational algebra) is unchanged and, although you cannot see them, the contents of the HAVING, ORDER BY and GROUP BY clauses must be included in the projection (there is no other way to calculate them)*.

There is, but it would be so slow that there would be no point in doing so.

(35) By Sunny Saini (SunnySaini_com) on 2022-08-30 19:57:30 in reply to 34 [link] [source]

In the newer SQL method:

3 If the group is complete then check if the having expression is true. If so, output the projection.

Is the "having expression" checked for each row of the group for true/false?

(37) By Chris Locke (chrisjlocke1) on 2022-08-30 21:58:14 in reply to 29 [link] [source]

You should know what you are doing BEFORE you do it, not after.

That's what my wife said. :(

(36) By Ryan Smith (cuz) on 2022-08-30 20:16:57 in reply to 26 [link] [source]

The question is bordering on the willfully obtuse. It's Computer Science 101

Boolean logic:
TRUE  AND TRUE  == TRUE
FALSE AND TRUE  == FALSE
in Boolean logic numbers:
1  AND 1  == 1
0  AND 1  == 0

So essentially saying "AND 1" doesn't mean anything because from the above you can see that:

X AND 1 == X   for any Boolean logic value of X
and since 
X == X too, the "AND 1" part is as useless as a chocolate teapot, so can be (and is) just ignored by the query engine.


Similarly (as an aside) the following is true: 

X OR 1 == 1   for any Boolean logic value of X

(Note this only applies to Boolean LOGIC, for boolean math it's a different story where 2 AND 1 == 3 or 6 AND 3 = 7, etc.)


So any query with:
  ...
  having XXX AND 1
  ... 
will be logically equivalent to the same query only stating:
  ...
  having XXX
  ...

As to why the MAX() and MIN() calculations cause a different row to be selected as the output target, let me explain that one last time in the hope that clarity ensues:

In an aggregate query, if you use a single aggregate MIN() or MAX() function in the ENTIRE query, then, in SQLite specifically, the target output row from the group for the other non-aggregate fields will be chosen to be the very row in which the determined MIN() or MAX() value appears in the group. If the value appears in multiple different rows within the group, then the target output row will be chosen arbitrarily from among them.

This is true so long as:

it is the only aggregate function in the entire query. i.e. it cannot coexist[1] with, say, an AVG() function
there is only ONE single such aggregate function in the entire query: i.e. it cannot coexist[1] with another MIN() or MAX() on another field, though the exact same expression can be used multiple times (SELECT MAX(x) FROM ... HAVING MAX(x) ...)

[1] "Cannot coexist" is not preventative, it should be read as "Cannot coexist while still working (or expected to work) in the way described"

All of the above was said before in this thread, though arguably not in a single place, in one instance Richard even amended documentation to make it more clear.

You can go back to every question you've asked in this entire thread and by simply reading the above explanation aloud to yourself can discover in each case the answer to your every "why" question.

(40) By Sunny Saini (SunnySaini_com) on 2022-08-31 10:28:03 in reply to 36 [link] [source]

there is only ONE single such aggregate function in the entire query: i.e. it cannot coexist[1] with another MIN() or MAX() on another field, though the exact same expression can be used multiple times (SELECT MAX(x) FROM ... HAVING MAX(x) ...) [1] "Cannot coexist" is not preventative, it should be read as "Cannot coexist while still working (or expected to work) in the way described"

This is what I wanted to listen. This is required to be mentioned in the Documentation page

The sentence:

If a HAVING clause is specified, it is evaluated once for each group of rows as a boolean expression. If the result of evaluating the HAVING clause is false, the group is discarded. If the HAVING clause is an aggregate expression, it is evaluated across all rows in the group. If a HAVING clause is a non-aggregate expression, it is evaluated with respect to an arbitrarily selected row from the group. The HAVING expression may refer to values, even aggregate functions, that are not in the result.

If the HAVING clause is an aggregate expression, it is evaluated across all rows in the group.

must be mentioned somewhat as:

If the HAVING clause is an aggregate expression, it is evaluated across all rows in the group provided, only one aggregate function is used.

Why I am saying this is also because of the following bahaviour of the "having" clause of SQLite:

For Queries (with Column names abc, X and Y)

#1 Select * from t_name group by abc having max(X) and max(Y);

#2 Select * from t_name group by abc having max(X) or max(Y);

the max(X) is just ignored and query is processed as:

#3 Select * from t_name group by abc having max(Y);

And for the queries:

#4 Select * from t_name group by abc having max(Y) and max(X);

#5 Select * from t_name group by abc having max(Y) or max(X);

the max(Y) is just ignored and is processed as:

Select * from t_name group by abc having max(X);

(38) By David Empson (dempson) on 2022-08-31 00:35:23 in reply to 26 [link] [source]

If the "AND" keyword forces "having" clause to treat aggregate function as boolean expression then why is the aggregate function working correctly for the dataset group in the following queries:
#5 
Select * from DalStock group by name 
having max(ist) AND 1; 
3|2022-07-05|Arhar|8
#6 
Select * from DalStock group by name 
having min(ist) AND 1; 
1|2022-07-04|Arhar|5

Your examples are producing specific rows because the use of a single aggregate function in the entire SELECT statement (in this case the HAVING clause) has the side effect of causing the row output by the SELECT statement for that group to be an arbitrary row which contains the maximum (or minimum) value in IST, rather than an arbitrary row from the entire group. (In my previous message I didn't realise this side effect also happened for aggregate functions in the HAVING clause - I've only used this feature in result columns of SELECT because using it in HAVING is logically flawed.)

That side effect has nothing to do with how the logic works for the HAVING clause itself and the AND operator. By definition, AND is a boolean operator, so the expressions passed to AND must be converted to a number which can be interpreted as boolean (zero or nonzero), and the boolean result of AND is used by HAVING to test whether to include or discard that group of rows.

The "AND 1" part doesn't change anything and is completely pointless - if that was omitted, HAVING itself requires a boolean parameter, so the result of the max() or min() function would still be converted to boolean to test for HAVING.

I'd regard HAVING min(col) or HAVING max(col) to find a row with the minimum or maximum value to be incorrect usage, because they would fail to return any row if the result of min() or max() was zero when converted to a number. This usage only works for columns known to contain data in every row which can be interpreted as a number greater than zero. They would produce incorrect results with numeric columns that might have zero values, or text columns that might have values that are empty strings or which don't start with a digit.

Because of this inconsistency it is better to completely avoid this notation as you could only use it in some cases (and with data you trusted was correct).

HAVING max(IST): If all the rows in the group had values in IST which were interpreted as zero when converted to a number (or at least one was zero and the rest were negative), then the max(ist) would be interpreted as 0, and that group would be discarded by HAVING, therefore no row would be output.

HAVING min(IST): If at least one of the rows in the group had a value in IST which was interpreted as zero when converted to a number (and none were negative), then min(ist) would be 0, and that group would be discarded.

The following form is better and consistently handles zero values and empty/non-numeric text:

SELECT Name, max(IST) FROM dalstock GROUP BY Name;

This makes more sense logically, and is similar to finding the maximum for the entire table (where you couldn't use a HAVING clause due to lack of GROUP BY):

SELECT Name, max(IST) from dalstock;

(39) By Keith Medcalf (kmedcalf) on 2022-08-31 01:59:37 in reply to 38 [link] [source]

SELECT Name, max(IST) from dalstock;

being exactly, save the projection contents, as:

SELECT Name from dalstock having max(IST);

(10) By Ryan Smith (cuz) on 2022-07-22 12:39:00 in reply to 8 [link] [source]

I see why you might think that I did not read everything, perhaps I should have been more clear - I have read every post, and I understand exactly what is going on, and why you (and David for that matter) expect a different result to what is achieved.

The problem is with your expectation, and not with SQLite - but this time I won't just assert it, I will explain why:

Consider this query on your data: ``` SELECT max(b.id) id, b.ist, b.stock FROM dalstock AS a LEFT JOIN dalstock AS b on a.name = b.name AND a.ist > b.ist GROUP BY a.name ;

This query produces correct (and expected) output because the calculation done inside the max() aggregate query will pin/locate the row number where the max is found (or one of the max values should they appear many times, such as when max(name) would be specified).

Now move on to this query:

SELECT max(b.id) id, max(b.ist), b.stock FROM dalstock AS a LEFT JOIN dalstock AS b on a.name = b.name AND a.ist > b.ist GROUP BY a.name ;

This time, there are 2 max() calculations, one on b.id and one on b.ist, but they offer different indexes for the max row, so the row chosen for max() becomes immediately difficult to pin down, it might be done by either of those max() calculations, depending on how the Query planner visits them. The output can no longer be expected to pin specifically the row found during the FIRST max() aggregate calculation (or the other one for that matter), the resulting row is now from either max() calculations, but indeterminate which.

When I run this query it satisfies the first row of the second max() (on b.ist, which may or may not be the case for you depending on version, schema and query planner, but surely is clear to see that it is valid).

Now finally, this next query (the original one disputed) has the exact same trouble, and just because the max() calculation is not inside the top field definition, it still is a field definition like any other, even if it only appears in the HAVING clause, it still has to be calculated, and still allowed to offer a contender for the pinned row, just like the previous query.

SELECT max(b.id) id, b.ist, b.stock FROM dalstock AS a LEFT JOIN dalstock AS b on a.name = b.name AND a.ist > b.ist GROUP BY a.name HAVING max(b.ist) ; ```

Put another way: Obscuring the max() calculation into the HAVING clause does not make it less of a field-calculation nor less of a candidate for locating/pinning a max()-conformant row as is the mechanism which SQLite promises to apply.

Does it offer confusion? Perhaps, is it an error that should be thrown? For sure not, and I would be very disappointed if my SQLite starts erroring out on the above query.

I hope this longer version makes my previous short version answer more clear.

(41) By Sunny Saini (SunnySaini_com) on 2022-09-02 03:14:18 in reply to 5 [link] [source]

It's "probably" getting them from the row used for max(b.ist) in the HAVING clause. But that situation isn't mentioned or covered in the documentation for that quirk. Everything in there suggests that, well, I've got a single max() in my select, so the bare columns will come from there.

I agree.

(3) By Holger J (holgerj) on 2022-07-17 17:50:07 in reply to 1 [link] [source]

What do you expect from a boolean condition
HAVING MAX(...)

Unless the expression in the parentheses is of BOOLEAN type (which not really exists in SQLite, but in the SQL standard), the expression doesn't make any sense.

HAVING MAX(...) = someValue
would make sense, or any other boolean expression, so a comparison or the like.

You can expect that an SQL engine gives you a sensible result if you enter a sensible query, but don't expect to give a sensible result in this case.

Yes, an error message would be fine.

(4) By Keith Medcalf (kmedcalf) on 2022-07-17 18:27:43 in reply to 3 [link] [source]

What do you expect from a boolean condition
HAVING MAX(...)

If the value of the max(...) of the candidate group is a non-zero number then the condition passes and the candidate group is output. On the other hand, if the value of max(...) for the candidate group is numerically zero or null then the candidate group is not output.

This is exactly what would happen if you used max(...) == ... because the result of the operation (==) is either non-zero (true), zero (false), or unknown (null) -- which is treated as false in a WHERE/HAVING clause.

That is to say that HAVING MAX(...) has exactly the same meaning as the spelling HAVING MAX(...) <> 0 -- it is just 5 less keystrokes and does not execute an additional useless comparison operation, thus saving a bit of time typing and a few nanoseconds of CPU time.

(42) By Sunny Saini (SunnySaini_com) on 2022-09-02 03:19:48 in reply to 3 [link] [source]

What do you expect from a boolean condition HAVING MAX(...)

Not part of SQL, this may be considered as an SQLite specific Extension.

Further for HAVING MAX(...)

the expression is not considered as boolean but worked upon all rows of the group to find Max(column_name).

This is documented. But the documentation needs to be amended as i had written earlier.

(43) By David Empson (dempson) on 2022-09-02 03:46:40 in reply to 42 [link] [source]

Further for HAVING MAX(...) the expression is not considered as boolean but worked upon all rows of the group to find Max(column_name).

The result of the max() certainly is considered as boolean for the purposes of the HAVING test, as I and others have explained already in this thread.

The purpose of HAVING is to do a boolean test to determine whether a group should be included in the result.

There are two things going on when you use HAVING max(col):

max(col) is evaluated, the result is converted to a number, and interpreted as a boolean (zero means false, nonzero means true). If max(col) is zero (false), the HAVING test fails and the group is discarded from the result of the SELECT statement. The typical case this happens if if all rows in your group have something which evaluates as zero in col.
If and only if max(col) is nonzero, then the HAVING max(col) test passes and the group will be included in the result. The SQLite-specific side-effect of using max() as an aggregate function in a SELECT statement then comes into play: the row which gets output for the group is an arbitrary one which contains the maximum value in col.

(44) By Sunny Saini (SunnySaini_com) on 2022-09-02 04:49:58 in reply to 43 [link] [source]

If and only if max(col) is nonzero, then the HAVING max(col) test passes and the group will be included in the result. The SQLite-specific side-effect of using max() as an aggregate function in a SELECT statement then comes into play: the row which gets output for the group is an arbitrary one which contains the maximum value in col.

At last, all of my doubts are clear now.

David Empson, you are a very good teacher.

Thank you so much for all your help.