SQLite User Forum

operator precedence docs missing, suggested clarifications below
Login

operator precedence docs missing, suggested clarifications below

(1.1) By Rico Mariani (rmariani) on 2021-07-16 23:56:53 edited from 1.0 [link] [source]

In https://sqlite.org/lang_expr.html section 2 BETWEEN and NOT are not placed on the list beside the other operators.

I still have some of these not quite right (see https://cgsql.dev/cql-guide/ch03/) but as far as I can tell this is correct:

OR
AND
NOT
BETWEEN
EQUALITY
INEQUALITY
BINARY // & | << >>
ADD // also sub
MUL // also div and mod
CONCAT
COLLATE TILDE COLLATE (order doesn't matter amongst these)

(presently I have CONCAT and COLLATE wrong in cgsql)

Docs:

The COLLATE operator has a higher precedence (binds more tightly) than any binary operator and any unary prefix operator except "~"

Proofs:

-----  NOT is weaker than BETWEEN 
select not 0 between -1 and 2; --> 0
select not (0 between -1 and 2); --> 0
select (not 0) between -1 and 2; --> 1

----- BETWEEN is weaker than equality 
select 1=2 between 2 and 2;  --> 0
select (1=2) between 2 and 2; --> 0
select 1=(2 between 2 and 2); --> 1

----- BETWEEN is left associative 
select 0 between 0 and 3 between 2 and 3; -->  0
select (0 between 0 and 3) between 2 and 3; --> 0
select 0 between 0 and (3 between 2 and 3); --> 1

---- TILDE is stronger than CONCAT 
select ~ 1||2;  --> -22
select (~ 1)||2; --> -22
select ~ (1||2); --> -13

--- NEGATION is stronger than CONCAT
select -0||1;  -> 01
select (-0)||1;  -> 01
select -(0||1); -> -1

NEGATION and TILDE have unambiguous order always or binding strength doesn't matter. Collation is independent of value so binding strength there also doesn't matter.

Just in case negation was different than negative numbers:

select -(0)||1; -> also 01

more conclusively:

sqlite> select -5||'-5';
-5-5
sqlite> select (-5)||'-5';
-5-5
sqlite> select -(5||'-5');
-5
so negation definitely comes before (with no parens)

(2) By Rico Mariani (rmariani) on 2021-07-16 23:54:58 in reply to 1.0 [link] [source]

Just in case negation was different than negative numbers:

select -(0)||1; -> also 01

more conclusively:

sqlite> select -5||'-5';
-5-5
sqlite> select (-5)||'-5';
-5-5
sqlite> select -(5||'-5');
-5
  • definitely comes before || (with no parens)

(3) By Rico Mariani (rmariani) on 2021-07-16 23:58:10 in reply to 2 [link] [source]

-- merged with the above

(4) By Larry Brasfield (larrybr) on 2021-07-17 00:31:20 in reply to 1.1 [link] [source]

In https://sqlite.org/lang_expr.html section 2 BETWEEN and NOT are not placed on the list beside the other operators.

Section 2 covers the binary and unary prefix operators. So to expect BETWEEN (or NOT BETWEEN) to appear there is a mistake. If that keyword (or keyword pair) was to be considered an operator, it would be a trinary operator. This can be seen in the railroad chart of Section 1 on Syntax, where it (they) show(s) up as part of the formal expression syntax alternatives.

A close reading of Section 2 shows that the operator "lists" are either of binary operators, where NOT should not appear, or of unary operators, where NOT does appear. Hence, I do not understand your critique regarding NOT.

As to precedence issues: Are you claiming that you have disproven the documented precedence of any binary operator listed in Section 2. Or are you saying the grammar is ambiguous with respect to COLLATE, such that it should be given a precedence to disambiguate [expr]-(COLLATE)-(collation-name) when some [expr] is a binary expression?

Looking through your "proofs" seems like a lot more work than looking at the grammar specification, so I would ask: Do you believe you have proven that said specification has been disproven? Or are you simply claiming something is missing from or incorrect in the docs (beyond the "missing" claims disproven above)? Either way, please be more specific. (I am too busy to see if I can interpret your post into a falsifiable assertion that is, in fact, false.)

(5) By Rico Mariani (rmariani) on 2021-07-17 00:39:10 in reply to 4 [link] [source]

To be clear, what is written is correct. I found no flaws. But as a person trying to understand the relative order of the operators I found that the fact that there was no one total ordering specified to leave me with questions. So I conducted a few experiments to lock down the order.

From these I concluded the extra facts about where NOT, ~, -, and BETWEEN go.

This is important to the extent that the docs do not tell you if

select not 0 between -1 and 2;

means this:

select (not 0) between -1 and 2;  -- it does not

or this:

select not (0 between -1 and 2);  -- it does

The proofs are just statements engineered so that the answer would be different depending on the ordering. Then you compare the one with the forced parens to the no parens and you can easily see what the ordering is.

The grammer:

%left OR.
%left AND.
%right NOT.
%left IS MATCH LIKE_KW BETWEEN IN ISNULL NOTNULL NE EQ.
%left GT LE LT GE.
%right ESCAPE.
%left BITAND BITOR LSHIFT RSHIFT.
%left PLUS MINUS.
%left STAR SLASH REM.
%left CONCAT.
%left COLLATE.
%right BITNOT.

Seems to be right in line with what I discovered by experimentation.

I think the full list would make a nice addition to the docs.

(8) By Larry Brasfield (larrybr) on 2021-07-17 01:34:43 in reply to 5 [link] [source]

Please bear with me as I try to formulate an argument for changing that doc to improve ease of comprehension (by adding or changing something) without degrading comprehension via needless volume increase. I already have the argument against: It is unclear, as of yet, that some sort of unified precedence chart would make it easier for SQL writers to know what their SQL will mean.

It appears that we have gone from the assertion implicit in your thread title, "... docs missing ...", to the assertion that a unified (rather than separated) precedence chart would be easier to understand. Is that right? Does that satisfy your sense that too much reading was needed to find the (not really) "missing" precedence of BETWEEN (in Section 6) and COLLATE (in section 2)? In other words, is your contention simply that the statements about precedence should group together more?

I trust you see the pitfall of simply listing BETWEEN with the binary operators. It is not a binary operator, and if that operator list became "binary and trinary operators" it would then need a distinguishing attribute of some kind to avoid confusing people as to which were binary and which were trinary, (or would force them to look further for that information, just as they must now only for BETWEEN.)

There are a couple things which would make the case for a change:

  1. One is a contrived but plausible SQL expression whose parse is ambiguous to a human reader, where reading enough of the expression syntax/semantics page to disambiguate it is "hard".

  2. The other is a suggested change which clearly resolves thing #1 (by being apparently less hard) and remains practically as clear as before for the ordinary unary and binary operators.

Do you want to propose something?

(9) By Rico Mariani (rmariani) on 2021-07-17 03:14:16 in reply to 8 [link] [source]

I think you've accurately captured my suggestion.

Some examples that might not be obvious

select not 0 between -1 and 2;

select 1 == 3 between -1 and 2;

select 0 between -1 and 2 == 2;

These are simple enough but they are contrived in that if anyone was even remotely in doubt they would just add parentheses.

The reason I happen to care is I'm writing a compiler and I need to be conformant to the SQLite rules. Hence the details matter to me whereas if I was just coding SQLite I wouldn't care so much, I'd add parens for clarity.

But to make the doc a little better I don't think you have to do much.

You could change it to something like this:

SQLite understands the following operators, in order from highest to lowest precedence:

~ - + (the common unary operators)
COLLATE
||
*    /    %
+    -
<<   >>   &    |
<    <=   >    >=
=    ==   !=   <>   IS   IS NOT   IN   LIKE   GLOB   MATCH   REGEXP  BETWEEN
NOT
AND   
OR

The unary prefix operators are these:

-    +    ~    NOT

The remaining operators are binary except for BETWEEN which is ternary.

With that change section 2 becomes your intro/summary section with additional details below if needed.

(14.1) By Larry Brasfield (larrybr) on 2021-07-17 04:45:03 edited from 14.0 in reply to 9 [link] [source]

You could change it to something like this:

SQLite understands the following operators, in order from highest to lowest precedence:

~ - + (the common unary operators) COLLATE ||

  • / %
  • - << >> & | < <= > >= = == != <> IS IS NOT IN LIKE GLOB MATCH REGEXP BETWEEN NOT AND
    OR

The unary prefix operators are these:

  • + ~ NOT

The remaining operators are binary except for BETWEEN which is ternary.

(Sorry for the extra vertical spacing. Quoting and <code/> mix poorly.)

I think that you have stripped the apparent binariness from most of the operators for those who do not read down to that final note. (And I would pick a bone with "[t]he remaining operators", which does not clearly apply only to the just mentioned unary prefix operators. As a separate paragraph, it has an referent ambiguity.)

From this discussion, I have come to believe that Section 6 on BETWEEN is out of place. It's emphasis on precedence, and the fact that precedence is wrapped up by the end of Section 2 except for BETWEEN ... AND, make me think its precedence should be mentioned in Section 2 with its semantics addressed there, like the treatment that COLLATE gets.

I will ponder how to get BETWEEN mentioned with other equality-like operators without introducing confusion or distraction for that majority of readers (at any one reading) who want to quickly confirm or learn of less odd operator precedence. I think that latter deserves more weight than you are giving it in favor of the more abstract unification.

(16) By Rico Mariani (rmariani) on 2021-07-17 05:06:08 in reply to 14.1 [link] [source]

Sounds good. I have every confidence it will be solid :D

(10) By Rico Mariani (rmariani) on 2021-07-17 03:18:21 in reply to 8 [link] [source]

Oh, I think of COLLATE as "binary" because of the collation name it takes but you could tweak the text of the summary to add a small exception for COLLATE the only unary suffix operator. It's not much more text and then you get a nice unified table.

(13) By Larry Brasfield (larrybr) on 2021-07-17 04:24:25 in reply to 10 [link] [source]

It is somewhat clear to me, looking at the grammar, that COLLATE is an odd duck when considered only as an operator. So I am not surprised to see its grouping described in text rather than in a table which, absent such oddness, conveys fairly simple and commonly understood information.

I am as fond of tables as anybody when there values can be treated alike, without special distinction (for the purpose of the table.) But I find that the advantage of such presentation is lost when distinction must be made among what look like similar entries. So I would dispute "nice unified". The unification would come at the cost of complicating what can accurately be said about the entries, and that falls into the "not nice" column, IMO.

(11) By Rico Mariani (rmariani) on 2021-07-17 03:30:59 in reply to 8 [link] [source]

Maybe I missed it, but I actually can't find anywhere where it says that where NOT goes. I did find all the others with not too much further looking.

Really, this is just an incremental improvement we're talking about here.

It is a bit strange for IN to be in the top list but not BETWEEN and second 2 is called operators not "binary and unary" operators.

Note, CAST/CASE/EXISTS don't need to be in list of operators because they are completely unambiguous just like parens.

(12) By Larry Brasfield (larrybr) on 2021-07-17 04:09:54 in reply to 11 [link] [source]

(Responding to this of 3 recent posts as it's easiest.)

... can't find anywhere where it says that where NOT goes.

Section 2 among the unary operators.

Really, this is just an incremental improvement we're talking about here.

The issue as I see it is whether it really is an improvement. I think there may be room for improvement, and I know that the project head, Richard Hipp, is virtually always in favor of doc changes that clearly are improvements. But I want to avoid change that lacks a clear argument for. He and others have done the docs with much attention to clarity and correctness, and I have no personal desire to change the documentation without a rationally arguable net benefit to readers. The question is about sign rather than magnitude.

It is a bit strange for IN to be in the top list but not BETWEEN

It's less strange if you consider: ... WHERE expr IN (inadvertent_scalar);

And as discussed, BETWEEN does not fit the brevity-yielding presentation scheme that ordinary operator precedence charts usually adopt. In fact, to shoe-horn it in, we would have to deal with both its BETWEEN part and its AND part. The first part has the same precedence as equality-like operators, but the second part (if considered not in the BETWEEN context) has a lower precedence. I'm not sure that putting it in the chart would not require an asterisk and explanation as to how certain caveats apply. And I am not at all surprised that the doc's author elected to document it separately, avoiding the potential distraction of readers wondering about the other 29-31 operators (likely to be a more common case.)

and second 2 is called operators not "binary and unary" operators.

I do not understand this assertion, (or if I do, it is plainly false.)

Note, CAST/CASE/EXISTS don't need to be in list of operators because ...

Good. That would be even more of a shoe-horning task.

(15.2) By Larry Brasfield (larrybr) on 2021-07-17 05:54:35 edited from 15.1 in reply to 5 [source]

(Reformatted quote for clarity. Answered separately because I forgot it earlier.)

This is important to the extent that the docs do not tell you if select not 0 between -1 and 2; means this: select (not 0) between -1 and 2; -- it does not or this: select not (0 between -1 and 2); -- it does .

But the docs do tell you which alternative is chosen. NOT clearly has higher precedence than BETWEEN, (per Section 6 and Section 2 para 2), so it is inarguable that your first alternative is correct. And I have to wonder how you conclude the contrary.

(Edited to add:)

Now that I revisit your proofs, and run your (well contrived) examples, I think there is a discrepancy between the documented grammar and the implemented grammar. My reading of the docs sections 2 and 6 leaves no possibility that the leading "not" will fail to group with the "0", leaving that "0" to become part of the between-and expression. This will definitely get a closer look, and either I will become a better reader or the said doc sections will be corrected. (I do not count an implementation change among the possibilities.)

(Edited further to add:)

A close reread of the doc on precedence shows nothing to support "NOT clearly has higher precedence than BETWEEN". The doc says nothing of NOT's precedence, but soon will. In my not-quite-supported thinking, because NOT is on the same line as unary '+' and '-',a I gave NOT their same precedence. But the precedence by row rank statement only applies to the preceding paragraph's binary operators.


a. This mistake was undoubtedly aided by the fact that in C/C++ (and I don't know how many other languages), logical not (aka '!') and unary '+' and '-' have high and identical precedence. The late hour may have been a factor too.

(17.1) By Rico Mariani (rmariani) on 2021-07-17 08:34:43 edited from 17.0 in reply to 15.0 [link] [source]

Actually NOT is WEAKER than between...

sqlite> select not 8 between -1 and 9;
0
sqlite> select (not 8) between -1 and 9;
1

The SQLite grammar concurs

%left OR.
%left AND.
%right NOT.
%left IS MATCH LIKE_KW BETWEEN IN ISNULL NOTNULL NE EQ.
%left GT LE LT GE.
%right ESCAPE.
%left BITAND BITOR LSHIFT RSHIFT.
%left PLUS MINUS.
%left STAR SLASH REM.
%left CONCAT.
%left COLLATE.
%right BITNOT.

I didn't find it clear at all until i saw the table in the grammar :D

As an illustration, check out https://en.cppreference.com/w/c/language/operator_precedence

My read of section is that between is tied with equality. The thing is I can't find where it says where NOT goes.

I'll go over your other replies and see if I can find more clues on NOT from what you said. Maybe it's just late and I'm tired.

(18) By Larry Brasfield (larrybr) on 2021-07-17 05:25:02 in reply to 17.0 [link] [source]

Please see my edited #15.1 (which was posted 10 minutes before your #17.) It appears we are in what I call "noisy agreement" on this.

(20.1) By Rico Mariani (rmariani) on 2021-07-17 06:34:01 edited from 20.0 in reply to 18 [link] [source]

Yup, I see your edit now. I think we're in agreement.

BTW the implementation is right, no question there, I wouldn't change a thing :D

I'll leave you to it.

Thank you for your attention!

(21) By Rico Mariani (rmariani) on 2021-07-17 05:29:09 in reply to 18 [link] [source]

sorry for the typos, sigh :D

(25) By Rico Mariani (rmariani) on 2021-07-17 14:59:26 in reply to 18 [link] [source]

I just noticed that 'ISNULL' 'NOTNULL' 'NOT NULL' are not even mentioned at all. Those would be unary suffix operators that mean the same as 'IS NULL', 'IS NOT NULL' as appropriate.

e.g.

sqlite> select 1 isnull;
0
sqlite> select null isnull;
1
sqlite> select null isitnull; --> a column alias, not an operator

sqlite> select 1 notnull;
1
sqlite> select null notnull;
0
sqlite> select 1 nonnull;  --> valid syntax but not NONNULL;
1
sqlite> select 11 nonnull; --> a column alias, not an operator
11
sqlite> select 11 notnull;
1
sqlite> select 11 not null;
1
sqlite> select 11 non null;   --> no syntax hazard with this form
Error: near "null": syntax error
sqlite> select null not null;
0
sqlite> select 11 not 1;  --> NOT 1 gives syntax error not semantic error
Error: near "1": syntax error

Note that the grammar does not treat "1 not null" as binary operator NOT applied to 1 and null. Though it could have done so (with anything on the right but null being a semantic error). Instead

expr(A) ::= expr(A) NOT NULL.    {A = sqlite3PExpr(pParse,TK_NOTNULL,A,0);}

Treats it as a two-word unary suffix; which frankly makes the most sense.

The other alternative, treating "NOT NULL" as a single lexeme would cause more problems because x IS NOT NULL would have an irregular parse. In any case discussing decisions the grammar did not make will in no way simplify the documentation, so forgive me for indulging in trivia.

FWIW I'm not a fan of these forms because of the syntax hazard:

select ? nonnull;

Which is an easy typo that would not be quickly spotted and is not a syntax error.

select ? is not null;

Does not have this problem. But again, such warnings may have a place somewhere but probably not in this chapter. These forms, hazardous or no, are part of generally accepted SQL.

(26) By Larry Brasfield (larrybr) on 2021-07-17 16:21:10 in reply to 25 [link] [source]

Rico, I am working on a unified operators/traits table which resolves my objections to your early suggestions and scratches your "unified" itch (which I share to a degree.) I will certainly incorporate all of the operators.

Thanks for your suggestions, arguments, and persistence.

(27) By Rico Mariani (rmariani) on 2021-07-17 20:35:19 in reply to 26 [link] [source]

sounds fabulous :D

(28.3) By Larry Brasfield (larrybr) on 2021-07-18 19:13:35 edited from 28.2 in reply to 27 [link] [source]

This is not going to render prettily, due to mark-down limitations, but the following table with footnotes is what I'm planning to incorporate into a revised expressions explication page. Interestingly, even though I created it independently, this nearly matches your post #24. It differs in two respects (once presentation details are subsumed). If you are willing, I would appreciate your feedback on it, after you decide how the deviations should be resolved.

My intention for this table is that be very quick to consult once its conventions have been understood. This outweighs points on adhering to more formal presentation formats.

Please see post #35 for revised table.

...

(29.1) By Rico Mariani (rmariani) on 2021-07-18 04:01:14 edited from 29.0 in reply to 28.1 [link] [source]

First off, this will help a ton. Here are some thoughts that might be useful.

I'll number the points for ease for discussion.


1 Use of ?

You might use "expr" instead of "?" because of the prevalence of "?" for binding. No big deal.

2 Arrow Notation

The arrow notation for left to right, or right to left might be misleading. For instance you use a left arrow to indicate left to right binding but of course a left arrow points from right to left. I read it backwards at first. You might be better served by the letters L/R. That said the notation is clear enough once you're used to the fact that left arrow means left to right.

You can also correctly read it as "the parens go on the left" hence the arrow points left. The arrow points in the direction of the implicit parens, in which case the arrow is natural.

This is also no big deal.

3 The ESCAPE operator

Looking at "% ?" and note 4. The operator is not "%" but "ESCAPE"

sqlite> select 'cx' like 'c%';
1
sqlite> select 'cx' like 'c%' escape 'c';
0

The railroad diagram indicates that ESCAPE can be applied to LIKE, GLOB, MATCH and REGEX. The grammar concurs (LIKE_KW seems to include REGEXP and GLOB though that was tricky to deduce). I had to consult:

static const unsigned char aKWCode[147] etc.

I've only ever seen ESCAPE used with LIKE. So I tested this out.

sqlite> select 'x' match 'y' escape 'z';
Error: wrong number of arguments to function match()
sqlite> select 'x' glob 'y' escape 'z';
Error: wrong number of arguments to function glob()
sqlite> select 'x' regexp 'y' escape 'z';
Error: no such function: regexp

I don't have REGEXP in my build but I bet it's the same as GLOB/MATCH we should try it.

Basically the grammar supports ESCAPE on all 4 but it's only done this way for symmetry. The conversion from x match y escape z to match(x, y, z) fails because match only supports 2 args. The others are the same.

So in short we don't need to discuss the fact that the others support [edited typo here] ESCAPE and indeed we might want to fix the railroad diagram too because it's kind of a lie. It parses but then, psyche, you can't actually do that.

4 REGEXP

The operator is REGEXP rather than REGEX

5 NOT [BETWEEN|LIKE|IN|MATCH|REGEXP|GLOB]

You might consider adding "e.g. 'x' NOT LIKE 'y'" to note 5.

##6 My Diagram Had Bugs

My version of the diagram has AND/OR placed incorrectly! Yours is correct! I actually posted two versions but one was lost. They were substantially similar in any case.


The fact that we came up with very similar diagrams I think is a good thing. Your goals/motivations are not identical to mine, yet we ended in a very similar location => seems like a good thing.

(33) By Larry Brasfield (larrybr) on 2021-07-18 04:05:50 in reply to 29.0 [link] [source]

You might use "expr" instead of "?" because of the prevalence of "?" for binding.

I'm not seeing the conflation potential, but since the page is about [expr] syntactic units, there is no reason to extraneously proliferate another symbol for it.

The arrow notation for left to right, or right to left might be misleading.

Yes.

The railroad diagram indicates that ESCAPE can be applied to LIKE, GLOB, MATCH and REGEX. ... might want to fix the railroad diagram too because it's kind of a lie.

This will be further investigated. I think it's a case of the railroad chart reflecting the parser's grammar but not semantic constraints applied later. It is (almost) easy to fix the railroad chart.

The operator is not "%" but "ESCAPE" ... operator is REGEXP rather than REGEX

Not sure where those brain farts were generated. Blown away now.

might consider adding "e.g. 'x' NOT LIKE 'y'" to note 5.

Yes. Done. (It was already in; I grabbed slightly outdated HTML generated from internal, VCS source.)

My version of the diagram has AND/OR placed incorrectly!

Glad to hear that. I don't need more bad head-scratchers tonight.

(30) By Rico Mariani (rmariani) on 2021-07-18 03:11:39 in reply to 28.1 [link] [source]

Additionally I have some concerns about the placement of +, -, ~, but I need a little time to be sure I'm not imagining things so please stand by and I'll post again on those shortly.

(31) By Rico Mariani (rmariani) on 2021-07-18 03:41:35 in reply to 28.1 [link] [source]

7 Order of the highest priority items

I think the top items are not quite right. I offer this analysis and a conclusion below.

Consulting the grammar

%left OR.
%left AND.
%right NOT.
%left IS MATCH LIKE_KW BETWEEN IN ISNULL NOTNULL NE EQ.
%left GT LE LT GE.
%right ESCAPE.
%left BITAND BITOR LSHIFT RSHIFT.
%left PLUS MINUS.
%left STAR SLASH REM.
%left CONCAT.
%left COLLATE.
%right BITNOT.
sqlite> select -'1y'||'0x3';
-10x3
sqlite> select (-'1y')||'0x3';
-10x3
sqlite> select -('1y'||'0x3');
-1

This indicates that - is stronger than COLLATE, and indeed I think it's grouped with BITNOT.

Similarly

sqlite> select ~1||'0x3';
-20x3
sqlite> select (~1)||'0x3';
-20x3
sqlite> select ~(1||'0x3');
-11

indicates ~ is stronger than ||

The priority of ~ and - cannot be distinguished because both are prefix operators binding inside out so they may as well be equal and indeed they are at the highest level of all, "BITNOT" in the grammar.

With regard to COLLATE, it is explicity placed higher than concat and lower than ~.

Now as to the + operator, you can't tell where it goes by experiment as it does nothing by defintion.

Even if you try to force a numeric conversion like I did with - and ~ it doesn't work...

sqlite> select +'1x'||'0000x3';
1x0000x3
sqlite> select +'1x';
1x

We have to get a little trickier:

sqlite> select -+'1y'||'0x3';
-10x3
sqlite> select -(+'1y')||'0x3'; -- interpretation 1
-10x3
sqlite> select -+('1y'||'0x3'); -- interpretation 2
-1

If + was weaker than || then we would have to have interpretation 2 above. But plainly we get the result of interpretation 1. So + is not weaker than || so we end up +/-/~ all in one group right to left.

Double checking the grammar I find this rule:

expr(A) ::= PLUS|MINUS(B) expr(X). [BITNOT] {...}

This makes it clear that BITNOT includes PLUS MINUS and BITNOT.

From here I conclude that highest binding items are in this order

* / %   -- the multiply group
||      -- concatenation
COLLATE -- unary suffix
+ - ~   -- unary prefixes

(32) By Rico Mariani (rmariani) on 2021-07-18 03:51:09 in reply to 28.2 [link] [source]

I noticed that while I was typing my follow-up you have already applied the suggestions you deemed appropriate. However item #4 concerning REGEXP is an important typo correction that hasn't yet been applied.

Other than that all that needs to happen is that unary + and - should be peers of ~, I think there's solid evidence for that.

(34) By Rico Mariani (rmariani) on 2021-07-18 18:21:52 in reply to 32 [link] [source]

summary of what I think is still outstanding:

  • one instance of "REGEX" left in the diagram (maybe already fixed in the actual version)

  • +/- should be moved to the same row as ~

That's it.

(35) By Larry Brasfield (larrybr) on 2021-07-18 19:10:37 in reply to 34 [link] [source]

Below is near-final. Rendering caveats still apply. Comments on content are welcome.

-= -= -= -= -= -= -= -=

2. Operators, Parse-Affecting Attributes

SQLite understands these operators, listed in precedence1 order (highest to lowest):

Operators2 Traits3
~ [expr]    + [expr]    - [expr] ·(·x), unary prefix
[expr] COLLATE (collation-name)4  unary postfix
|| (x·y)·z, binary
*   /   % (x·y)·z, binary
+   - (x·y)·z, binary
&  |   <<  >> (x·y)·z, binary
ESCAPE [expr]  unary postfix5
<  >  <=  >= (x·y)·z, binary
=  ==  <>  !=  IS   IS NOT
[expr] BETWEEN [expr] AND [expr]
IN  MATCH  LIKE  REGEXP  GLOB
[expr] ISNULL  [expr] NOTNULL   [expr] NOT NULL
(x·y)·z, binary
(x·y)·z, ternary 6
(x·y)·z, binary 6
 unary postfix
NOT [expr]  unary prefix
AND (x·y)·z, binary
OR (x·y)·z, binary

1. Operators shown within the same table cell share precedence.
2. "[expr]" denotes operand locations for non-binary operators.
3. "x·(y·z)", "(x·y)·z" and similar denote associativity.
4. The COLLATE clause (with its collation-name) acts as a single postfix operator.
5. This suffix can only bind to a preceding [expr] LIKE [expr] expression.
6. Each keyword in (BETWEEN IN GLOB LIKE MATCH REGEXP) may be NOT-prefixed with the same precedence and associativity.

(36) By Rico Mariani (rmariani) on 2021-07-18 20:40:09 in reply to 35 [link] [source]

Looks good to me!

(37) By Rico Mariani (rmariani) on 2021-07-18 20:53:28 in reply to 35 [link] [source]

Ok one possible suggestion. Maybe escape would clearer if we made it a binary operator. Then you could write

[like-expr] escape [expr] and put the footnote on the left arg. It's hard to call it a suffix given that it has exprs on both sides.

This is not exactly a major problem...

(39) By Larry Brasfield (larrybr) on 2021-07-18 21:30:06 in reply to 37 [link] [source]

I (and likely one or two others) will take this under advisement. I agree that it is not right, in a purist's sense.

I had also spent time worrying over that wee conundrum, but ultimately decided that its conceptually clean resolution should wait for another day (or year.)

Coming from an engineering background, I respect the adage, "the perfect is the enemy of the good." It is taken to heart by engineers who earn their keep. This is not to say there will be no resolution, which would probably be as you suggest. But I hesitate because, knowing Richard and Dan, (who wrote most and much of parse.y respectively), I am inclined to err assuming they know what they're doing unless that is clearly not the case for some little fact.

The question I must ask, (and I intend no snidery with it), is: What is the likelihood that a SQL writer, with low interest in language-lawyering, will trip over this when the syntax railroad chart immediately precedes the operator table?

(41.1) By Rico Mariani (rmariani) on 2021-07-18 22:10:29 edited from 41.0 in reply to 39 [link] [source]

There is zero chance anyone will be confused by this.

I happen to care because I'm building a compiler that needs to be compatible but normal people will see the diagram and understand with no difficulty. All these cases are completely unambiguous.

The only cases were there was real ambiguity before were the relative placement of +/-/~ and NOT.

Think on this a spot and change it maybe never.

fyi: If you're curious what compiler I'm building you can find it at http://cgsql.dev -- it's open source.

If you care you can also read the docs, chapter 3 specifically, to see how I treat this stuff. You'll note I had between wrong -- this will be fixed shortly. Make no mistake, the sqlite docs are significantly better than mine in every dimension. Still, another treatment might be interesting to you so there it is if you want it.

(38) By Rico Mariani (rmariani) on 2021-07-18 21:04:07 in reply to 35 [link] [source]

Nit: associativity was removed from unary ops (good idea it's only relevant to parser generators, not people). However + - ~ still has it. Can remove that one too.

(40) By Larry Brasfield (larrybr) on 2021-07-18 21:37:26 in reply to 38 [link] [source]

I thought (maybe too long) on that issue. I decided that, even though it would be silly for a parser (or designed language) to do the wrong thing with sequences of [+ - ~], that showing associativity explicitly could help those who might wonder if parentheses were needed. So I did not see it as superfluous.

I removed the associativity where it would answer no practical questions. For example, "What does 'NOT NOT <expression> mean?' " and "Can I write (and get away with) that?"

(42.1) By Rico Mariani (rmariani) on 2021-07-18 22:08:50 edited from 42.0 in reply to 40 [link] [source]

Removing associativity for unary ops is definitely an improvement. Most people don't understand and don't need to understand why YACC/BISON/Lemon care about associativity for unary ops. It simply doesn't apply in the context of normal expressions grammars, there is no possible ambiguity. I could make up an LALR(1) grammar where it matters but you would say that grammar is apropos of nothing and those issues don't apply to normal expressions. And you'd be right.

My best argument to remove goes like this: If the associativity was there people might actually think about it, thinking that it matters before finally convincing themselves that it doesn't. The best we can hope for by including it is that they won't waste too much time thinking about it. That's a case for deletion if I ever heard one :D

[edit] Note: ALL SQLite binary ops are left to right associative (because there is no assignment operator) so you could use that as a reason to not mention it at all except in a footnote...

(43) By Larry Brasfield (larrybr) on 2021-07-18 22:22:18 in reply to 42.1 [link] [source]

On associativity for unary operators, I agree.

My best argument to remove goes like this: If the associativity was there people might actually think about it, thinking that it matters before finally convincing themselves that it doesn't. The best we can hope for by including it is that they won't waste too much time thinking about it. That's a case for deletion if I ever heard one :D

That's a fair argument. I am tempted to say it's compelling.

Note: ALL SQLite binary ops are left to right associative (because there is no assignment operator) so you could use that as a reason to not mention it at all except in a footnote...

But that would remove a whole column from my pretty table! I will try that for a look-see. With the binary ops already lacking [expr] adjuncts, there is very little information conveyed in that column once the lack of right-associative binary operators is noted. I was trying to make that 2nd column unnecessary for the busy SQL writer, so the only surprise is the uniformity of binary operator associativity.

(44) By Rico Mariani (rmariani) on 2021-07-18 23:11:54 in reply to 43 [link] [source]

Leave it, remove it, whatever you like -- the fact that they are all explicitly marked left to right is not gonna cause confusion.

Maybe the footnote could be changed to include the text "as shown, all binary ops in SQLite are left to right." But this is hardly necessary as it's plainly obvious if anyone cares.

(45) By Rico Mariani (rmariani) on 2021-07-18 23:13:51 in reply to 43 [link] [source]

Footnote 3 does depict right to left associativity which now never occurs. :D

(46) By Larry Brasfield (larrybr) on 2021-07-18 23:42:11 in reply to 45 [link] [source]

This is probably the last take, unless a compelling argument shows it should not be:

================

2. Operators, and Parse-Affecting Attributes

SQLite understands these operators, listed in precedence1 order (highest to lowest):

Operators 2 3
~ [expr]    + [expr]    - [expr]
[expr] COLLATE (collation-name) 3
||
*   /   %
+   -
&  |   <<  >>
ESCAPE [expr] 4
<  >  <=  >=
=  ==  <>  !=  IS   IS NOT
[expr] BETWEEN5 [expr] AND [expr]
IN5  MATCH5  LIKE5  REGEXP5  GLOB5
[expr] ISNULL  [expr] NOTNULL   [expr] NOT NULL
NOT [expr]
AND
OR

1. Operators shown within the same table cell share precedence.
2. "[expr]" denotes operand locations for non-binary operators.
  Operators with no "[expr]" adjunct are binary and left associative.
3. The COLLATE clause (with its collation-name) acts as a single postfix operator.
4. The ESCAPE clause (with its escape character) acts as a single postfix operator.
  It can only bind to a preceding [expr] LIKE [expr] expression.
5. Each keyword in (BETWEEN IN GLOB LIKE MATCH REGEXP) may be prefixed
  by NOT, retaining the bare operator's precedence and associativity.

(47) By Rico Mariani (rmariani) on 2021-07-19 00:24:09 in reply to 46 [link] [source]

Ship it!

(22) By Larry Brasfield (larrybr) on 2021-07-17 05:37:35 in reply to 17.0 [link] [source]

... see if I can find more clues on NOT

NOT's existence as a unary operator is at ... Expressions, section 2, 2nd paragraph, last token. But its precedence, (contrary to what I have thought and imagined seeing), is nowhere mentioned. I suspect that I am so used to its C precedence (as '!', and high) that I imagined that without actually reading it. In SQL, as others implement it, unary NOT has precedence between that of AND and the equality-like operators.

So, briefly summarizing: There is a very clear doc improvement possibility here.

(23) By Rico Mariani (rmariani) on 2021-07-17 05:46:23 in reply to 22 [link] [source]

OK I'm seeing what you're seeing. NOT is there but its precedence is not mentioned. I think the others are also not mentioned but really +/- are going in the obvious place, so the one that really needs mentioning is ~.

(24.2) By Rico Mariani (rmariani) on 2021-07-19 21:37:33 edited from 24.1 in reply to 23 [link] [source]

[edit] the early draft diagram that was here had errors and is more likely to cause problems than anything else. I've deleted it. If you want to see it for some reason I can't fathom just ask.

Refer to the result of this discussion instead.

(48) By Larry Brasfield (larrybr) on 2021-07-19 21:47:14 in reply to 24.2 [link] [source]

Or, see this closer-to-ultimate result. There is more work remaining to get this doc coherent and more readily found and searched, so it is not "ultimate".

Thanks again for your thinking on this.

(49) By Rico Mariani (rmariani) on 2021-07-20 01:17:44 in reply to 48 [link] [source]

Good result. Always more work to do. Thanks for working on this!

(19) By Rico Mariani (rmariani) on 2021-07-17 05:26:40 in reply to 15.1 [link] [source]

I looked at section 2 again among the unary operators but it doesn't say what the relative precedence of NOT is. I don't think it even says where unary -, +, and ~ go. It's easy to see in the grammar of course. When I wanted to know I just tried experiments until I found the spot. The ~ operator was the only question mark for me. It could have been down near the binary operators rather that up with negation.

e.g.

  ~1<<7

could have meant

  ~(1<<7)

I can't find where ~ is compared to any of the binary operators.

But NOT and ~ are a long way apart in precedence.

(6) By Rico Mariani (rmariani) on 2021-07-17 00:44:52 in reply to 4 [link] [source]

Dang it... I still had it wrong. BETWEEN is not weaker than == it's the same, it's just that it's left to right and I didn't do an experiment with == on the other side. That rule is in the doc, just down there further... I just missed it.

Really, a single illustration for the precedence would be very helpful. That's all I'm saying.

(7) By Rico Mariani (rmariani) on 2021-07-17 00:47:42 in reply to 4 [link] [source]

I really should have gone to the grammar spec in the first place... This comes from not being used to having easy access to the source of things I use... but still it would be nice if it was in the doc.