SQLite Forum

Request for comment: New ANALYZE syntax
Login

Request for comment: New ANALYZE syntax

(1) By Richard Hipp (drh) on 2020-03-18 13:03:18 [link] [source]

The ANALYZE command is not standard SQL (as far as I know) but it is widely implemented. PostgreSQL has an ANALYZE command with syntax like this:

   ANALYZE [VERBOSE] [table-name [( column [, ...] )]]

On MySQL the syntax is:

   ANALYZE [NO_WRITE_TO_BINLOG | LOCAL] table [, ...]

Notice how both PostgreSQL and MySQL have the ability to specify options to ANALYZE that alter its behavior. The ANALYZE syntax to SQLite is simpler:

   ANALYZE [table-or-index-or-collation-name]

SQLite has no modifiers. But, I have the need to optionally modify the behavior of ANALYZE. In particular, I'm working on logic now that will allow the application to control the level of detail of the analysis. In other words, the application can request a fast but superficial ANALYZE, or a slower but more detailed ANALYZE, thus allowing the application to trade off precision and runtime according to its needs. The question is, what syntax would be best for this?

The pattern established by PostgreSQL and MySQL is to allow one of a small subset of keywords in between the "ANALYZE" keyword and the name of the object to be analyzed. With PostgreSQL you can say "VERBOSE". With MySQL you can say "NO_WRITE_TO_BINLOG" or "LOCAL". The immediate need in SQLite is to allow an option with an integer argument. Something like this, perhaps:

   ANALYZE limit=5000 main.sometable;

But I want the syntax to be general so that I can easily add new parameters and options in the future. For example:

   ANALYZE FAST main.sometable;
   ANALYZE STAT1_ONLY main.sometable;
   ANALYZE samples=25, FAST main.sometable;

One problem is that it is difficult to express that syntax in a conflict-free LALR(1) grammar. Keep in mind that the name at the end is optional. So the last line above might be:

   ANALYZE samples=25, FAST;

Does this last command mean that you want to analyze all tables using the "samples=25" and "fast" options, or does it mean that you want to analyze the table named "fast" with the "samples=25" option?

One way to resolve the ambiguity would be to put the options at the end, separated by a keyword. Perhaps something like:

   ANALYZE main.sometable USING samples=25, fast;
   ANALYZE USING stat1_only;

Or, we could put the options in directly after the "ANALYZE" keyword but require them to be inside parentheses:

   ANALYZE (samples=25, fast) main.sometable;
   ANALYZE (stat1_only);

Question:

So what do y'all think? What will be the best way to extend the syntax of ANALYZE so that it can support a (possibly growing) list of options, some of which include arguments?

(2) By Stephan Beal (stephan) on 2020-03-18 13:16:26 in reply to 1 [link] [source]

Though it's a bit verbose, i like:

ANALYZE USING (...options)

To me that currently reads most clearly (noting that i have not yet had my coffee), though "using" is admittedly superfluous.

(3) By TripeHound on 2020-03-18 13:21:59 in reply to 1 [link] [source]

Without being an expert on either SQL or LALR grammars, my thought while reading the post (and before I got to it) was essentially your last example: enclosing them in parentheses. The only slight drawback, perhaps, would be still needing to use (...) when there's no name at the end... in which case, the USING version may be preferable.

(4.1) By Chris (crustyoz) on 2020-03-18 13:24:33 edited from 4.0 in reply to 1 [link] [source]

Would you consider adding another command (e.g. ANALYZE_SETTINGS) which configures ANALYZE? Might be some confusion with name since ANALYZE is a bit vague as a function label.

The default conditions, whatever they might be, can be modified by the new command and the modifications remain in effect until the next use of the configure command or end of session. Then ANALYZE command syntax remains unchanged.

(8) By David Raymond (dvdraymond) on 2020-03-18 14:39:17 in reply to 4.1 [link] [source]

So basically an analyze_settings pragma?

So you could do either...

analyze (samples=25, fast) main.sometable;

...to make it explicit for that one call, or do...

pragma analyze_settings = '(samples=25, fast)';
analyze main.sometable; --uses pragma settings if set
analyze (samples=1000) main.sometable; -- explicit use overrides pragma

...to make it stick for all future non-explicit analyze calls in that connection.

(By the way, my vote is for the options in parenthesis like in Postgres)

(5) By Mark Lawrence (mark) on 2020-03-18 13:37:01 in reply to 1 [link] [source]

How about overloading WITH:

WITH samples=25,fast=1 ANALYZE main.sometable;

The "=" should distinguish a setting WITH from a statement WITH.

(6) By David Jones (vman59) on 2020-03-18 13:58:42 in reply to 1 [source]

The parenthesis format looks more to the user like the options are additional arguments/modifiers to the analyze function, the USING clause looks like you could be doing the analysis with something else.

(7) By anonymous on 2020-03-18 14:36:57 in reply to 1 [link] [source]

Having reviewed the current grammar and the options proposed I am thinking I like the keyword separation with options as the end.

I am reading this as when using ANALYZE or ANALYZE main.sometable this is implying a full and complete ANALYZE.

Therefore I propose the following after perusing the current keyword list in SQLITE3:

ANALYZE main.sometable RESTRICT samples=25, fast;

ANALYZE RESTRICT stat1_only;

(9) By Richard Hipp (drh) on 2020-03-18 14:47:50 in reply to 7 [link] [source]

Also consider "WITH":

  ANALYZE main.sometable WITH samples=25, fast;
  ANALYZE WITH stat1_only;

"WITH" was actually my initial choice. I changed to "USING" later. "RESTRICT" is also an option, I suppose - it seems a little more SQL-ish.

(11) By TripeHound on 2020-03-18 15:15:08 in reply to 9 [link] [source]

A "danger" with RESTRICT is that it suggests, well, restriction... if future parameters/options are unknown, you might end up with something like ANALYZE main.table RESTRICT VERBOSE or ANALYZE main.table RESTRICT ALL_LEVELS where the option widens rather than restricts the analysis. WITH or USING don't (to me) carry that connotation.

(14) By anonymous on 2020-03-20 13:31:19 in reply to 9 [link] [source]

Instead of overloading WITH consider VIA (as in a journey to a destination via places of note)

(10.1) By Simon Slavin (slavin) on 2020-03-18 15:04:21 edited from 10.0 in reply to 1 [link] [source]

Parameters are separated by spaces.

If there's just one parameter it's an entity name. Backward compatibility.

If there are two or more parameters then the last one is an entity name and those before it are options. If you want ANALYZE to work on all entities, specify "*". Or do I mean '*.*' ?

(12) By TripeHound on 2020-03-18 15:23:21 in reply to 10.1 [link] [source]

My first instinct is a "loose" syntax like this may lead to the sort of confusion where CREATE TRIGGER AFTER INSERT... creates a "before" trigger called AFTER...

(13.1) By Keith Medcalf (kmedcalf) on 2020-03-18 17:04:46 edited from 13.0 in reply to 1 [link] [source]

I think putting the options in brackets is the clearest and is most similar to the options used in for example pragma's and virtual table declarations. Could a token-separator (space) be optional before the options as in to allow:

analyze(fast) main.*;
analyze(fast, samples=25);

as well as:

analyze (fast) main.*;
analyze (fast, samples=25);

which would make the options look (perhaps more) like an appendage/argument to analyze?

(15) By anonymous on 2020-03-22 05:14:10 in reply to 1 [link] [source]

I think "USING" seems good. (Maybe the options should be in parentheses after the "USING" keyword; I don't know.) I don't like "RESTRICT" or "VIA"; it doesn't look like good to me (and, as someone else has mentioned, options are not necessarily restrictions anyways).

Using a reserved word here (such as "USING"; not all keywords are reserved, although "USING" is; I tried it just now to check) is helpful since it reduces ambiguity.

(16) By anonymous on 2020-03-23 03:11:43 in reply to 1 [link] [source]

I'd try and stick within existing keywords, maybe adding BRIEF and/or VERILY, but there is already FULL to mean verily.

Use PRAGMA as the model for any name=value options

ANALYZE [BRIEF | FULL] schema.table WITH/or/USING name, name=value,... LIMIT expr OFFSET expr

FULL or ALL could be aliased, but FULL seems to make more sense, and ALL might be the all options option, instead of listing the name value pairs. In that case NOTHING might be the opposite of ALL when doing a full scan without any extra stats.

You already have LIMIT, so that seems reasonable for setting sample size, including a starting OFFSET (if that seems like a sane thing to do).

WITH or USING, should almost be interchangeable, allow either? Easier on the brain and might save some reference lookups. VALUES would work there too, but I don't think VALUES ever uses name/value pairs like the PRAGMA syntax.

If BRIEF/FAST is added, the keyword ADD might read well for adding pragma style options to an analysis. Maybe EXCLUDE (or IGNORE) to turn off some outliers from a FULL/VERILY pass.

When no BRIEF or FULL is mentioned it could allow for a middle ground 'normal-ish' option, but probably safer to just mean current behaviour.

Have good

(17) By jose isaias cabrera (jicman) on 2020-03-25 22:59:18 in reply to 1 [link] [source]

I hope that there is the final option, just ANALYZE; should also work like it does now. :-)

(18) By Peter da Silva (resuna) on 2020-03-26 15:24:13 in reply to 1 [link] [source]

ANALYZE samples=25, FAST;

Does this last command mean that you want to analyze all tables using the "samples=25" and "fast" options, or does it mean that you want to analyze the table named "fast" with the "samples=25" option?

The former. The latter would be "ANALYZE samples=25 FAST;"