SQLite Forum

Request for comment: New ANALYZE syntax
Login
The ANALYZE command is not standard SQL (as far as I know) but it
is widely implemented.  PostgreSQL has an ANALYZE command with
syntax like this:

~~~
   ANALYZE [VERBOSE] [table-name [( column [, ...] )]]
~~~

On MySQL the syntax is:

~~~
   ANALYZE [NO_WRITE_TO_BINLOG | LOCAL] table [, ...]
~~~

Notice how both PostgreSQL and MySQL have the ability to specify
options to ANALYZE that alter its behavior.
The ANALYZE syntax to SQLite is simpler:

~~~
   ANALYZE [table-or-index-or-collation-name]
~~~

SQLite has no modifiers.  But, I have the need to optionally modify
the behavior of ANALYZE.  In particular, I'm working on logic now that
will allow the application to control the level of detail of the
analysis.  In other words, the application can request a fast but
superficial ANALYZE, or a slower but more detailed ANALYZE, thus
allowing the application to trade off precision and runtime according
to its needs.  The question is, what syntax would be best for
this?

The pattern established by PostgreSQL and MySQL is to allow one of
a small subset of keywords in between the "ANALYZE" keyword and the
name of the object to be analyzed.  With PostgreSQL you can say
"VERBOSE".  With MySQL you can say "NO\_WRITE\_TO\_BINLOG" or "LOCAL".
The immediate need in SQLite is to allow an option
with an integer argument. Something like this, perhaps:

~~~
   ANALYZE limit=5000 main.sometable;
~~~

But I want the syntax to be general so that I can easily add new
parameters and options in the future.  For example:

~~~
   ANALYZE FAST main.sometable;
   ANALYZE STAT1_ONLY main.sometable;
   ANALYZE samples=25, FAST main.sometable;
~~~

One problem is that it is difficult to express that syntax in
a conflict-free LALR(1) grammar.  Keep in mind that the name at the
end is optional.  So the last line above might be:

~~~
   ANALYZE samples=25, FAST;
~~~

Does this last command mean that you want to analyze all tables using
the "samples=25" and "fast" options, or does it mean that you want
to analyze the table named "fast" with the "samples=25" option?

One way to resolve the ambiguity would be to put the options at
the end, separated by a keyword.  Perhaps something like:

~~~
   ANALYZE main.sometable USING samples=25, fast;
   ANALYZE USING stat1_only;
~~~

Or, we could put the options in directly after the "ANALYZE" keyword
but require them to be inside parentheses:

~~~
   ANALYZE (samples=25, fast) main.sometable;
   ANALYZE (stat1_only);
~~~

# Question:

So what do y'all think?  What will be the best way to extend the
syntax of ANALYZE so that it can support a (possibly growing) list of options,
some of which include arguments?