Possible Bug - View behaves differently to Table in SELECT

(1) By David Pratten (dpratten) on 2020-12-13 21:40:42 [link] [source]

Hi,

I'm puzzled by this behaviour.

Select from three views:

select * FROM   [_RPMItems_] [e]
    LEFT JOIN [_networkNodes_] [n] ON [n].[id] = [e].[id]
    LEFT JOIN [_elementHumanReadableNames_] eHRN on eHRN.id = e.id

Results in

[1] [SQLITE_ERROR] SQL error or missing database (ON clause references tables to its right)

Whereas replacing one of the views with a materialised version of self

select * FROM   [_RPMItems_] [e]
    LEFT JOIN [_networkNodes_] [n] ON [n].[id] = [e].[id]
    LEFT JOIN [MaTelementHumanReadableNamesMaT] eHRN on eHRN.id = e.id

does not generate an error.

Under what conditions should a view behave differently to a materialised version of itself in SELECT?

Sample .db is here

All the best,

David

(2) By David Pratten (dpratten) on 2020-12-19 23:18:47 in reply to 1 [link] [source]

Hi SQLite team,

Is this view vs materialised view behavior by design?

David

(3) By Richard Hipp (drh) on 2020-12-20 16:13:03 in reply to 1 [link] [source]

I don't (yet) know why your query is not working. But thanks for sharing your sample database. I don't think I've ever seen a schema that is quite that intense, with deeply nested views that make extensive use of recursive common table expressions and window functions.

This one query:

EXPLAIN QUERY PLAN
SELECT * FROM _viewNetworkNodesPredSuccCompletness_;

Results in 15179 lines of query plan explanation output! The corresponding prepared statement contains 314,248 byte-code instructions and uses 11,231 distinct b-tree cursors.

For years, I've been giving talks that encourage developers to embrace the power of the SQL query language - to move beyond simple SELECT and INSERT statements mixed with a pile of procedural code and instead write complex queries that give the answer directly. I often use some of the queries generated by Fossil as examples, pointing out how a few lines (or a few dozen lines) of SQL can replace thousands and thousands of lines of procedural code. Doing so reduces the number of bugs (which are roughly proportional to the number of lines of code) and also often result in faster solutions as well.

But you, sir, have taken this idea to a whole new level.

I don't know if we'll be able to fix your problem or not. But I can promise that we will keep your schema around to use as a stress-test for SQLite's parser and code generator.

(6) By David Pratten (dpratten) on 2020-12-22 21:57:59 in reply to 3 [link] [source]

Hi Richard,

Yes, this is precisely my use-case. SQLite is doing the heavy lifting building and analysing trees and tracing dependencies through a directed acyclic graph.

The query flow graph is here

(This was generated as a graphviz.dot by the view dotViewDependencies interrogating sqlite_master)

David

(4.1) By Dan Kennedy (dan) on 2020-12-22 11:01:32 edited from 4.0 in reply to 1 [link] [source]

Thanks for reporting this one - and for bumping it since it looks like it was overlooked at first. Should now be fixed on trunk.

To get this working, we had to fix 1 longstanding bug to do with LEFT JOIN and queries that use more than 65536 database cursors, 1 bug in unreleased code to do with joining UNION ALL subqueries with LEFT JOIN subqueries, and 1 bug in a general purpose routine used for iterating through expression nodes. It's not clear whether that last one could actually be exploited in released code, but it's quite possible.

I really can't remember the last time a single query found 2 bugs, let alone 3. So you certainly have one of the most complicated SQLite schemas out there - perhaps the most complicated overall!

Dan.

(5) By David Pratten (dpratten) on 2020-12-22 21:41:11 in reply to 4.1 [link] [source]

Thanks Dan and Richard, likely release timing? Will these changes make 3.35? David

(7.1) By Warren Young (wyoung) on 2020-12-22 21:59:42 edited from 7.0 in reply to 5 [link] [source]

It's scheduled for 2021-03-30.

(8) By David Pratten (dpratten) on 2020-12-23 00:55:11 in reply to 7.1 [link] [source]

Thanks, looking forward to it.

Really happy to have contributed.

David

(16) By David Pratten (dpratten) on 2021-03-18 09:42:24 in reply to 7.1 [link] [source]

Hi Warren,

@wyoung I can't see this one in the release notes for 3.35.x?

David

(12) By David Pratten (dpratten) on 2020-12-26 01:18:41 in reply to 4.1 [link] [source]

Hi Dan,

select * from _elementHumanReadableNames_

in download here database results in

SQL error or missing database (too many references to "model": max 65535)

Is this a designed-in constraint or a bug?

Thanks

David

(13) By David Pratten (dpratten) on 2021-01-04 01:17:13 in reply to 12 [link] [source]

Hi Team,

Is the issue reported above a designed-in constraint or a bug?

David

(14) By David Raymond (dvdraymond) on 2021-01-04 16:00:03 in reply to 13 [source]

Well, searching in the amalgamation that error message only shows up in one place, in the selectExpander function. So my complete amateur guess is that it looks like while the definition of the view is fine, when you expand all the view definitions out, the "model" table gets referenced more than 65,535 times, which would probably overflow a specific counter somewhere. Since they're explicitly checking for it I'd consider it a "designed-in constraint", but again, this is just an amateur guess.

/*
** This routine is a Walker callback for "expanding" a SELECT statement.
** "Expanding" means to do the following:
**
**    (1)  Make sure VDBE cursor numbers have been assigned to every
**         element of the FROM clause.
**
**    (2)  Fill in the pTabList->a[].pTab fields in the SrcList that
**         defines FROM clause.  When views appear in the FROM clause,
**         fill pTabList->a[].pSelect with a copy of the SELECT statement
**         that implements the view.  A copy is made of the view's SELECT
**         statement so that we can freely modify or delete that statement
**         without worrying about messing up the persistent representation
**         of the view.
**
**    (3)  Add terms to the WHERE clause to accommodate the NATURAL keyword
**         on joins and the ON and USING clause of joins.
**
**    (4)  Scan the list of columns in the result set (pEList) looking
**         for instances of the "*" operator or the TABLE.* operator.
**         If found, expand each "*" to be every column in every table
**         and TABLE.* to be every column in TABLE.
**
*/
static int selectExpander(Walker *pWalker, Select *p)
...
    Table *pTab;
...
    }else{
      /* An ordinary table or view name in the FROM clause */
      assert( pFrom->pTab==0 );
      pFrom->pTab = pTab = sqlite3LocateTableItem(pParse, 0, pFrom);
      if( pTab==0 ) return WRC_Abort;
      if( pTab->nTabRef>=0xffff ){
        sqlite3ErrorMsg(pParse, "too many references to \"%s\": max 65535",
           pTab->zName);
        pFrom->pTab = 0;
        return WRC_Abort;
      }

...

/*
** The schema for each SQL table and view is represented in memory
** by an instance of the following structure.
*/
struct Table {
...
  u32 nTabRef;         /* Number of pointers to this Table */
...

(15) By David Pratten (dpratten) on 2021-01-06 03:23:33 in reply to 14 [link] [source]

Thanks David,

I'll continue materialising the views rather than SELECTing from them directly.

David

(9) By Wout Mertens (wmertens) on 2020-12-23 05:55:19 in reply to 1 [link] [source]

That's quite a database! Thanks for the interesting schema.

Unrelated question, I've never seen the [] surrounding identifiers before, and I can't find them documented in the SQLite docs either. From experimenting, they seem the same as using double-quotes?

I see you mix their use, in your examples you use both [e].[id] and e.id. Is there a particular reason you use it?

Thanks,

Wout.

(10) By Warren Young (wyoung) on 2020-12-23 06:22:36 in reply to 9 [link] [source]

I can't find them documented in the SQLite docs either.

Here.

(11) By David Pratten (dpratten) on 2020-12-26 01:09:08 in reply to 9 [link] [source]

Hi Wout,

Nothing special here, the mixed style is a result of not having an automated style checker.

David