SQLite Forum

fts5 rowid peculiarity in UPDATE FROM

fts5 rowid peculiarity in UPDATE FROM

(1) By Max (Maxulite) on 2021-06-29 10:41:27 [source]

Sqlite version 3.36.0

For fts4 the following query works

update FtsIndex4 Set text="foo bar" from (select 123432545 as modified ) where docid=modified

but for fts5 the equivalent one (docid is replaced by rowid)

update FtsIndex5 Set text="foo bar" from (select 123432545 as modified ) where rowid=modified

gives an error (no such column rowid), but with the table name prefix "...where FtsIndex5.rowid=modified..." it works. Not a big deal, but might confuse working with drop-in replacement in mind when migrating from fts3/fts4 to fts5


(2) By Dan Kennedy (dan) on 2021-06-29 17:09:09 in reply to 1 [link] [source]

Internally, that statement executes:

SELECT ... FROM ftsindex5, (SELECT 1234 AS modified) WHERE rowid=modified

And it turns out that you can't use an unqualified "rowid" in any SELECT statement with more than one table in the join or you get the error.

This is an anachronism I think. There was a time when all tables/views/sub-selects in SQLite had an accessible rowid - even (SELECT 1234). So this case couldn't come up then. But now it can.


(3.1) By Max (Maxulite) on 2021-06-30 11:57:56 edited from 3.0 in reply to 2 [link] [source]

Thanks, I understand now. The situation is unusual in the sense that rowid is a special citizen for obvious reasons and anyone familiar with Sqlite is used to this. In other words it's always there (with exception of WITHOUT ROWID tables), but you don't deal with it unless you want to. But when using fts5, one is enforced to use it instead of a general (at least from the naming point of view) column docid. So the special life of this citizen comes to light unconditionally.

I tried to search in the timeline and it looks like not using an alias like docid was a very early design decision, the comments some time contained docid in descriptions and then were renamed overnight. Just for historical curiosity I wonder what was the objective of not using docid alias from the start?

(4) By Dan Kennedy (dan) on 2021-07-01 18:29:33 in reply to 3.1 [link] [source]

Just for historical curiosity I wonder what was the objective of not using docid alias from the start?

Just didn't see any point to it at the time I suppose. And you have to figure out what to do about this:

    INSERT INTO fts3(rowid, docid, cols) VALUES(1, 2, ...);

The problem with SQLite not allowing "rowid" in the join query, even though the reference is not ambiguous, is fixed here, btw:


So the UPDATE FROM on the fts5 table should now work with trunk. Or 3.37.0, when it is released.


(5) By Max (Maxulite) on 2021-07-02 10:08:29 in reply to 4 [link] [source]

Great, the citizen is now less special again )

I checked with my data, everything works as expected, including UPDATE FROM and WITHOUT ROWID tables.

There's one thing I noticed though. Sometimes one can introduce his or here own rowid (plenty of ways to do this, cte, forced rowid in "without rowid" tables, etc). The specifics here is that to be ambiguous in two senses, the rowids should be by the same kind. So joining two tables with "user rowid" gives "ambiguous column name: rowid" error and joining two tables with "sqlite rowid" gives "no such column: rowid", but mixing them (one user, one sqlite) gives no error. I don't know whether this mixing was possible before this fix, but if not then this might be a reason for hiding rowids in joins.

The example of this when sqlite is ok with the mixed ambiguity (testtable is a general rowid table and according to my observation the rowid in ON clause is interpreted as coming from ctesource)

with ctesource(rowid, id) as
     select 12324, 1
select * from testtable, ctesource on rowid=ctesource.id

I'm not sure this is a big deal, just to take an additional look here.

(6) By Dan Kennedy (dan) on 2021-07-02 11:00:46 in reply to 5 [link] [source]

It is as you say.

For any "rowid" reference, SQLite first searches for an explicit user-supplied rowid column. If it finds one, all good. If it finds more than one, the reference is ambiguous. If it finds zero such columns, it searches for implicit rowid columns and handles them the results the same way - one match is good, more than one is an error.

So if you mix rowid types in a single query, "rowid" will always match the user specified column.

I'm not sure you would necessarily design things this way if you were starting over, but the thing to do now is to avoid inadvertently introducing changes that break existing queries.