bug: sqlite3_changes returns incorrect value for UPDATE..RETURNING when table was dropped/created a second time on connection

(1) By Mike Bayer (zzzeek) on 2022-06-02 12:54:07 [source]

This is posted at the recommendation of Python sqlite3 maintainers from https://github.com/python/cpython/issues/93421 who have stated this bug is in SQLite's C API; tested on SQLite 3.36.0 on Linux.

if a table is dropped and recreated on the same SQLite connection, sqlite3_changes (accessed below via Python's .rowcount accessor) returns the wrong value for UPDATE..RETURNING.

the test case below is in terms of the Python sqlite3 API, as I don't have a C environment readily available; per Python sqlite3 devs, the .rowcount attribute calls sqlite3_changes directly.

import os
import sqlite3



def go():
    """function creates a new table, runs INSERT/UPDATE, drops table,
    commits connection.

    """

    # create table
    cursor = conn.cursor()
    cursor.execute(
        """CREATE TABLE some_table (
        id INTEGER NOT NULL,
        value VARCHAR(40) NOT NULL,
        PRIMARY KEY (id)
    )
    """
    )
    cursor.close()
    conn.commit()

    # run operation
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO some_table (value) VALUES ('v1')"
    )
    ident = cursor.lastrowid

    cursor.execute(
        "UPDATE some_table SET value='v2' "
        "WHERE id=? RETURNING id",
        (ident,),
    )
    new_ident = cursor.fetchone()[0]
    assert ident == new_ident
    assert cursor.rowcount == 1, cursor.rowcount
    cursor.close()

    # drop table
    cursor = conn.cursor()
    cursor.execute("DROP TABLE some_table")
    cursor.close()

    conn.commit()

if os.path.exists("file.db"):
    os.unlink("file.db")

# passes
conn = sqlite3.connect("file.db")
go()

# run again w/ new connection (same DB), passes
conn = sqlite3.connect("file.db")
go()

print("FAILURE NOW OCCURS")
# run again w/ same connection, fails
go()

running the test indicates on the third run, which is where the same operation was run a second time on a single connection, the assertion fails:

FAILURE NOW OCCURS
Traceback (most recent call last):
  File "/home/classic/dev/sqlalchemy/test3.py", line 62, in <module>
    go()
  File "/home/classic/dev/sqlalchemy/test3.py", line 39, in go
    assert cursor.rowcount == 1, cursor.rowcount
AssertionError: 0

(2) By Dan Kennedy (dan) on 2022-06-03 21:14:03 in reply to 1 [link] [source]

Executing that sequence of commands a few times in the shell doesn't produce any surprises.

Why is it that the "assert ident == new_ident" assert is not failing? What value is cursor.rowcount being set to?

Dan.

(3) By Mike Bayer (zzzeek) on 2022-06-04 15:48:56 in reply to 2 [link] [source]

Executing that sequence of commands a few times in the shell doesn't produce any surprises.

agree, the closest I can get to the same sequence of commands in the shell is (yes, the BEGIN after CREATE is closest to what pysqlite does):

CREATE TABLE some_table (
        id INTEGER NOT NULL,
        value VARCHAR(40) NOT NULL,
        PRIMARY KEY (id)
);
BEGIN;
INSERT INTO some_table (value) VALUES ('v1');
SELECT last_insert_rowid();

UPDATE some_table SET value='v2' WHERE id=1 RETURNING id;
SELECT changes();
DROP TABLE some_table;
COMMIT;

which doesn't have any problem being run repeatedly in the same shell, returns "1" for all three row-returning queries.

if we can find a way that this bounces towards the pysqlite driver again, that's fine. pysqlite maintainers claimed it's on the SQLite side.

Why is it that the "assert ident == new_ident" assert is not failing?

ident is being set to the Python-side cursor.lastrowid attribute, which is equivalent to last_insert_rowid(). that value does not appear to be malfunctioning here in this case, but the test shows that one row was in fact matched by the UPDATE statement.

What value is cursor.rowcount being set to?

on the second run it's set to zero when it should be one, as is the case in the above SQL sequence run in the shell.

(4) By Keith Medcalf (kmedcalf) on 2022-06-04 17:19:15 in reply to 3 [link] [source]

In your original code, if you change:

    assert cursor.rowcount == 1, cursor.rowcount

and replace it with

    rc = cursor.execute('select changes()').fetchone()[0]
    assert rc == 1, 'changes is not one'

it works properly. All I can assume from this is that pysqlite2 is broken somehow.

(5) By Mike Bayer (zzzeek) on 2022-06-04 21:13:46 in reply to 4 [link] [source]

okey doke, I'll try to get that to them

(6) By Keith Medcalf (kmedcalf) on 2022-06-04 22:13:17 in reply to 1 [link] [source]

per Python sqlite3 devs, the .rowcount attribute calls sqlite3_changes directly.

You need a "Python sqlite3 dev" who knows how the shim works and can perhaps even read C code. The .rowcount attribute does not call sqlite3_changes and return the result -- if that is what it did, then there would be no excuse for it not working.

.rowcount is an attribute of the cursor (strange stupidity that, sqlite3_changes is associated with a connection, not a statement). If it is supposed to reflect the value of sqlite3_changes, then there would be no need to initialize it, reset it, or otherwise do anything other than call sqlite3_changes to get the data.

I cannot fathom what it is supposed to be doing. Then again, the whole pysqlite2 wrapper is ill-conceived. (especially in consideration that the default init parameters (connect, isolation_level) is set to the stupidest default conceivable and needs to be set everytime one would use that shim in order to make it not behave by often-wrong "automagic".

(9) By anonymous on 2022-06-05 19:40:07 in reply to 6 [link] [source]

Hi, Keith

The .rowcount attribute does not call sqlite3_changes and return the result -- if that is what it did, then there would be no excuse for it not working.

The .rowcount attribute does not call any SQLite interface; is simply looks up a member variable of the pysqlite_Cursor object struct. The value of .rowcount is set from the Cursor.execute*() methods, more specifically in the "execute loop" of _pysqlite_query_execute(), a helper function for the Cursor.execute() and Cursor.executemany() methods. This helper function first sets .rowcount to zero before entering the execute loop, and then the changes are accumulated for each iteration of the loop. For a Cursor.execute() call, the value of .rowcount is in practice set directly from a sqlite3_changes() call; no "automagic":

https://github.com/python/cpython/blob/4082c8e298a244edf6771839334372a47ece721d/Modules/_sqlite/cursor.c#L947-L951

.rowcount is an attribute of the cursor (strange stupidity that, sqlite3_changes is associated with a connection, not a statement). If it is supposed to reflect the value of sqlite3_changes, then there would be no need to initialize it, reset it, or otherwise do anything other than call sqlite3_changes to get the data.

I cannot fathom what it is supposed to be doing.

.rowcount is not supposed to reflect the valueof sqlite3_changes. As with the Cursor.execute*() methods, .rowcount is required by the DB API (PEP 249), which the sqlite3 Python stdlib extension module tries to adhere to. Quoting from the PEP, regarding .rowcount:

This read-only attribute specifies the number of rows that the last .execute*() produced (for DQL statements like SELECT) or affected (for DML statements like UPDATE or INSERT). The attribute is -1 in case no .execute*() has been performed on the cursor or the rowcount of the last operation is cannot be determined by the interface.

We could make .rowcount call sqlite3_changes() lazily at attribute read time. Changing "the shim" to do this will solve Mike's problem, but that will break all code that expect.rowcount to return the number of rows changed by Cursor.executemany(), so unfortunately we cannot do that.

However, we can work around this issue, for example using a couple of sqlite3_total_changes() calls; that should keep both the DP API happy and maintain backwards compatibility.

I made an SQLite C API reproducer for Mike's code; you can find it in this gist. As you might notice, it manages to reproduce the condition where sqlite3_changes() return 0. I don't know if this is to be expected by the SQLite C API, if it is a bug, or if it is a misuse of the SQLite C API. It really does not matter, as we can work around it.

Now commentating on the off-topic section of your post:

Then again, the whole pysqlite2 wrapper is ill-conceived. (especially in consideration that the default init parameters (connect, isolation_level) is set to the stupidest default conceivable and needs to be set everytime one would use that shim in order to make it not behave by often-wrong "automagic".

The isolation_level stuff is ... a pain. But it is hard to do anything with it, without breaking existing code. I would love to get rid of it today, but it will take several releases to get that straight; it is not impossible, but it will take time.

Note that the Python stdlib sqlite3 extension module is meant to adhere to the DP API specified by PEP 249. Like it nor not, it is not intended to be an SQLite C API shim. It is not intended to be APSW either. It intends to provide a PEP 249 compliant API.

Anyway, thanks for making SQLite; it is a wonderful library, and, since now are deep in the OT section, I must say I really appreciate Dr. Hipp's words in the preamble of the SQLite source code files:

 **    May you do good and not evil.
 **    May you find forgiveness for yourself and forgive others.
 **    May you share freely, never taking more than you give.

Greetings from a "Python sqlite3 dev" who knows how the shim works (but did not write it himself) and can perhaps even read C code.

Erlend

(11) By anonymous on 2022-06-05 20:30:42 in reply to 9 [link] [source]

Expanding on the SQLite C API repro code in the gist: change Mike's insert and update statements to insert and update multiple rows. Interesting stuff happens to sqlite3_changes.

Two-row version:

$ ./test
Compiled with SQLite 3.26.0
Executing with SQLite 3.36.0
executing 'CREATE TABLE some_table ( id INTEGER NOT NULL, value VARCHAR(40) NOT NULL, PRIMARY KEY (id))' 
- step, changes=0
executing 'BEGIN' 
- step, changes=0
executing 'INSERT INTO some_table (id, value) VALUES (1, 'v1'), (2, 'v1')' 
- step, changes=2
executing 'UPDATE some_table SET value='v2' WHERE id=1 RETURNING id' 
- step, changes=2
- step, changes=1
executing 'DROP TABLE some_table' 
- step, changes=1
executing 'COMMIT' 
- step, changes=1
executing 'CREATE TABLE some_table ( id INTEGER NOT NULL, value VARCHAR(40) NOT NULL, PRIMARY KEY (id))' (cached)
- step, changes=1
executing 'BEGIN' (cached)
- step, changes=1
executing 'INSERT INTO some_table (id, value) VALUES (1, 'v1'), (2, 'v1')' (cached)
- step, changes=2
executing 'UPDATE some_table SET value='v2' WHERE id=1 RETURNING id' (cached)
- step, changes=0
- step, changes=1
executing 'DROP TABLE some_table' (cached)
- step, changes=1
executing 'COMMIT' (cached)
- step, changes=1

Three-row version: ``` $ ./test Compiled with SQLite 3.26.0 Executing with SQLite 3.36.0 executing 'CREATE TABLE some_table ( id INTEGER NOT NULL, value VARCHAR(40) NOT NULL, PRIMARY KEY (id))'

step, changes=0 executing 'BEGIN'
step, changes=0 executing 'INSERT INTO some_table (id, value) VALUES (1, 'v1'), (2, 'v1'), (3, 'v1')'
step, changes=3 executing 'UPDATE some_table SET value='v2' WHERE id=1 RETURNING id'
step, changes=3
step, changes=1 executing 'DROP TABLE some_table'
step, changes=1 executing 'COMMIT'
step, changes=1 executing 'CREATE TABLE some_table ( id INTEGER NOT NULL, value VARCHAR(40) NOT NULL, PRIMARY KEY (id))' (cached)
step, changes=1 executing 'BEGIN' (cached)
step, changes=1 executing 'INSERT INTO some_table (id, value) VALUES (1, 'v1'), (2, 'v1'), (3, 'v1')' (cached)
step, changes=3 executing 'UPDATE some_table SET value='v2' WHERE id=1 RETURNING id' (cached)
step, changes=0
step, changes=1 executing 'DROP TABLE some_table' (cached)
step, changes=1 executing 'COMMIT' (cached)
step, changes=1 ```

Three rows, and a loosened update query:

$ ./test 
Compiled with SQLite 3.26.0
Executing with SQLite 3.36.0
executing 'CREATE TABLE some_table ( id INTEGER NOT NULL, value VARCHAR(40) NOT NULL, PRIMARY KEY (id))' 
- step, changes=0
executing 'BEGIN' 
- step, changes=0
executing 'INSERT INTO some_table (id, value) VALUES (1, 'v1'), (2, 'v1'), (3, 'v1')' 
- step, changes=3
executing 'UPDATE some_table SET value='v2' WHERE id<3 RETURNING id' 
- step, changes=3
- step, changes=3
- step, changes=2
executing 'DROP TABLE some_table' 
- step, changes=2
executing 'COMMIT' 
- step, changes=2
executing 'CREATE TABLE some_table ( id INTEGER NOT NULL, value VARCHAR(40) NOT NULL, PRIMARY KEY (id))' (cached)
- step, changes=2
executing 'BEGIN' (cached)
- step, changes=2
executing 'INSERT INTO some_table (id, value) VALUES (1, 'v1'), (2, 'v1'), (3, 'v1')' (cached)
- step, changes=3
executing 'UPDATE some_table SET value='v2' WHERE id<3 RETURNING id' (cached)
- step, changes=0
- step, changes=0
- step, changes=2
executing 'DROP TABLE some_table' (cached)
- step, changes=2
executing 'COMMIT' (cached)
- step, changes=2

(12) By anonymous on 2022-06-06 07:10:48 in reply to 11 [link] [source]

After re-reading the sqlite3_changes docs, it looks like the behaviour of sqlite3_changes is as expected on the second run (re-using the prepared statement), but deviates from what's documented on the first run. On the first run, the value from the previous statement is returned after the first sqlite3_step of the UPDATE statement. On the second run, sqlite3_changes returns zero (documented behaviour) after the first sqlite3_step of the UPDATE statement.

(10) By Mike Bayer (zzzeek) on 2022-06-05 20:20:15 in reply to 6 [link] [source]

You need a "Python sqlite3 dev" who knows how the shim works and can perhaps even read C code. The .rowcount attribute does not call sqlite3_changes and return the result -- if that is what it did, then there would be no excuse for it not working.

from a sqlite3 maintainer who maintains the C code directly, their exact words at https://github.com/python/cpython/issues/93421#issuecomment-1144600340 were "we simply use the function sqlite3_changes to set rowcount". But as this looked to be the result of something being cached / not refreshed I had a feeling it would come back to them.

They have since created a reproducer illustrating their use of the C API and how it gets the value 0 back at https://gist.github.com/erlend-aasland/5e48b920d882e04f911fe7871fd5ccd9 and as expected there is some kind of statement cache in use though I haven't looked closely enough to see why this causes the issue. It looks like there is some discussion over the best approach to take as it seems like they have to make some choices how best to resolve.

.rowcount is an attribute of the cursor (strange stupidity that, sqlite3_changes is associated with a connection, not a statement).

the .rowcount attribute is part of the pep-249 DBAPI we're following hence it's associated with a "cursor", which has to exist even for DBAPIs that don't internally have any concept of a "cursor".

I cannot fathom what it is supposed to be doing. Then again, the whole pysqlite2 wrapper is ill-conceived. (especially in consideration that the default init parameters (connect, isolation_level) is set to the stupidest default conceivable and needs to be set everytime one would use that shim in order to make it not behave by often-wrong "automagic".

I think you might be referring to the way they implemented autobegin, which is in fact reported as a bug by me about 12 years ago. still opened. pep-249 requires that connections be delivered by default in non-autocommit mode, so this is what they were trying to do while sidestepping DB locking issues to some extent. In SQLAlchemy we provide a recipe on how to resolve this issue however we have not made it default on our end either, because second-guessing the drivers means we'll get all kinds of bug reports stating "works in plain SQLite fails in SQLAlchemy" and I have no desire to deal with that. Things are generally not problematic for people is one positive thing that can be said about this state of affairs, at least.

(13) By anonymous on 2022-06-06 07:38:56 in reply to 10 [link] [source]

But as this looked to be the result of something being cached / not refreshed I had a feeling it would come back to them.

This has nothing to do with the cached statements; it has to do with when sqlite3_changes is called. Until now, the Python sqlite3 stdlib extension module has called sqlite3_changes after each sqlite3_step. But, according to the SQLite C API docs, it is obvious that we must call sqlite3_changes after the statement has run to completion:

These functions return the number of rows modified, inserted or deleted by the most recently completed INSERT, UPDATE or DELETE statement on the database connection specified by the only parameter.

Take a look at my tests using the reproducer in the gist; you'll see that sqlite3_changes return the correct count, only on the last step, when the statement is run to completion.

The prepared statement cache is a dead end regarding this.

(7) By Simon Slavin (slavin) on 2022-06-05 03:38:39 in reply to 1 [link] [source]

I see that other posters seem to have narrowed this down to a wrapper problem. Perhaps some Python user might like to try APSW:

https://github.com/rogerbinns/apsw

It would be useful to Python users to know whether a second, popular, wrapper has the same problem as the first one.

(8) By Keith Medcalf (kmedcalf) on 2022-06-05 05:14:08 in reply to 7 [link] [source]

APSW works fine. sqlite3_changes / sqlite3_totalchanges is accessed directly against the connection C API when you call the connection changes / totalchanges methods.