Removing not null constraint doesn't behave correctly

(1) By anonymous on 2020-06-30 14:25:09 [link] [source]

Hi all,

I am wondering if I am seeing a SQLite bug or not.

These are the SQL commands I'm running.

BEGIN TRANSACTION;

/* Create a table called NAMES */
CREATE TABLE NAMES(Id integer PRIMARY KEY, Name text NOT NULL);

/* Create few records in this table */
INSERT INTO NAMES VALUES(1,'Tom');
INSERT INTO NAMES VALUES(2,'Lucy');
INSERT INTO NAMES VALUES(3,'Frank');
INSERT INTO NAMES VALUES(4,'Jane');
INSERT INTO NAMES VALUES(5,'Robert');
COMMIT;

SELECT * FROM sqlite_master;

/* Remove the not null constraint on Name, using the "simpler procedure" outlined in Section 5 of https://www.sqlite.org/lang_altertable.html */
BEGIN TRANSACTION;
PRAGMA writable_schema = on;
UPDATE sqlite_master SET sql = 'CREATE TABLE NAMES(Id integer PRIMARY KEY, Name text)' WHERE type = 'table' AND name = 'NAMES';
PRAGMA schema_version = 2;
PRAGMA writable_schema = off;
COMMIT;
-- This prints "ok".
PRAGMA integrity_check;

SELECT * FROM sqlite_master;

PRAGMA table_info(sqlite_master);

/* Try to add another column. */
BEGIN TRANSACTION;
ALTER TABLE NAMES ADD Age INTEGER DEFAULT 10;
COMMIT;

/* Display all the records from the table */
SELECT * FROM NAMES;

Adding the "Age" column results in "Error: near line 29: malformed database schema (NAMES) - near ",": syntax error".

I noticed the following behaviour:

If I don't remove the NOT NULL constraint on Name, no error is exhibited.
If I keep the NOT NULL constraint and add a default value to Name (which is documented as supported), the schema appears to update correctly but the subsequently added new column Age is now populated with the default value of Name, instead of its own default value.
If I keep the NOT NULL constraint and change the default value of Name (which is also documented as supported), the schema again appears to update correctly but the following migration to add Age fails with Error: near line 29: malformed database schema (NAMES) - near ",": syntax error.

(2.3) By Larry Brasfield (LarryBrasfield) on 2020-06-30 15:22:40 edited from 2.2 in reply to 1 [link] [source]

There is something strange going on, and likely unintended behavior relating to state that is supposed to track the content of sqlite_master but does not.

I have found two changes to your procedure that "work". One is to simply retry the failing column add statement. The other is to close then reopen the DB before the column add. Both of these suggest that state is changed from what it is just before the initial column add attempt.

To me, it is surprising that direct modification of sqlite_master would become a supported operation. In fact, I wonder if it truly is supported. If so, there should perhaps be a pragma, named something like incorporate_direct_schema_modifications.

(3) By anonymous on 2020-07-01 01:23:14 in reply to 2.3 [link] [source]

Thanks for the reply Larry.

I tried retrying the failing column add statement in the same connection and can't seem to get it to work. However, closing and reopening the DB before the column add does seem to work.

I do wonder if directly modifying sqlite_master is truly supported either. Even though it's documented, it doesn't seem to work as expected. Are you aware of any avenues I can take to report this issue and have it fixed?

(4) By Larry Brasfield (LarryBrasfield) on 2020-07-01 09:44:33 in reply to 3 [link] [source]

If you revise the title of this thread to indicate that this is a bug, Richard Hipp will likely notice it and read far enough to see that something is in fact awry, either with the library itself or the documentation that led you to believe it should work. This forum is where bugs are reported, by project convention.

My initial response to your post was to say that "behave correctly" does not apply because modifying sqlite_master was not supported until quite recently. If I recall correctly, it was positively warned against. It was only after I noticed your claim in a comment that you were following a procedure in the docs that I went to read about this newly advised approach to schema altering.

I would think that where sqlite_master (or sqlite_schema lately) modification is recommended there would be some clearly delineated boundary between what will work and what may not. There is already a caution about changes that invalidate table data on disk. Something similar regarding table and column constraints may be needed. Or it might be documented that changing those will not take effect until certain things happen, such as reopening the database.

(5) By Ryan Smith (cuz) on 2020-07-01 10:13:38 in reply to 3 [source]

It's documented simply because it is possible to do, but it comes with Zero guarantees of stability.

If you enter the unsafe direct editing of the master table, SQLite is relinquishing responsibility and you are taking control of it. At that moment, you and only you become solely responsible for up-keeping the validity of what is in that table. You seem to take responsibility, then let sqlite do some changes... that is never viable (though it may succeed by chance).

With that understood, looking at your example and subsequent notation that closing connections work, I believe the thing that happens is you open 2 connections (at different points probably), changing the schema on one connection while the other doesn't know about it, then the other is trying to adjust what it believes to be the current schema, which is obviously done wrong.

I do see your adjusting the PRAGMA schema_version, but am not sure if this will be detected by the other connection, nor am I sure that it isn't already "2" (which would signify no change at all).

All of these are guesses at what is going on.

Further to that, the change you are trying to make seems a design-time change, not a production necessity - why not use a DB management tool to do that? If not, then you can use some scripts that do it right.

Should be able to google it, or I can send you one (we use internally in SQLiteSpeed), but it's rather straight-forward:

Rename the table to change (ttc) to ttc_old (or similar)
Make the new table with the new schema (CREATE ttc(ID, c1, ...);)
Do a simple query INSERT INTO ttc SELECT ID, c1, ... FROM ttc_old; (Ensure the SELECT creates columns in the same footprint as the new ttc)
DROP TABLE ttc_old;

Notes:

No schema pragma needed
100% accurate
Multiple-connection-safe
Change/Add multiple columns at the same time
Has the deficit of needing a lot of space when that table is severely massive

You may need to:

disable/enable foreign keys before and aft.
depending on if you use legacy mode, drop and recreate indexes and triggers on ttc too.
adjust the data before-hand if the new schema has constraints that will fail but were allowed in the old schema.
Do it all in a transaction which you can roll back if any part fails, obviously.

Hope that helps! Ryan

(6) By Larry Brasfield (LarryBrasfield) on 2020-07-01 10:16:29 in reply to 3 [link] [source]

I add this in case it helps localize the code issue.

Version 3.32.2 of the SQLite shell demonstrates the OP's misbehavior AND allows a second ALTER TABLE NAMES ADD Age INTEGER DEFAULT 10; to succeed (with or without explicit transaction bracketing.)

(To the OP) Stating the version you used to demonstrate the issue would also help.

(7) By anonymous on 2020-07-01 11:57:35 in reply to 6 [link] [source]

I am testing on versions 3.20.1 and 3.22.0.

Note: The example I give is just a minimal repro example, I am working with a different DB which is unfortunately in production already.

Re: Ryan, I don't think it's an issue causing by multiple connections - the problem is that running the SQLite commands above in a single connection causes issues. Reopening the database and running the second migration to add a column works.

Thanks to your suggestion for the alternate solution. I am aware of this solution and originally went with the above solution because it was explicitly documented (https://sqlite.org/lang_altertable.html) for my use case which was removing a not null constraint. If it is documented, merely saying it's possible without giving guarantees to its stability is horrible :( Either the documentation should avoid stating this as a solution or the documentation should clarify what the proper usage of it is IMO.

(8) By anonymous on 2020-07-01 11:59:41 in reply to 6 [link] [source]

In addition, it seems like I can't update the thread title, so I might create a new thread with an updated title and link to this one.

(9) By Richard Hipp (drh) on 2020-07-01 12:30:37 in reply to 1 [link] [source]

Why do you think this is a bug? What were you expecting it to do?

(10) By tom (younique) on 2020-07-01 12:32:10 in reply to 5 [link] [source]

The problem with the 12-step algorithm is an enormous complexity in practice. If you want to apply changes to a table definition, it is not enough to drop and recrate your own views (which you know about), but in fact all views.

This is because a user might have added his own view referring to the table to be changed. So one has to read all view declarations, memorize them, drop them, and recreate them later.

But not enough, a view might have triggers in it, so again, one has to read all triggers (not only those you know about), memorize them, drop them, and recreate them later.

The 12-step ALTER-workaround is an extremely sophisticated and error-prone system. Instead of implementing it every time in every application, I really would love to see it finally being available in SQLite. Speed don't matter at all, but PLEASE, PLEASE, PLEASE provide full ALTER TABLE support 🙏

(11) By jchd (jchd18) on 2020-07-01 13:54:27 in reply to 9 [link] [source]

I'm afraid the 12-step procedure isn't even enough.

Step#3: SELECT type, sql FROM sqlite_master WHERE tbl_name='X'

That doesn't include triggers associated with other tables, where SQL statements involving table X can be present. I have several such use cases and I bet I'm far from alone.

(12) By Richard Hipp (drh) on 2020-07-01 14:34:38 in reply to 11 [link] [source]

So, are you saying this whole thread is a feature request, not a bug report?

(13.1) By Larry Brasfield (LarryBrasfield) on 2020-07-01 15:14:12 edited from 13.0 in reply to 12 [link] [source]

(Please forgive my intrusion into a subthread here.)

I see two different problems here, indicating one or more "bugs" as I believe we understand the term.

Bug1:

As the OP indicated in post 1, in a comment, the docs here, about 60% through section 5, state that the OP's procedure is appropriate for his task. Specifically, it says, "The following simpler procedure is appropriate for removing CHECK or FOREIGN KEY or NOT NULL constraints ...", which is precisely what he attempted to do. That attempt, which followed the 9 step procedure faithfully (I think), seemed to succeed. However, a later operation that should have succeeded, an add column via ALTER TABLE, resulted in a what appears to be a syntax error.

This is enough, IMO, to say that there is a discrepancy between what the docs say is appropriate and what will reliably leave the running DBMS in a state where it can operate in other ways as documented. It seems to be a bug for that reason.

Bug2:

The specific error that occurs at the end of the OP's repro steps is reported as a syntax error. But it is not. The syntax is fine, as evidenced by it being accepted after closing and reopening the DB or studying the railroad chart.

I maintain that the parser, (which I assume generates the "syntax error" diagnostic), should not produce different results when parsing the same SQL at different times. Either it is looking at different SQL to produce the error seen by the OP, or its behavior has been affected by something very likely to be the infamous undefined behavior that results from bad pointer usage. Either way, it reflects some undesirable state left behind by the OP's use of the "appropriate" 9 step procedure.

This may well be manifestation of a more serious bug than simple mismatch between documented behavior and implemented behavior.

(14) By Richard Hipp (drh) on 2020-07-01 15:17:14 in reply to 13.0 [link] [source]

I see. I didn't notice the scrollbars and I was only seeing the OP's report down through the first "PRAGMA integrity_check" and everything seemed fine up until then.

The problem is that after an edit to the sqlite_master table, you really do need to close and reopen the database connection so that SQLite will rebuild its internal schema. If you do that, the script works as desired.

I will update the documentation accordingly.

(15) By Richard Hipp (drh) on 2020-07-01 16:22:08 in reply to 14 [link] [source]

... Or, even better, I'll fix the PRAGMA schema_version command so that it works as advertised in the ALTER TABLE documentation. That has now been done.

Please try again using the latest prerelease snapshot and let me know if it is now working better for you.

(16) By Larry Brasfield (LarryBrasfield) on 2020-07-01 16:29:55 in reply to 14 [link] [source]

(I'm hoping not to seem a nag, here.)

That is, IMO, a perfectly adequate cure for Bug1. However, it does nothing for Bug2. To state Bug2 more succinctly: There is a sequence of inputs which leaves the running SQLite data structures in an ill-defined state, such that the parser either behaves oddly or is fed input poorly related to design intent.

A hint of the ill-defined state is in my post #2, stating that simply retrying the failing column add statement "works". The same statement, submitted twice in a row, fails the first time and succeeds the second time, with v3.32.2. I think you would agree that this is odd and may indicate a deeper problem. If the parser sees the same bogus input both times, it should fail both times unless it has become non-deterministic. And if it is fed bogus input the first time but not the second, then I would want to know why that first input cannot be something that produces an address fault or other undesirable effects.

I suspect that recompiling sqlite_schema is a good idea whenever it is modified. It could be a build-time option to allow DML or DDL on it at all and do the recompilation as prudence demands.

(17) By tom (younique) on 2020-07-01 17:00:03 in reply to 15 [link] [source]

Would you also consider an internal implementation of the complex and error-prone 12-step procedure for doing table changes?

(18) By Richard Hipp (drh) on 2020-07-01 17:26:20 in reply to 17 [link] [source]

Sure. Send in a patch and we will have a look.

(19) By jchd (jchd18) on 2020-07-01 19:01:02 in reply to 12 [link] [source]

I don't refer to the thread per se.

My remark is simply pointing out that in the 12-step procedure proposed to make profond changes to a table X (beyond what alter table currently does), the step #3 is not sufficient in all cases and must also include triggers of other tables which issue SQL statement(s) against the changed table X.

For instance, in the "4. Some example triggers" paragraph, the first trigger updates table orders "after UPDATE OF address ON customers".

So if one only looks at what is in sqlite_master with name 'customers', the trigger update_customer_address won't be selected/examined/modified and might cause havoc after an alter table changing sqlite_master directly. This won't be noticed until the unchanged trigger is invoked, as AFAIK triggers are only parsed when launched.

(20) By Larry Brasfield (LarryBrasfield) on 2020-07-01 20:44:36 in reply to 15 [link] [source]

Without speaking for the OP, I can report:

That snapshot (timestamped 202007011619) runs the OP's "misbehaving" repro steps without any misbehavior that I can see with my limited testing. (applying them to an on-disk DB, opening it again, integrity check, selects, inserts)

(21) By anonymous on 2020-07-01 23:52:21 in reply to 20 [link] [source]

Thanks all for your help on this! Greatly appreciated.

I also tried the latest snapshot 202007011619 and verified that the problem doesn't seem to occur any more. Within the same DB connection or with a new connection, I am able to add a new column after removing the not null constraint and inserts and selects seem to work fine.

(22) By anonymous on 2020-07-01 23:55:06 in reply to 20 [link] [source]

For my own information, was anything changed to that snapshot as a result of this thread to make this work? Or did there happen to be some fixes already made that solved the issue?

(23.1) By Larry Brasfield (LarryBrasfield) on 2020-07-02 00:39:17 edited from 23.0 in reply to 22 [link] [source]

According to facts stated in Richard's post #19, which can be independently seen in the project VCS timeline, that snapshot was made at nearly the same time according to its filename timestamping and incorporates a change responsive to the issue raised in this thread. While I have no reason to doubt Richard's word on this, I can also see that no other recent checkins [a], since v3.32.2 which reproduces the misbehavior, do anything to address it.

[a. I take the checkin comments at their word, without deeper examination. ]

Hence, it is fair to conclude that the change fixing the misbehavior was instigated and motivated by Richard's reading of this thread, in conjunction with his dedication to excellence and keeping the active bug list at or near zero length.

(24) By anonymous on 2020-07-02 09:00:40 in reply to 23.1 [link] [source]

That's super awesome! It's great to see bugs addressed in such a timely manner!

(25) By Ryan Smith (cuz) on 2020-07-02 09:37:19 in reply to 24 [link] [source]

It's great to see bugs addressed in such a timely manner!

You must be new here.

That's the very standard. :)