DROP COLUMN feature with indexes

(1) By Mark Lawrence (mark) on 2021-02-22 16:28:32 [link]

The DROP COLUMN feature (from the development branch) doesn't handle indexes:

    CREATE TABLE p1(a PRIMARY KEY, b UNIQUE);
    CREATE TABLE c1(x, y, z REFERENCES p1(c));
    CREATE TABLE c2(x, y, z, w REFERENCES p1(b));
    CREATE INDEX i_cz ON c1(z);
    CREATE INDEX i_cxy ON c2(x,z);
    ALTER TABLE c1 DROP COLUMN z;
    ALTER TABLE c2 DROP COLUMN z;
    SELECT sql FROM sqlite_schema WHERE name IN ('c1', 'c2');

The above results in:

    Error: near line 6: error in index i_cz after drop column: no such column: z
    Error: near line 7: error in index i_cxy after drop column: no such column: z

(2) By Dan Kennedy (dan) on 2021-02-22 16:46:15 in reply to 1 [link]

What should it do instead? Automatically drop the indexes? Automatically change "c1(x,z)" to "c1(x)"?

Dan.

(3) By tom (younique) on 2021-02-22 17:34:02 in reply to 2

I would suggest completely dropping all involved indexes, but issuing a warning about what indexes have been dropped.

(4) By Mark Lawrence (mark) on 2021-02-22 18:23:40 in reply to 2 [link]

I vote for dropping them. It would be consistent with DROP TABLE behaviour for indexes.

On a related note, I'm not sure how much schema checking SQLite does for triggers, but CREATE TRIGGER x BEFORE UPDATE OF z ON c1 also remains unmodified after a column is dropped.

(5) By Larry Brasfield (larrybr) on 2021-02-22 18:55:14 in reply to 3 [link]

I don't think that "issuing a warning" maps to any DDL result. The alternatives are fail and succeed. The SQLite API has no support for anything in between.

The errors thrown are pretty clear, and are enough to suggest what index(es) need to be dropped before a retry. Dropping an index seems a bit much to me. 

If it is possible, I prefer to write DDL sequences that leave a DB in a self-consistent state. I much prefer that the DBMS yap at me when I deviate.

(6) By J. King (jking) on 2021-02-22 19:02:45 in reply to 5 [link]

> The errors thrown are pretty clear, and are enough to suggest what index(es) need to be dropped before a retry. Dropping an index seems a bit much to me. 

I agree. I would not want to silently lose an index because I forgot it was using a column I no longer care about.

(7) By Larry Brasfield (larrybr) on 2021-02-22 19:06:40 in reply to 4 [link]

> [Dropping indexes using a dropped column] would be consistent with DROP TABLE behaviour for indexes.

Not quite. An index exists only on one table; when the table vanishes the index makes no sense. But an index that covers more than one column may still make sense except for non-existent columns it mentions.

This issue makes me wish for "ALTER INDEX DROP COLUMN ...".

(8) By ddevienne on 2021-02-22 19:07:57 in reply to 6 [link]

That's what `CASCADE` is for. I.e. dropping related objects is ***opt-in***.

(9) By Keith Medcalf (kmedcalf) on 2021-02-22 20:11:17 in reply to 2 [link]

I should think the ALTER TABLE command should fail with an error that the column is being used.  This should apply not only to indexes but also to triggers and views which reference the column being dropped explicitly.

In the normal case executing a DDL command should not result in an inconsistent schema.

If pragma legacy_alter_table is in effect then any indexes referencing the column should be dropped and explicit uses of the column name in triggers or views should be ignored.

(10) By Mark Lawrence (mark) on 2021-02-22 21:02:20 in reply to 9 [link]

Do you have a rationale for why you prefer fail to drop? If the user intends to drop a column and SQLite says you can't because of x,y and z, then the user has to manually do that work anyway. Why make them jump through the extra hoops? To me that is like asking the user to manually delete triggers before dropping a table.

There is however at least one situation today where DDL does result in an inconsistent schema: the first table of a circular foreign key reference. Ideally that would only be allowed inside a transaction and rejected at COMMIT time unless resolved by additional DDL statements. But that doesn't seem to be the case.

Similarly, perhaps one could imagine wanting to change a column type, leaving the rest of the schema alone:

    BEGIN;
    ALTER TABLE t ADD COLUMN x1 $NEWTYPE;
    UPDATE t SET x1=x;
    ALTER TABLE t DROP COLUMN x;
    ALTER TABLE t RENAME COLUMN x1 TO x;
    COMMIT;

It would possibly be annoying to have to recreate all views and triggers here. In that case an immediate dropping of related views and triggers would not be so useful. But then neither should the DROP COLUMN command fail before the surrounding transaction commits.

(11) By Dan Kennedy (dan) on 2021-02-22 21:13:17 in reply to 4 [link]

> On a related note, I'm not sure how much schema checking SQLite does for triggers, but CREATE TRIGGER x BEFORE UPDATE OF z ON c1 also remains unmodified after a column is dropped.

Good eye! This is a special case, because SQLite never checks for or complains about unknown column names in the "UPDATE OF" clause of a trigger. If the column being dropped is used anywhere else in the trigger it should be an error.

Dan.

(12.1) By tom (younique) on 2021-02-22 22:57:53 edited from 12.0 in reply to 9 [link]

Please, no new "DROP TABLE IF EXISTS" affair! 😉

What I mean is that DROP COLUMN should not fail in such a case. This might be nice when working in the SQLite-CLI, but it is a no-go for automatic routines. It makes DROP COLUMN worthless if one has to know about the backgrounds to get it to work.

A common usecase of an application database is upgrading the database schema. Due to the restricted ALTER TABLE command, this is currently a very complex and error-prone procedure. Imho, it should get *easier* to handle with DROP COLUMN, and not only shift the complexity.

Example: An application upgrades its database schema and knows that column X has to be dropped because it has been replaced by something else. The application has never used an index on column X, so the upgrade procedure just says "DROP COLUMN X". But some user might have used X quite often and added an index for better performance. Bang -- the application's upgrade procedure fails. Solution: query all indexes first, remember them, drop what is needed, drop column, maybe recreated some indexes.... does one have to do the same with views as well? This is no improvement. It just keeps the complexity of the current ALTER TABLE procedure.

(13) By tom (younique) on 2021-02-22 22:17:11 in reply to 2 [link]

Maybe, the [documentation from MariaDB](https://mariadb.com/kb/en/alter-table/) provides some help:


... DROP COLUMN [IF EXISTS] col_name [CASCADE|RESTRICT]

Drops the column from the table. If you are using IF EXISTS you will not get an error if the column didn't exist. If the column is part of any index, the column will be dropped from them, except if you add a new column with identical name at the same time. The index will be dropped if all columns from the index were dropped. If the column was used in a view or trigger, you will get an error next time the view or trigger is accessed.
MariaDB starting with 10.2.8

Dropping a column that is part of a multi-column UNIQUE constraint is not permitted. For example:

CREATE TABLE a (
  a int,
  b int,
  primary key (a,b)
);

ALTER TABLE x DROP COLUMN a;
[42000][1072] Key column 'A' doesn't exist in table

The reason is that dropping column a would result in the new constraint that all values in column b be unique. In order to drop the column, an explicit DROP PRIMARY KEY and ADD PRIMARY KEY would be required. Up until MariaDB 10.2.7, the column was dropped and the additional constraint applied, resulting in the following structure:

ALTER TABLE x DROP COLUMN a;
Query OK, 0 rows affected (0.46 sec)

DESC x;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| b     | int(11) | NO   | PRI | NULL    |       |
+-------+---------+------+-----+---------+-------+

MariaDB starting with 10.4.0

MariaDB 10.4.0 supports instant DROP COLUMN. DROP COLUMN of an indexed column would imply DROP INDEX (and in the case of a non-UNIQUE multi-column index, possibly ADD INDEX). These will not be allowed with ALGORITHM=INSTANT, but unlike before, they can be allowed with ALGORITHM=NOCOPY

RESTRICT and CASCADE are allowed to make porting from other database systems easier. In MariaDB, they do nothing.

(14) By Keith Medcalf (kmedcalf) on 2021-02-22 23:08:12 in reply to 10 [link]

> Do you have a rationale for why you prefer fail to drop?

Yes.  The use of a DDL command should not result in an invalid schema unless you have specified that this is the result you intend.

 > To me that is like asking the user to manually delete triggers before dropping a table.

No, it is not.  If you drop a table then all triggers which apply to that table (as in specified in the ON clause) should be dropped because they are entirely no longer applicable.

Table names being dropped that are used in the body of a view or trigger that is not ON that table should act to prohibit the table from being dropped unless the user has indicated that they wish to allow an inconsistent schema result (pragma legacy_alter_table).

Column names being dropped that are referenced in a view or trigger body (or in an index) should also cause the drop of the column name to fail with an error rather than leaving the schema in an inconsistent state UNLESS one has specifically enabled the option for allowing an inconsistent schema (pragma legacy_alter_table).

(15) By Keith Medcalf (kmedcalf) on 2021-02-22 23:11:15 in reply to 14 [link]

Perhaps the pragma legacy_alter_table is inappropriately named.  Perhaps it should be renamed (or aliased) to pragma allow_inconsistent_schema, which is more in line with its actual functioning.

(16) By tom (younique) on 2021-02-23 00:06:19 in reply to 14 [link]

Well, following your argumentation, "if you drop a column then all indexes which apply to that column (as in specified in the ON clause) should be dropped because they are entirely no longer applicable."

(17) By Keith Medcalf (kmedcalf) on 2021-02-23 01:04:02 in reply to 16 [link]

That only applies if the column being dropped comprises the ONLY column in the index.  That is to say that for the following schema:

```
create table x(a, b, c);
create [unique] index x_b on x(b);
```

then if you do an `ALTER TABLE x DROP COLUMN b;` I would assert that mayhaps the index `x_b` can be dropped automagically, or mayhaps not (arguments can be made for either position).  

However for the following schema:

```
create table x(a, b, c);
create [unique] index x_bc on x(b,c);
```

that `ALTER TABLE x DROP COLUMN b;` should fail.  There is absolutely nothing which can be done to make the schema transition to a consistent state.  You (the user) should have to drop the index x_bc manually before the DROP COLUMN can be processed successfully.

(18) By tom (younique) on 2021-02-23 08:55:03 in reply to 17 [link]

Imho, DROP COLUMN should *not* fail because of such a banality as it makes it impossible to due automatic database manipulation (see my post above).

MariaDB recreates such an index without the dropped column and I vote for following this paradigm.

(19) By Paul (greenscape) on 2021-02-23 09:53:31 in reply to 9 [link]

Absolutely! Safety should be first. After all, how often would you drop columns? Why trade safety because of laziness?

All hidden stuff *always* leads to bugs that are hard to detect.

Please, just do as Keith suggested and emit an error.

(20) By Paul (greenscape) on 2021-02-23 09:59:50 in reply to 12.1 [link]

When you're upgrading schema - your main concern should be not to screw up. Not to make it easier. If we're talking about a serious product.

If you're upgrading a schema and you're **forgot** that, when dropping column bar of foo table, said column is used elsewhere - **this is your problem**! And if SQLite emits an error - you should be thankful. Handling such conflicts (indices triggers) is your responsibility. SQLite will never be able to automatically make it work for **every** possible case.

(21) By Mark Lawrence (mark) on 2021-02-23 10:59:20 in reply to 2 [link]

Take the following scenario, where there is no explicit index to drop:

    CREATE TABLE t1(
        a INTEGER primary key,
        b INTEGER,
        c INTEGER,
        UNIQUE(b,c)
    );

    ALTER TABLE t1 DROP COLUMN c;

If the DROP COLUMN fails instead of removing the automatic index, then the caller has no option but to re-create the table completely. This reduces the useful scope of DROP COLUMN.

(22) By Ryan Smith (cuz) on 2021-02-23 11:43:00 in reply to 18 [link]

The use of hyperbole won't make your point any stronger.

> DROP COLUMN should not fail because of such a banality as it makes it impossible to due automatic database manipulation

"Banality"? - Saying "Explicitly" what you want to happen, for EVERY piece of the puzzle, is a base tenet of accurate programming.

"Impossible"? - That's just demonstrably false. Impossible to do it in a lazy way? Maybe, and if so, Good.

There is no substitute for proper and full data statements. I for one would hate if SQLite starts doing magic background things when I try to alter a table in a way that doesn't stroke with the current rest of the schema. I would very much like to know this (by error message) before I go ahead and roll it out to actual production environments<sup>1</sup>.

Lastly - MariaDB/MySQL, great DB as it is, should not be held up as the defacto standard for DB operations, ever. Show PostGres doing it, and you may have a more valid point.

[1] If an entire index exists based on only the column being dropped, I'm happy for that Index to be dropped because the database goes from consistent state to consistent state, and if you do not need the column anymore, you must not need that index anymore. This also means that if I drop the index explicitly before the column (the way I like to do it), everything still works exactly. The moment however there is any index using that column in addition to any other, and I did not explicitly "fix" that index before trying to drop that column, please error out - Loudly and clearly.

(23) By Mark Lawrence (mark) on 2021-02-23 12:04:43 in reply to 2 [link]

What Would PostgreSQL Do (WWPD)?

> DROP [ COLUMN ] [ IF EXISTS ] column_name [ RESTRICT | CASCADE ]
>
> This form drops a column from a table. Indexes and table constraints involving the column will be automatically dropped as well. Multivariate statistics referencing the dropped column will also be removed if the removal of the column would cause the statistics to contain data for only a single column. You will need to say CASCADE if anything outside the table depends on the column, for example, foreign key references or views. If IF EXISTS is specified and the column does not exist, no error is thrown. In this case a notice is issued instead.

From https://www.postgresql.org/docs/current/sql-altertable.html

(24) By tom (younique) on 2021-02-23 13:42:39 in reply to 22 [link]

I do not understand your problem. No one will stop you from explicitly dropping an index first and the column afterwards. This is always possible, even now.

Having DROP COLUMN should be an easement compared to the current procedure. And having to programatically check for indexes, remember them and recreate them is no way easier than the current situation.

And yes, indexes *are* a banality. They are in no way an existential part of any database, they only speed things up. Notabene, we are talking about indexes, not constraints!

(25) By Richard Damon (RichardDamon) on 2021-02-23 13:44:40 in reply to 18 [link]

> Imho, DROP COLUMN should not fail because of such a banality as it makes it impossible to due automatic database manipulation (see my post above).

My personal feeling is that this isn't a strong argument. If a system is making an 'automatic' manipulation, it should know enough about the database schema to be able to 'automatically' do the steps needed to fix the issue before trying to drop the column. Otherwise, it seems a bit like giving a five-year-old a loaded handgun and telling them to 'play safe'. 

For cases like this, I like the line from the Zen of Python:

> In the face of ambiguity, refuse the temptation to guess.

It MIGHT be safe to just drop the column out of the index, or it might be that the whole index is now worthless (do we spend the cycles to see if it is now a redundant index, or try to guess if it will ever be used?). Removing the column in cases like a UNIQUE index may well create a constraint violation, in that case, do we switch from removing the column to dropping the index, or does that mean we should fail the DROP COLUMN command?

Particularly when adding a new feature, it is easier to start in the most restricted case, and we can later make it do more. Due to backward compatibility constraints, once you can't as easily take back an automatically done action as adding it later. (Look at things like the fact that primary keys are allowed to be NULL, which has to remain because of backward compatibility).

(26) By Richard Damon (RichardDamon) on 2021-02-23 13:54:36 in reply to 21 [link]

Dropping a column that is part of an unnamed multi-column UNIQUE constraint is probably a good reason for SQLite to reject the command.

Should that UNIQUE(b,c) constraint become a UNIQUE(b) constraint or just be dropped? Likely b won't be UNIQUE on its own (or the constraint would have just been UNIQUE(b) to begin with and dropping the column out of the constraint would fail. Just dropping the constraint is likely a problem, as there was very likely a reason the constraint was there in the first place.

Maybe the lesson is that if you want to be able to maximally alter a schema you need to remember to name all your constraints (as unnamed constraints are much harder to change). Perhaps it also says that for cases like this, it makes sense to say the right answer is to remake the table, as there is no clear universal answer as to what the results should be.

(27) By tom (younique) on 2021-02-23 14:14:06 in reply to 22 [link]

> Lastly - MariaDB/MySQL, great DB as it is, should not be held up as the defacto standard for DB operations, ever. Show PostGres doing it, and you may have a more valid point.

MySQL is still much [more wide spread](https://db-engines.com/en/ranking) than Postgres ;)

(28) By Ryan Smith (cuz) on 2021-02-23 14:32:22 in reply to 27 [link]

> MySQL is still much more wide spread than Postgres ;)

Argument fallacy - Appeal to the masses.

SQLite is near infinitely more widespread than MySQL and Postgres put together. Should they be made to adopt the SQLite way of things?

I'm not advocating to use the Postgres way (even though I believe Postgres does work the way you are lobbying for, so *you* should advocate for it), I'm just saying if you are going to appeal to authority, appeal to Postgres in stead of MySQL, it's a much higher authority in the world of correct database principles.


PS: AS I've said many times before, I'm a big MySQL fan, it works mostly pretty fast and is easy to use, even in "strict" mode, which I religiously enable. For simple client-server projects I even prefer it over Postgres (shock, horror) but I'm not blind to its faults and it most certainly should not be held as the standard.