sqlite-utils transform - command-line tool implementing the advanced ALTER TABLE pattern
(1.1) By Simon Willison (simonw) on 2020-09-23 03:04:59 edited from 1.0 [link] [source]
The SQLite ALTER TABLE documentation describes a pattern for running advanced alter table operations (dropping columns, changing column types etc). The recommendation is to create a new table with the desired shape, copy the data across from the old table, then drop the old table and rename the new one to replace it.
Today I added support for this pattern to my sqlite-utils package, which now offers this capability as both a command-line utility and a Python library method.
Here's an example of the command-line utility in action:
% wget https://latest.datasette.io/fixtures.db
% sqlite3 fixtures.db '.schema roadside_attractions'
CREATE TABLE roadside_attractions (
pk integer primary key,
name text,
address text,
latitude real,
longitude real
);
% sqlite-utils transform fixtures.db roadside_attractions \
--rename pk id \
--default name Untitled \
--drop address
% sqlite3 fixtures.db '.schema roadside_attractions'
CREATE TABLE IF NOT EXISTS "roadside_attractions" (
[id] INTEGER PRIMARY KEY,
[name] TEXT DEFAULT 'Untitled',
[latitude] FLOAT,
[longitude] FLOAT
);
You can use the --sql
command to output the SQL that would be executed, which is useful for understanding how it works:
% sqlite-utils transform fixtures.db roadside_attractions \
--rename pk id \
--default name Untitled \
--drop address \
--sql
CREATE TABLE [roadside_attractions_new_d98d349ab698] (
[id] INTEGER PRIMARY KEY,
[name] TEXT DEFAULT 'Untitled',
[latitude] FLOAT,
[longitude] FLOAT
);
INSERT INTO [roadside_attractions_new_d98d349ab698] ([id], [name], [latitude], [longitude])
SELECT [pk], [name], [latitude], [longitude] FROM [roadside_attractions];
DROP TABLE [roadside_attractions];
ALTER TABLE [roadside_attractions_new_d98d349ab698] RENAME TO [roadside_attractions];
Here's how to do the same thing using the Python library:
import sqlite_utils
db = sqlite_utils.Database("fixtures.db")
db["roadside_attractions"].transform(
rename={"pk": "id"},
defaults={"name": "Untitled"},
drop={"address"}
)
Take a look at the documentation for the CLI tool and for the Python method for more details.
(2) By Simon Willison (simonw) on 2020-09-23 17:48:43 in reply to 1.1 [link] [source]
Also in sqlite-utils 2.20
is the new sqlite-utils extract
command, which can be used to extract columns into a separate table.
For example, if your table looks like this:
CREATE TABLE "salaries" (
[rowid] INTEGER PRIMARY KEY,
[Year Type] TEXT,
[Year] TEXT,
[Organization Group Code] TEXT,
[Organization Group] TEXT,
[Department Code] TEXT,
[Department] TEXT,
[Union Code] TEXT,
[Union] TEXT,
[Job Family Code] TEXT,
[Job Family] TEXT,
[Job Code] TEXT,
[Job] TEXT,
[Employee Identifier] INTEGER,
[Salaries] FLOAT,
[Overtime] FLOAT,
[Other Salaries] FLOAT,
[Total Salary] FLOAT,
[Retirement] FLOAT,
[Health and Dental] FLOAT,
[Other Benefits] FLOAT,
[Total Benefits] FLOAT,
[Total Compensation] FLOAT
);
You can run commands that will transform it into this:
CREATE TABLE [organization_groups] (
[id] INTEGER PRIMARY KEY,
[code] TEXT,
[name] TEXT
);
CREATE TABLE [departments] (
[id] INTEGER PRIMARY KEY,
[code] TEXT,
[name] TEXT
);
CREATE TABLE [unions] (
[id] INTEGER PRIMARY KEY,
[code] TEXT,
[name] TEXT
);
CREATE TABLE [job_families] (
[id] INTEGER PRIMARY KEY,
[code] TEXT,
[name] TEXT
);
CREATE TABLE [jobs] (
[id] INTEGER PRIMARY KEY,
[code] TEXT,
[name] TEXT
);
CREATE TABLE IF NOT EXISTS "salaries" (
[rowid] INTEGER PRIMARY KEY,
[Year Type] TEXT,
[Year] TEXT,
[organization_group_id] INTEGER REFERENCES [organization_groups]([id]),
[department_id] INTEGER REFERENCES [departments]([id]),
[union_id] INTEGER REFERENCES [unions]([id]),
[job_family_id] INTEGER REFERENCES [job_families]([id]),
[job_id] INTEGER REFERENCES jobs(id),
[Employee Identifier] INTEGER,
[Salaries] FLOAT,
[Overtime] FLOAT,
[Other Salaries] FLOAT,
[Total Salary] FLOAT,
[Retirement] FLOAT,
[Health and Dental] FLOAT,
[Other Benefits] FLOAT,
[Total Benefits] FLOAT,
[Total Compensation] FLOAT
);
I wrote about that in detail here: https://simonwillison.net/2020/Sep/23/sqlite-utils-extract/
(3) By Simon Willison (simonw) on 2020-09-24 17:52:11 in reply to 2 [source]
Running extract against a table with 680,000 rows was taking 12 minutes to run. I did some work on optimizing it and came up with an approach that knocked it down from 12 minutes to just 4 seconds!
Details on the optimization here: https://github.com/simonw/sqlite-utils/issues/172