SQLite Forum

sqlite-utils transform - command-line tool implementing the advanced ALTER TABLE pattern
Login

sqlite-utils transform - command-line tool implementing the advanced ALTER TABLE pattern

(1.1) By Simon Willison (simonw) on 2020-09-23 03:04:59 edited from 1.0 [source]

The SQLite ALTER TABLE documentation describes a pattern for running advanced alter table operations (dropping columns, changing column types etc). The recommendation is to create a new table with the desired shape, copy the data across from the old table, then drop the old table and rename the new one to replace it.

Today I added support for this pattern to my sqlite-utils package, which now offers this capability as both a command-line utility and a Python library method.

Here's an example of the command-line utility in action:

% wget https://latest.datasette.io/fixtures.db
% sqlite3 fixtures.db '.schema roadside_attractions'
CREATE TABLE roadside_attractions (
    pk integer primary key,
    name text,
    address text,
    latitude real,
    longitude real
);
% sqlite-utils transform fixtures.db roadside_attractions \
    --rename pk id \
    --default name Untitled \
    --drop address
% sqlite3 fixtures.db '.schema roadside_attractions'       
CREATE TABLE IF NOT EXISTS "roadside_attractions" (
   [id] INTEGER PRIMARY KEY,
   [name] TEXT DEFAULT 'Untitled',
   [latitude] FLOAT,
   [longitude] FLOAT
);
You can use the --sql command to output the SQL that would be executed, which is useful for understanding how it works:
% sqlite-utils transform fixtures.db roadside_attractions \
    --rename pk id \
    --default name Untitled \
    --drop address \
    --sql
CREATE TABLE [roadside_attractions_new_d98d349ab698] (
   [id] INTEGER PRIMARY KEY,
   [name] TEXT DEFAULT 'Untitled',
   [latitude] FLOAT,
   [longitude] FLOAT
);
INSERT INTO [roadside_attractions_new_d98d349ab698] ([id], [name], [latitude], [longitude])
   SELECT [pk], [name], [latitude], [longitude] FROM [roadside_attractions];
DROP TABLE [roadside_attractions];
ALTER TABLE [roadside_attractions_new_d98d349ab698] RENAME TO [roadside_attractions];

Here's how to do the same thing using the Python library:

import sqlite_utils

db = sqlite_utils.Database("fixtures.db")
db["roadside_attractions"].transform(
  rename={"pk": "id"},
  defaults={"name": "Untitled"},
  drop={"address"}
)

Take a look at the documentation for the CLI tool and for the Python method for more details.

(2) By Simon Willison (simonw) on 2020-09-23 17:48:43 in reply to 1.1 [link] [source]

Also in sqlite-utils 2.20 is the new sqlite-utils extract command, which can be used to extract columns into a separate table.

For example, if your table looks like this:

CREATE TABLE "salaries" (
   [rowid] INTEGER PRIMARY KEY,
   [Year Type] TEXT,
   [Year] TEXT,
   [Organization Group Code] TEXT,
   [Organization Group] TEXT,
   [Department Code] TEXT,
   [Department] TEXT,
   [Union Code] TEXT,
   [Union] TEXT,
   [Job Family Code] TEXT,
   [Job Family] TEXT,
   [Job Code] TEXT,
   [Job] TEXT,
   [Employee Identifier] INTEGER,
   [Salaries] FLOAT,
   [Overtime] FLOAT,
   [Other Salaries] FLOAT,
   [Total Salary] FLOAT,
   [Retirement] FLOAT,
   [Health and Dental] FLOAT,
   [Other Benefits] FLOAT,
   [Total Benefits] FLOAT,
   [Total Compensation] FLOAT
);

You can run commands that will transform it into this:

CREATE TABLE [organization_groups] (
   [id] INTEGER PRIMARY KEY,
   [code] TEXT,
   [name] TEXT
);
CREATE TABLE [departments] (
   [id] INTEGER PRIMARY KEY,
   [code] TEXT,
   [name] TEXT
);
CREATE TABLE [unions] (
   [id] INTEGER PRIMARY KEY,
   [code] TEXT,
   [name] TEXT
);
CREATE TABLE [job_families] (
   [id] INTEGER PRIMARY KEY,
   [code] TEXT,
   [name] TEXT
);
CREATE TABLE [jobs] (
   [id] INTEGER PRIMARY KEY,
   [code] TEXT,
   [name] TEXT
);
CREATE TABLE IF NOT EXISTS "salaries" (
   [rowid] INTEGER PRIMARY KEY,
   [Year Type] TEXT,
   [Year] TEXT,
   [organization_group_id] INTEGER REFERENCES [organization_groups]([id]),
   [department_id] INTEGER REFERENCES [departments]([id]),
   [union_id] INTEGER REFERENCES [unions]([id]),
   [job_family_id] INTEGER REFERENCES [job_families]([id]),
   [job_id] INTEGER REFERENCES jobs(id),
   [Employee Identifier] INTEGER,
   [Salaries] FLOAT,
   [Overtime] FLOAT,
   [Other Salaries] FLOAT,
   [Total Salary] FLOAT,
   [Retirement] FLOAT,
   [Health and Dental] FLOAT,
   [Other Benefits] FLOAT,
   [Total Benefits] FLOAT,
   [Total Compensation] FLOAT
);

I wrote about that in detail here: https://simonwillison.net/2020/Sep/23/sqlite-utils-extract/

(3) By Simon Willison (simonw) on 2020-09-24 17:52:11 in reply to 2 [link] [source]

Running extract against a table with 680,000 rows was taking 12 minutes to run. I did some work on optimizing it and came up with an approach that knocked it down from 12 minutes to just 4 seconds!

Details on the optimization here: https://github.com/simonw/sqlite-utils/issues/172