SQLite Forum

slow import of CSV files
Login
Dear SQLiters,

I have large CSV files (30GB, ~100M records ). These are dumps from existing database (unfortunately). I therefore have to re-create schema, clean up (NULL versus empty string, convert to real, etc). I do this in two steps (all within sqlite3 shell):

Step 1. sqlite3's .import of CSV file into a table QQQ that will be deleted once all finished. This makes all columns TEXT since no type is available.  This step is reasonably fast, 20min

Step 2. Create table according to schema with PRIMAMRY KEY and other constraints and INSERT from QQQ into the final destination.

The second step is slow and it gets slower as database size grows. I suspect but not sure the slowdown is caused by enforcing uniqueness of the PRIMARY KEY. Since I know the data source is a database, I do not need to enforce the check on the fly. Plus, I know there is PRAGMA integrity_check;  that can be used to verify at the end after everything is inserted. If I am correct, is there a way to disable or otherwise speed up the second step? I know FOREIGN KEY constrain can be turned off (and I keep it off). I know PRAGMA ignore_check_constraints could turn off CHECK. But nothing on primary key enforcement. Maybe my two-step process is wrong and there is a better way?

In my case PRIMARY KEY is a string (used to be 38 digit integer, too large for SQLite as I understand).

Thank you,

Roman