SQLite Forum

slow import of CSV files
Firstly, try 

```
SELECT COUNT(*) FROM tablename;
```
i.e.: Make sure to include the asterisk - it allows a special index optimization and should run on the order of milliseconds for a few million rows.
[This /may/ work sans asterisk, but I'm not sure, hence suggesting it]
Side-note: wc -l isn't quite the same. The lines-to-records relation isn't always 1-to-1, since the CSV format allows a single field to span multiple lines.

10 hours seems extremely excessive, even for 100 GB of CSV imports - perhaps if you are importing with a primary key (and/or other indexes) predefined on the table, with out-of-order data, to spinning rust and with meager cache settings. Even then though, 10 hours makes me feel nauseous.

Is the CSV data perhaps very wide? Does it contain ludicrously large strings and/or blobs?

If I had a CSV file that was so fat it legitimately needed 10 hours of import time, here are the first things I would do (sketched below the list):

 - Declare the table without ANY indexes, no primary key either,
 - If the table is pre-existing, then perhaps drop the indexes.
 - Avoid touching the rowid by not declaring an "... INTEGER PRIMARY KEY" field,
 - Import to fastest drive hardware available,
 - Set FSync off (PRAGMA synchronous = OFF),
 - Set the allowed cache size to several gigs (PRAGMA cache_size), and
 - Instate/Re-instate FSync and any Indexes after the import.
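
As a rough sketch of the above, assuming the sqlite3 command-line shell and a hypothetical data.csv; the table, column and index names are made up and the cache figure should be tuned to your hardware:

```
-- Rough sketch, not a drop-in script: table, column and file names are made up.
CREATE TABLE tablename(col1 TEXT, col2 TEXT, col3 TEXT);  -- no indexes, no PRIMARY KEY, rowid untouched

PRAGMA synchronous = OFF;       -- no fsync while importing
PRAGMA cache_size = -4000000;   -- negative value = KiB, so roughly 4 GB of page cache

.mode csv
.import data.csv tablename

-- Re-instate durability and any indexes only once the data is in.
PRAGMA synchronous = FULL;
CREATE INDEX idx_tablename_col1 ON tablename(col1);
```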

This should drink in the CSV file at the best possible speed; I would be surprised if it takes even an hour for a mere 30GB.

Notes:

 1. I realize not all these options are available to you if this should run on a Client's hardware. Do what you can.
 2. If there is a formatting step, like making sure some values are Integer or such, that is best done with the import step - any slowdown it adds is typically smaller than the cost of importing to a temp table and then format-importing to the real table.
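
For the Integer case in note 2, one way to keep the formatting inside the import step (a sketch of my own, with made-up names, leaning on SQLite's column type affinity) is simply to declare the target columns with the wanted affinity before importing:

```
-- Made-up table and file names. Declaring INTEGER/REAL affinity means the
-- text values .import inserts are coerced to numbers on the way in,
-- with no temp-table round trip.
CREATE TABLE measurements(id INTEGER, reading REAL, label TEXT);

.mode csv
.import readings.csv measurements

-- Spot-check that the coercion happened during the import itself:
SELECT typeof(id), typeof(reading), typeof(label) FROM measurements LIMIT 1;
```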

More info on all the items I mentioned above is available in the docs.
If you need assistance with, or more information on, any specific item above, please ask.
If you have implemented all of it and still get runs of 10 hours for a 30GB import, something must be wrong. Perhaps then share the CSV file so we can try and see (if the data isn't sacred, and if it is, do an obfuscation step first).