SQLite Forum

Best order to vacuum, analyze, create indexes, etc.?
Hi. I'm converting large scientific proprietary files into SQLite databases.  
Many blobs in two large tables, with structured meta-data in these and other  
tables.

That's already working, but I wonder what the optimal order of operations is. Here's how I do it currently:

```
BEGIN TRANSACTION
foreach table
  CREATE TABLE ...       -- ordinary ROWID tables
foreach row              -- millions of rows, GB of data, interleaved across tables
  INSERT INTO ...
foreach index
  CREATE INDEX ...
COMMIT
VACUUM
ANALYZE
```

To maximize import performance I've already turned journaling OFF (the DB is not "online" during the conversion), left foreign keys unenforced (they are still expressed in the table definitions), and used a largish page size (16 KB) with a 32 MB page cache, etc.

I have a few small tables and two very large tables, and all inserts are *interleaved*; that's why I VACUUM (an expensive operation) post-commit, to *defragment* the DB and maximize read speed once the DBs are put "online" (data served over WebSocket using a custom protocol).

I have several UNIQUE constraints in my table definitions, and I wonder whether I should remove those and instead issue explicit CREATE UNIQUE INDEX DDLs post-insert (but pre-commit?).
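For illustration, with a made-up table, the two alternatives would look like this:

```sql
-- Current approach: constraint declared inline, so the implicit unique index
-- is maintained row by row while the bulk INSERTs run
CREATE TABLE blob_meta (
  id      INTEGER PRIMARY KEY,
  name    TEXT NOT NULL UNIQUE,
  payload BLOB
);

-- Alternative being considered: plain column during the load, uniqueness
-- enforced by an explicit index built in one pass after the inserts
DROP TABLE IF EXISTS blob_meta;      -- only so both variants can be tried in one session
CREATE TABLE blob_meta (
  id      INTEGER PRIMARY KEY,
  name    TEXT NOT NULL,
  payload BLOB
);
-- ... millions of INSERTs ...
CREATE UNIQUE INDEX blob_meta_name_uidx ON blob_meta(name);  -- post-insert, still pre-COMMIT
```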

I also wonder whether the *non-UNIQUE* CREATE INDEX statements should run pre-VACUUM or post-VACUUM. I suspect the indexes wouldn't be fragmented if created post-VACUUM, saving that much I/O during the VACUUM itself?
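In other words, the two orderings under consideration (index definitions elided, as above):

```sql
-- Option A (what I do now): indexes exist before VACUUM,
-- so VACUUM has to rewrite both the tables and the indexes
CREATE INDEX ...;
COMMIT;
VACUUM;
ANALYZE;

-- Option B: VACUUM presumably only has to rewrite the tables, and the
-- indexes are then built afterwards, hopefully coming out unfragmented
COMMIT;
VACUUM;
CREATE INDEX ...;
ANALYZE;
```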

Finally, I wonder if I can somehow *parallelize* the inserts into my two BIG tables, perhaps by inserting into two separate DBs and then doing a CTAS from one DB into the other once both are complete. Similar to the CREATE INDEX case, this would *implicitly defragment* the table pages, no? Is CTAS optimized to do direct page copies, or must it decode all records from the source, re-encode them into the destination, and rebuild the B-tree from scratch?

I'm basically asking for advice/guidance on the most performant way  
to do the above, which hopefully is clear enough. TIA, --DD