Deferred index update?
I’m using SQLite to construct an updatable dataset (a rather large one, with 10M+ rows). I’m repeatedly issuing
INSERT … ON CONFLICT DO UPDATE statements, adding a few thousand tuples at a time.
This is OK when I just need to update the dataset with new data, but slow when constructing the dataset from scratch. Understandably so: the index has to be continuously updated, as it’s possible that each new row can conflict with already existing ones. I think this yields a runtime of O(n^2 log n), where n is the number of inserted rows.
However, in the create-from-scratch scenario, I can guarantee that there will be no conflicts.
Now, I could change my code to generate plain
INSERT statements without an index in the creation mode (and only create the index at the end of the process), but I’m wondering whether I could have my cake and eat it too (by not introducing modality). I envision a directive that would tell SQLite to “stop updating index until the time comes to commit the transaction, at which point do it in one go.”
Does SQLite have anything like this? Is it possible to get this behaviour while using SQLite as is?
why are you not doing
ON CONFLIT DO IGNORE ?
I can't (this would produce wrong data when not in create-from-scratch mode). But even if I could, it would still result in long runtime because the index would still need to be updated continuously.
I've found an old StackOverflow thread about a very similar usecase. I'm inclined to try the approach suggested there (dropping and recreating an index). I'll need some generic code to remove the potential duplicates. But maybe there's an easier way?