Does the column order in an insert statement impact the insert speed?
(1) By anonymous on 2021-07-18 07:05:02
In my situation, I have a normal rowid table with 24 columns comprised of 4x INTEGER and 20x TEXT columns (so no BLOBs) with no explicitly defined extra indexes. TEXT columns vary from 10s of characters to a few 100s of characters. To populate the table, I do inserts for approx 1 million rows (with all columns being populated by a single insert using a prepared statement and transactions for each block of 10000 rows). I am trying to tune this, in order to improve the insert speed which is ok for other tables but an order of magnitude slower for this one. Will the order of the columns in the insert statement have any impact on the insert speed? (Maybe based on how the SQLite code is written it will be faster to have the columns in the same order as they are defined in the table definition?) What other SQLite suggestions can people give to improve the insert performance so that I can focus on checking these. Many thanks in advance for your constructive thoughts and ideas.
(2) By Simon Slavin (slavin) on 2021-07-18 13:10:55 in reply to 1 [link]
Your transaction size seems reasonable. you might experiment with 2000 rows and 50000 rows just to see if either is an improvement. The fact that some values are TEXT and some are INTEGER doesn't matter. SQLite doesn't use fixed-sized data for either of them. But it was definitely good to mention it in your question. SQLite stores all the values for a row together, in the order in which the columns are defined. So it has to construct a sequence of values for each row. I believe that it's fastest to construct the sequence in column order, which should mean that it's fastest to supply the values in column order in the INSERT command. However, it's possible that SQLite uses a linked list, in which case it might be a little faster to supply the values in reverse column order. If you're getting obsessive about speed, you might try this out. But by and large SQLite execution time is almost always storage-bound. SQLite's processing is extremely optimised and very fast. But most of the time taken by INSERT is taken waiting for your storage subsystem to process reads and writes. Worry less about syntax and more about minimising the number of calls to storage. One possibility, if this million-row table is useless unless complete, is to set one or more PRAGMAs to ignore ACID and change security while the table is being constructed. Then you can set them back afterwards. These two PRAGMAs might change things (and they might not, depending on your setup !) <https://sqlite.org/pragma.html#pragma_synchronous> <https://sqlite.org/pragma.html#pragma_wal_autocheckpoint> For each PRAGMA, find out what the setting is before changing it, and change it back when you've done your million rows. Test that out, and find out if it speeds things up. If it doesn't, don't do it.
(3) By Simon Slavin (slavin) on 2021-07-18 13:45:07 in reply to 2 [link]
By a wild coincidence, someone blogged discussing a similar question today. They discuss a number of factors I hadn't thought of, including using prepared statements rather than raw SQL commands. Here it is: <https://avi.im/blag/2021/fast-sqlite-inserts/>
(4) By Keith Medcalf (kmedcalf) on 2021-07-18 20:38:41 in reply to 1 [link]
> Will the order of the columns in the insert statement have any impact on the insert speed? (Maybe based on how the SQLite code is written it will be faster to have the columns in the same order as they are defined in the table definition?) No. This can make no possible difference. Either the order of the fields is the default or it is specified by the INSERT statement -- in either case the order is known at prepare time. It is also known at prepare time what a "row" to go into the "table" looks like. So, no matter what, the prepare process takes the data that you provide in the order that you provide and stores it where it belongs. It makes no difference the ordering of the data items since the ordering is not dynamic -- it is determined for that statement forever when the statement is prepared and does not and cannot change for that statement after that instant. If you want a statement with a different ordering, you have to prepare that other statement which will always and forever deal only with the specific ordering it specifies and no others. Notwithstanding, however, the total length of the record does matter. If the single record will fit on one page, then only one page needs to be read/written when it is accessed. However, if it does not fit on a page, then part of it must reside on an "overflow page" and when the record is read/written/updated then *all* the pages including the overflow pages must be read and written. If 50% of the records occupy two pages (have a one page overflow) then "accessing" N records requires "accessing" 1.5N pages which will take *at least* half-as-long-again as accessing all N records where there was no overflow.