long runtime for sqlite3_prepare_v2

(1) By anonymous on 2021-05-20 21:23:51 [link] [source]

Hi All,

We are using the C++ interface of sqlite to create and populate databases. What we have noticed is that it takes about 10-20 sec ( depending on machine - Linux server machines) to execute one sqlite3_prepare_v2() command per database.

Our questions are:

Is this expected or something that is can be done to improve it ( we are compiling our own sqlite3.
The command we are preparing is identical for many databases, is there any way to prepare once and reuse for different databases?

(2) By Larry Brasfield (larrybr) on 2021-05-20 21:51:59 in reply to 1 [link] [source]

Regarding "Is [10-20 second sqlite3_prepare_v2() execution time] expected[?]":

No expectations can reasonably exist absent some knowledge of the input. If the input is short, why not post it? If it is ridiculously long, maybe it should take awhile to compile.

Regarding "is any way to prepare once and reuse for different databases?":

Not in the present library code.

(3) By Ryan Smith (cuz) on 2021-05-20 22:21:49 in reply to 1 [link] [source]

Your question asks for informed deduction with no valid input parameters and some ill-conceived premises. Allow me this tongue-in-cheek paraphrasing to illustrate the problem:

We are using standard driver inputs to drive some cars. We have noticed that it takes about 10-20 minutes to drive (depending on the engine size of the car) down some roads.

Our questions are:

Is this expected or can something be done to improve it? (We are building the cars ourselves).
The roads we are choosing are identical for many cars, is there a way to prepare one car for a road and then have the other cars automatically drive it?

What can we possibly answer with even the smallest pretense of veracity? (except to say that option 2 is clearly impossible)

Please either post the schema and queries, or provide a lot more actual facts about the circumstances and physical queries. Any useful solution should be based in engineering rigor, not opinion, and that can only be arrived at with full knowledge of the variables/parameters of the problem.

(This is not twitter/reddit, we are actually interested in the boring details)

(4) By anonymous on 2021-05-21 08:53:39 in reply to 3 [link] [source]

Thanks both of you for replying for my questions

Fair enough. I was vague enough to deserve the comments.

Here are more details and the code which I believe pretty straightforward

We are trying to send maximum number of pairs, (one integer and one text) in one transaction. ( FVIdx, hashkey are the corresponding names of the item in the pair in the following code section) 

We use the SQLITE_LIMIT_VARIABLE_NUMBER (=32768 in our case) to create the INSERT command as following. We end up with 16383 rows.

In this case, the prepare call takes 10-20 secs depending on the machine ( we tried 5+ different machines BTW)

// 16383 rows
size_t keyRowsPerInsert = (size_t) (sqlite3_limit( pDB, SQLITE_LIMIT_VARIABLE_NUMBER, -1) / 2);
       
// INSERT command string , Note that KEYTABLE is the name of the table    
const char *tuple_sep = "";
stringstream keySqlStream;

keySqlStream << "INSERT INTO KEYTABLE (FVIdx,hashkey) VALUES ";
for (size_t idx = 0; idx < keyRowsPerInsert; ++idx) {
     keySqlStream << tuple_sep << "(@J" << idx << ", @KEY" << idx << ")";
     tuple_sep = ",";
}
keySqlStream << ";";
            
std::string keySqlStr = keySqlStream.str();

// prepare the statement
 sqlite3_prepare_v3(pDB,  keySqlStr.c_str(), -1, SQLITE_PREPARE_PERSISTENT, &feat_stmt, NULL);

(5) By Richard Hipp (drh) on 2021-05-21 11:19:37 in reply to 4 [link] [source]

SQLite is optimized for a small number of parameters in each SQL statement. The usual number in practice is one or two. SQLite is efficient with dozens.

But the performance is quadratic in the number of parameters, so when you put in 32768, it slows down. A lot. We could refactor the way in which parameters are processed to make it faster for a huge number of parameters like this, but that would slow down processing for the common case where the number of parameters is 1 or 2.

I suggest that you change your application to use just two parameters:

INSERT INTO keytable(FVIdx,hashkey) VALUES(@J,@KEY);

Then run the statement once for each J and KEY value, inside of a transaction.

(6) By Ryan Smith (cuz) on 2021-05-21 11:22:09 in reply to 4 [link] [source]

Thank you kindly, this phrasing of the question is absolutely clear and detailed.

The problem now is clear - You are preparing a single string with 16K rows, it will most certainly take ages to parse and prepare and is possibly the worst possible way to use SQLite.

I'm not fluent in C++ so the following is a bit pseudo-code-ish, but the idea should be clear:

The best way to do it would be to make the initial query so:

sql = "INSERT INTO KEYTABLE(FVIdx, hashkey) VALUES (?, ?);";

Then prepare that query:

sqlite3_prepare_v3(pDB,  sql, -1, SQLITE_PREPARE_PERSISTENT, &feat_stmt, NULL);

Then iterate your loop calling the sqlite3_bind_xxx functions every iteration and stepping the query to the next line, and finally, finalizing the query. Something like:

for (size_t idx = 0; idx < keyRowsPerInsert; ++idx) {
     sqlite3_bind_int(&feat_stmt, 1, idx);
     sqlite3_bind_int(&feat_stmt, 2, idx); // Or whatever other Key value you choose
     sqlite3_step(&feat_stmt);
}
sqlite3_finalize(&feat_stmt);

That should be faster than your current attempt by orders of magnitude.

More info on binding here

and here

sqlite3_finalize info here

Note1: I seem to recall somewhere in the docs a rather succinct demonstration of the entire prepare-bind-step-finalize loop process, but cannot seem to locate it now. The above should be close enough to get you rolling.

Note2: Your real code should involve some error checking testing return values from those calls to assess if they happened correctly or not. Those return codes are all documented.

(7) By Richard Damon (RichardDamon) on 2021-05-21 12:25:38 in reply to 4 [link] [source]

One thing to remember when your SQLite, is that a lot of the 'conventional wisdom' for using a database is just wrong. With standard client/server databases, it is important to minimize the number of statements, as each statement might involve a network transaction, so is slow. For SQLite, this doesn't hold, SQLite is just part of your program. Instead, it is better to make the statements simpler so it can parse them faster, and reuse the statement.

Maximizing the number in a TRANSACTION might make sense, but rather than make the transaction a single statement, make it an explicit transaction with a lot of statements. That way SQLite works with simpler smaller chunks and all goes well.

(8) By anonymous on 2021-05-21 15:24:54 in reply to 4 [link] [source]

Just did some tests myself to optimize a query I've got, and the fastest method I found was doing multiple inserts at once (90 rows with 2 columns, both bound) and reusing that statement.

Over 50 trials, for plain old inserts, it took about 40% less time (~0.83s down from ~1.38s) to insert ~115k rows. For insert on conflict update statements done with newer data applied on top of the old (in case it's relevant, only about 500 of the rows needed updating), the difference was even larger, with single row inserts taking ~0.87s and 90 row inserts taking ~0.35s.

It definitely varies a lot and it can take a little bit of time to find the best number of rows per execution, but it does produce consistently faster times for me.

(9) By anonymous on 2021-05-22 09:11:47 in reply to 8 [source]

Hi All,

Thanks for the recommendations and explanations. We tested with different numbers also and as expected there is a tradeoff between prep time and insert time. We settled in the same ranges you have found (10-100) range. Arnound 10-100 range the prep time was about 0.1 secs which is 200x improvement from our 20sec.

BTW, the pragma settings also helped to improve by %20 our performance.