WAL File Grows Past Auto Checkpoint Limit
(1) By Alan (abarr6) on 2022-06-04 12:41:56 [link] [source]
I just wanted to check some behaviour regarding the WAL file size / auto checkpointing. I think its user error at this point, but just wanted to check my mental model.
I have a "SELECT MAX()" prepared statement which the following get called on:
- sqlite3_step()
- sqlite3_column_int()
I then proceed to run more prepared statements to modify the table I just read from, i.e.:
- BEGIN
- INSERT
- COMMIT
Its only after this explicit transaction when all the prepared statements are actually being reset.
Is it expected in this case that the WAL file size would continue to grow past the default 1000 page autocheckpoint limit (given enough repetitions of the explicit transaction)?
Performing either of the following before the INSERT transaction will keep the WAL file around the desired 4 MB / 1000 pages:
- sqlite3_step() the SELECT a second time to get SQLITE_DONE
- sqlite3_reset() the SELECT
I think this is expected due to the following snippets, where my take is an unfinished read operation has the potential to cause such problems:
A checkpoint operation takes content from the WAL file and transfers it back into the original database file. A checkpoint can run concurrently with readers, however the checkpoint must stop when it reaches a page in the WAL that is past the end mark of any current reader. https://sqlite.org/wal.html#concurrency
A statement finishes when its last cursor closes, which is guaranteed to happen when the prepared statement is reset or finalized. Some statements might "finish" for the purpose of transaction control prior to being reset or finalized, but there is no guarantee of this. The only way to ensure that a statement has "finished" is to invoke sqlite3_reset() or sqlite3_finalize() on that statement. https://www.sqlite.org/lang_transaction.html
Regarding "best practise" for prepared statements, after this I presume its recommended to reset them ASAP? Previously, I've been letting the prepared statements be taken care of by some C++ RAII, but in the above case it certainly appears this is too late.
If my interpretation is correct, I wonder (at the risk of repeating some information on the same page) would it be worth explicitly calling out this scenario somewhere in https://sqlite.org/wal.html#avoiding_excessively_large_wal_files? Currently the "Checkpoint starvation" bullet point seems to be more focused on multiple connections rather than a (careless) open read. Although, maybe this is down to me doing something fundamentally wrong/stupid in the first place.
(2) By Alan (abarr6) on 2022-06-12 15:34:09 in reply to 1 [source]
Here is my test C++ program in case it is of interest to anyone.
Note to try and further simplify things, I've removed the explicit transaction which was mentioned in the above description.
The function most of interest regarding the above behaviour is populate_db()
.
/*
* Program to demonstrate WAL file growth when there is an unfinished/open read
* operation. Best not to question the particulars of database logic itself as
* it has been created as a toy/demo program.
*
* Check stdout for "WAL file size ... This is larger than desired" which
* indicates the WAL file has grown bigger than the expected auto checkpoint
* limit.
*
* Has been tested against the following SQLite versions, both of which see the
* WAL file growing:
*
* - 3.30.0
* - 3.38.5
*
* See PREVENTION comment below for additional information.
*/
#include <cassert>
#include <exception>
#include <filesystem>
#include <iostream>
#include <sstream>
#include <string>
#include <sqlite3.h>
#define ASSERT_SQLITE(func) \
{ \
int const rtn_ = (func); \
if (rtn_ != SQLITE_OK) \
{ \
std::cout << std::endl; \
throw std::runtime_error(describe_sqlite_err(rtn_, __LINE__)); \
} \
}
std::string describe_sqlite_err(int const code, int const line)
{
std::ostringstream ss;
ss << "SQLite returned code: " << code << " on line: " << line
<< ". This is: " << sqlite3_errstr(code);
return ss.str();
}
void create_table(sqlite3 *db)
{
std::cout << "Creating table\n";
char *err_msg = NULL;
char const *const create_sql = "DROP TABLE IF EXISTS Data;"
"CREATE TABLE Data("
"PrimaryKey INTEGER PRIMARY KEY, "
"Stuff TEXT"
");";
if (sqlite3_exec(db, create_sql, 0, 0, &err_msg))
{
fprintf(stderr, "%s \n", err_msg);
sqlite3_free(err_msg);
sqlite3_close(db);
abort();
}
}
struct prepared_statement
{
prepared_statement(char const *const stmt, sqlite3 *const db)
{
ASSERT_SQLITE(sqlite3_prepare_v2(db, stmt, -1, &handle_, NULL));
}
void finalise()
{
int const err = sqlite3_finalize(handle_);
if (err != SQLITE_OK)
{
std::cout << describe_sqlite_err(err, __LINE__);
}
handle_ = NULL;
}
void step(bool const expect_row = false)
{
int const rtn = sqlite3_step(handle_);
if (expect_row)
{
if (rtn == SQLITE_ROW)
{
return;
}
else
{
std::cout << "not SQLITE_ROW! "
<< describe_sqlite_err(rtn, __LINE__) << "\n";
ASSERT_SQLITE(rtn);
}
}
if (rtn != SQLITE_DONE)
{
std::cout << "not SQLITE_DONE! "
<< describe_sqlite_err(rtn, __LINE__) << "\n";
ASSERT_SQLITE(rtn);
}
}
~prepared_statement()
{
finalise();
}
sqlite3_stmt *handle()
{
assert(handle_);
return handle_;
}
prepared_statement(const prepared_statement &) = delete;
prepared_statement &operator=(const prepared_statement &) = delete;
private:
sqlite3_stmt *handle_;
};
void errorLogCallback(void *pArg, int const iErrCode, const char *const zMsg)
{
std::cerr << "Error Code:" << iErrCode << " Message: " << zMsg << "\n";
}
void set_wal_mode(sqlite3 *const db)
{
prepared_statement wal_mode("PRAGMA journal_mode=WAL;", db);
wal_mode.step(true);
}
sqlite3 *setup(const char *const db_name)
{
sqlite3 *db = NULL;
ASSERT_SQLITE(sqlite3_open(db_name, &db));
set_wal_mode(db);
create_table(db);
return db;
}
void insert_data(int const pk, prepared_statement &inserter)
{
static std::string const big_data(512 * 1024, '*');
ASSERT_SQLITE(sqlite3_reset(inserter.handle()));
ASSERT_SQLITE(sqlite3_clear_bindings(inserter.handle()));
ASSERT_SQLITE(sqlite3_bind_int(inserter.handle(), 1, pk));
ASSERT_SQLITE(sqlite3_bind_text(
inserter.handle(), 2, big_data.c_str(), -1, SQLITE_TRANSIENT));
inserter.step();
}
void populate_db(sqlite3 *const db)
{
prepared_statement inserter(
"INSERT OR IGNORE INTO Data (PrimaryKey, Stuff) VALUES (?,?)", db);
prepared_statement max_pk_getter("SELECT MAX(PrimaryKey) FROM Data", db);
insert_data(42, inserter);
max_pk_getter.step(true);
int const max_pk = sqlite3_column_int(max_pk_getter.handle(), 0);
/*
* PREVENTION:
* Enabling either of the following will ensure the WAL file stays in check:
* - stepping again and getting SQLITE_DONE
* - a reset
*/
#if 0
max_pk_getter.step();
#endif
#if 0
ASSERT_SQLITE(sqlite3_reset(max_pk_getter.handle()));
#endif
for (int loop_ctr = max_pk; loop_ctr < max_pk + 20; loop_ctr++)
{
insert_data(loop_ctr, inserter);
}
}
void shutdown(sqlite3 *const db)
{
ASSERT_SQLITE(sqlite3_close(db));
}
void check_wal_file_size(char const *const db_name)
{
std::filesystem::path const wal_name =
std::filesystem::path(db_name).concat("-wal");
std::uintmax_t const size = std::filesystem::file_size(wal_name);
std::uintmax_t const rough_limit = 4.5 * 1024 * 1024;
char const *const description =
(size > rough_limit) ? "This is larger than desired." : "";
std::cout << "WAL file size is: " << size << " B. " << description << '\n';
}
int main()
{
char const *const db_name = "test.db";
ASSERT_SQLITE(sqlite3_config(SQLITE_CONFIG_LOG, errorLogCallback, NULL));
sqlite3 *const db = setup(db_name);
populate_db(db);
check_wal_file_size(db_name);
shutdown(db);
return 0;
}