SQLite Forum

TSAN reports data races reading and writing WAL header
Login

TSAN reports data races reading and writing WAL header

(1.1) By Alex (alexfusco5) on 2020-05-18 15:41:22 edited from 1.0 [link] [source]

This post is intended to be a bug report against SQLite. Since bug reports are not accepted from the general public, I am posting the report on the forum as per instructions.

When using SQLite in multi-threaded mode on a database with a WAL in a manner consistent with the documentation on multi-threaded use of SQLite, running under TSAN reports a few data races.

I have written a small test program that demonstrates the issue, which I will put at the bottom of this post, along with compilation flags necessary to repro the issue.

Data races reading/writing the WAL header

This is the stack traces for one variant of the data race that are causing persistent TSAN issues when running tests using SQLite in multi-threaded mode, all variants are triggered by the algorithm described here.

WARNING: ThreadSanitizer: data race (pid=140594)
  Read of size 8 at 0x7fef22e94000 by thread T2:
    #0 memcpy <null> (cause_race+0x434067)
    #1 walIndexTryHdr /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:61447:3 (cause_race+0x5073c5)
    #2 walIndexReadHdr /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:61532:21 (cause_race+0x506402)
    #3 walTryBeginRead /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:61854:12 (cause_race+0x5059ba)
    #4 sqlite3WalBeginReadTransaction /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:62145:10 (cause_race+0x51e2dc)
    #5 pagerBeginReadTransaction /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:54762:8 (cause_race+0x51df1a)
    #6 sqlite3PagerSharedLock /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:56937:10 (cause_race+0x51d21b)
    #7 lockBtree /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:67349:8 (cause_race+0x51b18c)
    #8 sqlite3BtreeBeginTrans /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:67737:47 (cause_race+0x4c8dc8)
    #9 sqlite3VdbeExec /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:88588:10 (cause_race+0x5428b8)
    #10 sqlite3Step /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:83428:10 (cause_race+0x4d10a8)
    #11 sqlite3_step /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:83493:16 (cause_race+0x4d082f)
    #12 (anonymous namespace)::ExecuteQuery(sqlite3*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /usr/local/google/home/alexfusco/data_race/sqlite_tsan_test.cc:23:14 (cause_race+0x68f00c)
    #13 (anonymous namespace)::KeepRunningQuery(std::atomic<bool> const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /usr/local/google/home/alexfusco/data_race/sqlite_tsan_test.cc:44:11 (cause_race+0x68f261)
    < Stack snipped for space >

  Previous write of size 8 at 0x7fef22e94000 by thread T1:
    #0 memcpy <null> (cause_race+0x434067)
    #1 walIndexWriteHdr /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:59983:3 (cause_race+0x50541f)
    #2 sqlite3WalFrames /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:62919:7 (cause_race+0x50403a)
    #3 pagerWalFrames /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:54721:8 (cause_race+0x4ff51e)
    #4 sqlite3PagerCommitPhaseOne /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:57984:14 (cause_race+0x4cac5a)
    #5 sqlite3BtreeCommitPhaseOne /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:68292:10 (cause_race+0x4cb62d)
    #6 vdbeCommit /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:80208:14 (cause_race+0x52b7ff)
    #7 sqlite3VdbeHalt /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:80610:16 (cause_race+0x529a2c)
    #8 sqlite3VdbeExec /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:86187:8 (cause_race+0x537818)
    #9 sqlite3Step /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:83428:10 (cause_race+0x4d10a8)
    #10 sqlite3_step /usr/local/google/home/alexfusco/data_race/sqlite3/sqlite3.c:83493:16 (cause_race+0x4d082f)
    #11 (anonymous namespace)::ExecuteQuery(sqlite3*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /usr/local/google/home/alexfusco/data_race/sqlite_tsan_test.cc:23:14 (cause_race+0x68f00c)
    #12 (anonymous namespace)::KeepRunningQuery(std::atomic<bool> const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) /usr/local/google/home/alexfusco/data_race/sqlite_tsan_test.cc:44:11 (cause_race+0x68f261)
   < Stack snipped for space >

Other data races that seem like they have a easy fix

I wrote the test program to demonstrate the data race with the above stacks, as that's the data race causing issues when running our tests under TSAN. However, I consistently got TSAN warnings for a few other data races but only from the test program. These seemed to me like they could be easily fixed by using AtomicLoad/AtomicStore. The easy fixes I suggest are only fixing the races pointed out by running the test program under TSAN, so they may not address all causes of the data races. (Using the github mirror for links, since I can't seem to link to specific lines on sqlite.org's source viewer).

pInfo->aReadMark

Most references to pInfo->aReadMark already use AtomicLoad/AtomicStore. These 2 do not, but probably should for consistency.

sLoc.aHash

No references to this appear to use AtomicLoad or AtomicStore at this time. Empirically without defining SQLITE_ENABLE_EXPENSIVE_ASSERT, these are the places use of AtomicLoad and AtomicStore needed to be added in order to silence the warnings.

Test program

Since I was unable to attach files, I'll put the small program in-line below.

Test case

#include <stdlib.h>
#include <unistd.h>

#include <atomic>
#include <chrono>
#include <iostream>
#include <string>
#include <thread>

#include <sqlite3.h>

namespace {

int ExecuteQuery(sqlite3* handle, const std::string& query) {
  sqlite3_stmt* statement_handle;
  int result = sqlite3_prepare_v2(
      handle, query.c_str(), static_cast<int>(query.size()), &statement_handle,
      nullptr);
  if (result != SQLITE_OK) {
    return result;
  }
  do {
    result = sqlite3_step(statement_handle);
    if (result != SQLITE_ROW && result != SQLITE_DONE) {
      return result;
    }
  } while (result == SQLITE_ROW);
  return SQLITE_OK;
}


void KeepRunningQuery(const std::atomic<bool>* stop_ptr, const std::string& db_path, const std::string& query) {
  sqlite3* handle = nullptr;
  int ret = sqlite3_open(db_path.c_str(), &handle);
  if (ret != SQLITE_OK) {
    abort();
  }
  ret = ExecuteQuery(handle, "PRAGMA synchronous = OFF");
  if (ret != SQLITE_OK) {
    abort();
  }
  const std::atomic<bool>& stop = *stop_ptr;
  while (!stop) {
    ret = ExecuteQuery(handle,
                       query.c_str());
    if (ret != SQLITE_OK) {
      abort();
    }
  }
  sqlite3_close(handle);
}

}  // namespace

int main(int argc, char* argv[]) {
  int ret = sqlite3_config(SQLITE_CONFIG_MULTITHREAD);
  if (ret != SQLITE_OK) {
    return ret;
  }
  std::string tmp_dir = "/tmp/sqlite_test_XXXXXX";
  if (!mkdtemp(tmp_dir.data())) {
    return errno;
  }
  std::string path = tmp_dir + "/test.db";
  std::cout << "db_path " << path << std::endl;
  sqlite3* handle = nullptr;
  ret = sqlite3_open(path.c_str(), &handle);
  if (ret != SQLITE_OK) {
    std::cout << "Open failed with " << ret << std::endl;
    return ret;
  }
  ret = ExecuteQuery(handle, "PRAGMA journal_mode = WAL");
  if (ret != SQLITE_OK) {
    std::cout << "Enable WAL failed with " << ret << std::endl;
    return ret;
  }
  ret = ExecuteQuery(
      handle, "CREATE TABLE test_table (something TEXT, other TEXT)");
  if (ret != SQLITE_OK) {
    std::cout << "Create TABLE failed with " << ret << std::endl;
    return ret;
  }
  sqlite3_close(handle);
  std::atomic<bool> stop = false;
  std::thread t1(&KeepRunningQuery, &stop, path, "INSERT INTO test_table (something, other) VALUES (\"text1\", \"text2\")");
  std::thread t2(&KeepRunningQuery, &stop, path, "SELECT something, other FROM test_table");
  std::this_thread::sleep_for(std::chrono::seconds(5));
  stop = true;
  t1.join();
  t2.join();

  return 0;
}

Build instructions

  • Place the sqlite amalgamation files sqlite3.c and sqlite3.h in a folder sqlite3.
  • Place the test case above in a file called sqlite_tsan_test.cc in the directory containing sqlite3 From that directory, build using the following commands:

    clang -fsanitize=thread -g -I sqlite3 -c sqlite3/sqlite3.c
    clang -std=c++17 -stdlib=libstdc++ -fsanitize=thread -g -I sqlite3 -c sqlite_tsan_test.cc
    clang -lstdc++ -std=c++17 -fsanitize=thread sqlite3.o sqlite_tsan_test.o -o cause_race
    

  • You can then run ./cause_race to get the stack traces for the data races

(2) By Alex (alexfusco5) on 2020-05-19 15:19:56 in reply to 1.1 [source]

I noticed that drh has partially addressed this issue with this commit. Thank you for addressing these data races with AtomicLoad/AtomicStore.

In the commit linked above, I see that you tagged the remaining data race between walIndexWriteHdr and walIndexTryHdr as a TSAN false positive. Can those two functions be tagged with __attribute__((no_sanitize_thread)) to suppress these warnings as intended behavior?