16KB was faster for me, but your row data is *tiny*; mine was bigger (strings and blobs, up to 1MB per row). If you measure the time the producing side takes versus the consuming (SQLite) side, you'll see the majority is on the SQLite side (> 99% I suspect). So creating many producers (one per CPU) to parallelize the 1% or less of the total is not going to help much. A single producer is enough, and having many producers increases contention too.

Without a fixed-size queue, and with a producer much faster than the consumer, you are actually accumulating too much memory in the queue, increasing allocation, etc. A fixed-size queue blocks the producer when the queue is full, and wakes it up only after the consumer has processed one or more items. Measure peak RAM in addition to wall time: with a fixed-size queue, peak RAM should be tiny, while without one I expect it to be much larger, close to the full data size in your case, the producing side being so much faster.

In a real-world scenario there's not such an imbalance between what SQLite must do and what the rest of the code must do, so the SPSC (single-producer, single-consumer) approach on two threads works fine, up to a maximum of 2x faster in the perfect case. But the total time can't be lower than max(producer, consumer), so if your consumer already accounts for 99%, at most you save 1%. At over 3M rows per second, you're already very fast. Can't get any faster IMHO.

Stephan's suggestion to use an in-memory DB is also a good one. It will give you the maximum throughput, without the vagaries of the filesystem.
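
For reference, here's a minimal sketch of what I mean by a fixed-size (bounded) SPSC queue, using a plain std::mutex + std::condition_variable pair; the `BoundedQueue` and `Row` names are only illustrative, not taken from your code:

```cpp
// Sketch of a bounded SPSC queue: the producer blocks when the queue is
// full, and the consumer (the SQLite insert thread) drains it.
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <vector>

struct Row {                       // illustrative payload type
    std::string text;
    std::vector<unsigned char> blob;
};

class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t cap) : cap_(cap) {}

    // Producer side: blocks while the queue already holds cap_ items,
    // which caps peak RAM at roughly cap_ rows' worth of data.
    void push(Row r) {
        std::unique_lock<std::mutex> lk(m_);
        not_full_.wait(lk, [&] { return q_.size() < cap_; });
        q_.push_back(std::move(r));
        not_empty_.notify_one();
    }

    // Consumer side: returns std::nullopt once the producer called close()
    // and the queue has drained.
    std::optional<Row> pop() {
        std::unique_lock<std::mutex> lk(m_);
        not_empty_.wait(lk, [&] { return !q_.empty() || done_; });
        if (q_.empty()) return std::nullopt;
        Row r = std::move(q_.front());
        q_.pop_front();
        not_full_.notify_one();
        return r;
    }

    // Called by the producer after its last push().
    void close() {
        std::lock_guard<std::mutex> lk(m_);
        done_ = true;
        not_empty_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable not_full_, not_empty_;
    std::deque<Row> q_;
    std::size_t cap_;
    bool done_ = false;
};
```

The consumer thread would sit in a `while (auto row = queue.pop()) { /* bind + step the INSERT */ }` loop, so the producer can never run more than `cap_` rows ahead of SQLite.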