SQLite Forum

Feature Request: support bulk value fetches
(1) By Tom Snyder (tsnyder) on 2020-04-28 15:21:34

I have a workflow wherein several dozen threads perform reads and writes continuously for multiple hours. Each reader might be fetching dozens of values from each row. Concurrent readers share a single read-only connection opened with SQLITE_OPEN_FULLMUTEX | SQLITE_OPEN_PRIVATECACHE | SQLITE_OPEN_READONLY, and writers share a single writeable connection opened with SQLITE_OPEN_FULLMUTEX | SQLITE_OPEN_PRIVATECACHE | SQLITE_OPEN_READWRITE.

It came to my attention that every call to sqlite3_column_* results in a lock on the db connection mutex. With this many field fetch operations vying for the mutex, we end up with a huge backlog behind busy ‘step’ calls, and performance can suffer. I tentatively introduced a new interface within sqlite3 which fetches all desired fields from a row under a single lock of the connection mutex. This allows each row fetch to quickly get what it needs and get out of the way.

Note that this code is based on v3.27.2 source.

/* Support fetching multiple column values under a single lock of the db mutex. */
typedef struct sqlite3_value_lookup sqlite3_value_lookup;
struct sqlite3_value_lookup
{
  int column_index;      // in:  index of the column to fetch: 0-N
  sqlite3_value *value;  // out: the value retrieved
};
SQLITE_API void sqlite3_bulk_values(sqlite3_stmt*, sqlite3_value_lookup *columnData, int columnCount);
 
 
/*
** Fetch multiple column values under a single lock of the db mutex.
** This is akin to using an array to hold multiple sqlite3_column_value results.
*/
SQLITE_API void sqlite3_bulk_values(sqlite3_stmt *pStmt, sqlite3_value_lookup *columnData, int columnCount)
{
  Vdbe *pVm;
  Mem *pOut;
  int itr;
  int i;
 
  pVm = (Vdbe *)pStmt;
  assert( pVm );
  assert( pVm->db );
  sqlite3_mutex_enter(pVm->db->mutex);
  for( itr = 0; itr < columnCount; ++itr ){
    i = columnData[itr].column_index;
    if( pVm->pResultSet!=0 && i<pVm->nResColumn && i>=0 ){
      pOut = &pVm->pResultSet[i];
    }else{
      sqlite3Error(pVm->db, SQLITE_RANGE);
      pOut = (Mem*)columnNullValue();
    }
    if( pOut->flags&MEM_Static ){
      pOut->flags &= ~MEM_Static;
      pOut->flags |= MEM_Ephem;
    }
    columnData[itr].value = (sqlite3_value *)pOut;
  }
  sqlite3_mutex_leave(pVm->db->mutex);
}
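
For context, a reader thread would use it roughly like this (the two-column lookup and pStmt are just placeholders):

sqlite3_value_lookup aLookup[2] = { {0, 0}, {1, 0} };
while( sqlite3_step(pStmt)==SQLITE_ROW ){
  sqlite3_bulk_values(pStmt, aLookup, 2);
  /* aLookup[0].value and aLookup[1].value now point at this row's values;
  ** they are only used until copied out, well before the next 'step'. */
}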

Whether or not this is suitable as a new feature, is there anything wrong with this approach? Is there anything I’m overlooking?

(2) By Larry Brasfield (LarryBrasfield) on 2020-04-28 15:53:07 in reply to 1

Your "bulk value fetche[r]" is handing out "unprotected sqlite3_value" objects. (See 3rd paragraph at link.) Per the 4th paragraph, these objects have limited usage in other API's, and those usages, presumably to be made by the multitude of threads you wish to not individually fetch column values anymore, are likely to induce the same sort of mutex locking/unlocking bottleneck you seek to avoid. Maybe if that bulk-fetch was bothering to make copies of values that can auto-magically change form under the covers, that congestion might be avoided, at least at the outset.

(3) By Keith Medcalf (kmedcalf) on 2020-04-28 16:12:46 in reply to 1

Why not just stop sharing connections across threads and use SQLITE_OPEN_NOMUTEX?
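
For illustration, a minimal sketch of that arrangement, assuming each reader thread opens and owns its own connection (the filename and error handling are placeholders):

#include <sqlite3.h>

/* One connection per reader thread, so no connection mutex is needed. */
static int open_reader_connection(sqlite3 **ppDb){
  return sqlite3_open_v2("data.db", ppDb,
                         SQLITE_OPEN_READONLY | SQLITE_OPEN_NOMUTEX |
                         SQLITE_OPEN_PRIVATECACHE, NULL);
}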

(4) By ddevienne on 2020-04-28 16:35:38 in reply to 1

Hi. You haven't given us actual performance numbers before/after your change, except via the vague "quickly get". Care to share those please? 10% faster vs. 10x faster would change the picture :)

Also, on top of what Larry and Keith mentioned, why not do your own locking, at a coarser level, if SQLite's own locking is too fine-grained for you? (And then, of course, use SQLite without its internal mutexing.)
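
For instance, a rough sketch of that coarser scheme, assuming the application serializes all use of the shared connection with its own lock (the pthread mutex, fetch_row, and the integer-only columns are placeholders):

#include <pthread.h>
#include <sqlite3.h>

static pthread_mutex_t g_dbLock = PTHREAD_MUTEX_INITIALIZER;

/* One application-level lock per row, instead of one SQLite mutex
** acquisition per 'step'/'fetch' call. */
static int fetch_row(sqlite3_stmt *pStmt, sqlite3_int64 *aOut, int nCol){
  int rc, i;
  pthread_mutex_lock(&g_dbLock);
  rc = sqlite3_step(pStmt);
  if( rc==SQLITE_ROW ){
    for(i=0; i<nCol; i++) aOut[i] = sqlite3_column_int64(pStmt, i);
  }
  pthread_mutex_unlock(&g_dbLock);
  return rc;
}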

(5) By Larry Brasfield (LarryBrasfield) on 2020-04-28 19:58:34 in reply to 1

Another few considerations:

  1. What is the lifetime of the objects being returned (effectively as out parameters)?

  2. Are all threads that contend with each other getting values now going after the same row of results?

  3. Would not a caching scheme accomplish what you are after, without need of implementing a new API?

(6) By Tom Snyder (tsnyder) on 2020-04-29 15:43:29 in reply to 5

Thanks to all who shared corrections, alternatives and/or interest to this point. I'd like to share some details of usage, as that might clarify things a bit and limit which options might be feasible for me (or others) going forward.

  • In my code, there is only one writeable connection in use at a time, and only one thread can be making use of that connection at a time. When a thread is interested in performing a write operation (potentially in concert with accompanying read operations), it must await its turn for the writeable connection.

  • In my code, all read-only operations employ the same read-only connection. When a transaction is committed on the writeable connection, that 'invalidates' the future use of the current read-only connection (as it is now stale), another read-only connection is made available, and any new read-only operation will employ the 'fresh' read-only connection.

  • We don't want to offer a separate connection for each thread, as we could potentially have a hundred parallel reader threads. Each connection requires its own chunk of memory for state and cache, and that doesn't scale well into the hundreds.

  • Given that 'writer' threads have exclusive rights to the sole writeable connection, it doesn't seem possible that field data would change other than through that thread's actions. That's not to suggest that we can be careless (Larry's comment about my scheme doling out unprotected objects is certainly worth heeding, so that is much appreciated).

  • Given that 'reader' threads are using a read-only connection, it doesn't seem possible that field data would change under their feet. But ditto the previous bullet about taking care regardless. However, I'd be curious to know what actions can invalidate the unprotected values returned from the read-only connection in my design.

Regarding the valid question from ddevienne about performance, the results are varied based on the particular workload, so it's hard to give concrete numbers for all situations. I believe the benefit has much to do with the comparative number of values being fetched from each row by parallel threads. Perhaps an extreme example will illustrate how certain threads might be unduly hampered. This specific example is a mental exercise, and not one that I have actually executed (yet).

Let's assume we have one thread that is iterating over 10,000 rows in some table, and fetching one value from each row. Its pattern of operation would be to alternate between a 'step' call ('s') and a 'fetch' call ('f'):

sfsf...

Let's assume we have another thread that is iterating over 10,000 rows in some other table, and fetching 19 values from each row. Its pattern of operation would be 'step' ('S') and 19 'fetch' calls (each fetch being 'F'):

SFFFFFFFFFFFFFFFFFFFSFFFFFFFFFFFFFFFFFFF...

Each of those 'step' and 'fetch' calls requires a lock on the same connection mutex. The calls of those parallel threads would likely be interleaved as:

SsFfFsFfFsFfFsFfFsFfFsFfFsFfFsFfFsFfFsFfSs...

So we only get 1 'step' in the 19-field table for every 10 'steps' of the 1-field table. If we could instead group the 19 field fetches under a single mutex lock, the 19-field fetcher wouldn't be at a disadvantage, and each of the threads could 'step' at roughly the same rate:

Ss(F19)fSs(F19)f...

So with the existing locking scheme in the library, the 1-field fetcher gets about 91% of the 'step' calls, and the 19-field fetcher gets about 9% of the 'step' calls (he's walking one-tenth as fast as the 1-field fetcher). With the proposed locking scheme, both fetchers would get about 50% of the 'step' calls. It eliminates the gross inequity in per-thread performance based simply on the number of values being fetched.

In my actual tests, some rows are fetching 30+ values, and those 'wider' fetches are far more likely to bump into contention from the other fetchers. Either reducing the number of values fetched (where possible) or changing the 'fetch' locking scheme are effective at improving the performance of those particular iterations.

Regarding the suggestion from ddevienne about using my own locking, that could work. But I think I'd first need to identify every sqlite call that would be doing locking if I allowed it (such as prepare, bind, step, reset, clear-bindings, finalize, checkpoint, etc.), and apply my own locking around each one of those calls. I think I'd prefer to rely on sqlite for that as much as possible, to avoid overlooking something.

Answers to the latest questions from Larry:

  • The lifetime of the objects being returned is very brief. Shortly after being returned from sqlite, the integer/double/text values are copied into my own objects.

  • Each of the dozens of reader threads could be reading from different tables, or reading different rows from the same table. There is a very slight possibility of overlap in what they're scanning at a given time, but it's not very likely.

  • I take advantage of sqlite's per-connection read cache, but can't really afford to maintain my own cache of data due to the huge amount of data in the database. I have potentially tens of thousands of rows in hundreds of tables.

I think I'll heed Larry's initial guidance and make copies of the values at the lowest level (perhaps using 'dup' on the sqlite3_value objects) such that they are safe to use. I'll then repost for more feedback.
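
Something along these lines is what I have in mind (a sketch against the same 3.27.2 internals; the new name is only for illustration):

/*
** Like sqlite3_bulk_values(), but each value is copied with sqlite3_value_dup()
** while the mutex is held, so the caller only ever sees protected objects.
** The caller frees each copy with sqlite3_value_free().
*/
SQLITE_API void sqlite3_bulk_values_dup(sqlite3_stmt *pStmt, sqlite3_value_lookup *columnData, int columnCount)
{
  Vdbe *pVm = (Vdbe *)pStmt;
  Mem *pOut;
  int itr;
  int i;

  sqlite3_mutex_enter(pVm->db->mutex);
  for( itr = 0; itr < columnCount; ++itr ){
    i = columnData[itr].column_index;
    if( pVm->pResultSet!=0 && i<pVm->nResColumn && i>=0 ){
      pOut = &pVm->pResultSet[i];
    }else{
      sqlite3Error(pVm->db, SQLITE_RANGE);
      pOut = (Mem*)columnNullValue();
    }
    /* The duplicate is a protected, caller-owned value (NULL on OOM). */
    columnData[itr].value = sqlite3_value_dup((sqlite3_value *)pOut);
  }
  sqlite3_mutex_leave(pVm->db->mutex);
}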

Again, even if this doesn't turn out to be a suitable extension to the library, I appreciate the feedback and suggestions.

(7.1) By Larry Brasfield (LarryBrasfield) on 2020-04-29 17:02:45 edited from 7.0 in reply to 6

Your assertion,

Given that 'reader' threads are using a read-only connection, it doesn't seem possible that field data would change under their feet.

, reinforces a concern that I only obliquely mentioned before. In fact, that "field data" can change. It changes as each row is fetched by step operations on any reading thread. It changes when text of differing forms is requested via the sqlite3_column_value APIs. Until it is copied out, that data belongs to the prepared statement, which handles its memory management and alteration. If you read through the doc for those APIs (at the link), midway down under "Warning:", you will see: "Note that when type conversions occur, pointers returned by prior calls to sqlite3_column_blob(), sqlite3_column_text(), and/or sqlite3_column_text16() may be invalidated." That is why I mentioned "[making] copies of values that can auto-magically change form under the covers" and later asked about lifetimes.
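
For a concrete illustration of that warning (column 0 is just a placeholder):

const unsigned char *z8 = sqlite3_column_text(pStmt, 0);   /* UTF-8 form of the value */
const void *z16 = sqlite3_column_text16(pStmt, 0);         /* forces an in-place UTF-16 conversion */
/* z8 may now be a dangling pointer, per the documented warning; the bytes
** must be copied out before a different form is requested. */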

(8) By Tom Snyder (tsnyder) on 2020-04-29 20:40:41 in reply to 7.1

Thank you. I think that all makes sense to me now. But I have some additional points regarding my use of statements and values.

  • In my code, a prepared statement is only used by a single thread. If my thread calls 'step' and starts fetching values, how will another 'step' be invoked which might alter a value? I'll stop referencing fetched values way before I invoke 'step' again.

  • In my code, each value is only fetched one time per step. The invoking thread will retrieve the int64/double/text value, make a deep copy of it, then never fetch it again. If a value is only ever fetched once following a 'step', is there still a risk that its contents will auto-magically change?

  • In my code, statement text is generated programmatically, and every column's type is compiled into my code. A given value for a given table will always be written and fetched as the same type (either int64, real or text). My code does not initiate any type conversions. There should be no way my code would ever write a value out as an integer and try to read it in as text, or write it out as double and try to read it in as integer.

Prior to my proposed interface, my code was invoking sqlite3_column_value to get a sqlite3_value* (yes, unprotected), invoking sqlite3_value_type to ensure it indeed held the expected data type, and then invoking sqlite3_value_int64/double/text on that to obtain the appropriate underlying data. I understand (per you and the documentation) that this is a no-no, invoking functions that require a protected object with what was obtained as an unprotected object. I'm just wondering how this has been working fine for me for a long time. Is my particular usage of the APIs not tripping any of the problem cases, even though it might be unorthodox, ill-advised and dangerous?
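
In code, that pattern looks roughly like this (column 0 and the integer case are placeholders), alongside the documented route that never exposes the value object:

/* What I have been doing (unprotected value): */
sqlite3_value *pVal = sqlite3_column_value(pStmt, 0);
if( sqlite3_value_type(pVal)==SQLITE_INTEGER ){
  sqlite3_int64 v = sqlite3_value_int64(pVal);
  /* ...deep-copy v into my own object... */
}

/* The documented equivalent, which keeps the value inside the statement: */
if( sqlite3_column_type(pStmt, 0)==SQLITE_INTEGER ){
  sqlite3_int64 v = sqlite3_column_int64(pStmt, 0);
  /* ...same deep copy... */
}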

I can appreciate that my proposal is not thread-safe in the general case, and would not be acceptable as-is for public consumption. I'm just wondering whether, with all the caveats I mentioned above within my personal codebase, I am still susceptible to the lurking issues you raised. I will certainly alter my course to something that doesn't violate the sqlite APIs, I just wanted to know how much I've been living in danger. Thanks.

(9) By Neal (_Neal_) on 2020-04-29 20:46:21 in reply to 6

and each of the threads could 'step' at roughly the same rate:

Here is an idea to consider. If the goal is to even out the progress of threads (rows processed) by increasing the length of 'F'-runs, simply bumping up the active thread's priority at 'S' and then reducing it again at the last 'F' might work.

Caveats: not portable, will not cut the mutex acquisition/release overhead, and cannot 100% guarantee that all 'F's for an 'S' will be done in one thread-run.

(10) By Larry Brasfield (LarryBrasfield) on 2020-04-29 21:45:50 in reply to 8

Responding to your points in order:

In my code, a prepared statement is only used by a single thread. If my thread calls 'step' and starts fetching values, how will another 'step' be invoked which might alter a value? I'll stop referencing fetched values way before I invoke 'step' again.

That was not apparent until just now. No thread without its own reference to the statement object can do stepping with it, nor get result column values. (This is apparent from the API together with the assumption that SQLite does not do much crazy-making.)

In my code, each value is only fetched one time per step. The invoking thread will retrieve the int64/double/text value, make a deep copy of it, then never fetch it again. If a value is only ever fetched once following a 'step', is there still a risk that its contents will auto-magically change?

Depending on what was stored versus what type of value was asked for, there may be some auto-magical change but only upon that first access. This, too, puts your machinations in a better light.

In my code, statement text is generated programmatically, and every column's type is compiled into my code. A given value for a given table will always be written and fetched as the same type (either int64, real or text). My code does not initiate any type conversions. There should be no way my code would ever write a value out as an integer and try to read it in as text, or write it out as double and try to read it in as integer.

Well, so far so good. And if that "text" is always stored with the same encoding as what you request, and no blobs become strings or vice-versa, I would even agree with your "does not initiate any type conversions" assertion.

... this is a no-no, ... wondering how this has been working fine for me

I believe that if you only rely on the guarantees that are part of the published API, then they will be met for a very long time (or soon made to be met if a bug is found whereby the code is shown to fail any guarantee). However, when you rely on "undefined" behavior (meaning behavior not defined by the interface documentation), then it might work. Then, with some change in the way memory management is done, for time or space efficiency reasons, it could stop working. Fine today does not imply fine as the library evolves, even when it continues to honor past interface promises. (Sorry if this is too obvious to merit mention.)

Personally, I have seen scads of bugs created by reliance upon undocumented features and behaviors of code with a well defined API. I consider it to be living in danger too much for my taste. But if you statically link the SQLite library and test each new release of your application that uses a later version of the library to become reasonably sure the not-promised but once-evident behaviors continue, then you could justify "reliance" on them.