SQLite User Forum

WASM: query with rowMode object is slow
Login

WASM: query with rowMode object is slow

(1) By franztesca on 2024-04-04 20:45:59 [source]

While working with SQLite/WASM we realized that the performance of querying the database with

rowMode: 'object',
returnValue: 'resultRows',
is much worse compared to using
rowMode: 'array',
returnValue: 'resultRows',
I modified this SQLite/WASM demo to have a rough estimation of the difference. On my mac M2, the same query takes around 150ms on average with array mode, and 300ms with object mode(!). By profiling the two cases, it seems that object mode takes much longer because of the string decoding of the column names, with a lot of target.cstrToJs and other functions that seem to be invoked for each row.

Obviously the object rowMode has the additional cost of extracting the names of the columns, but I would expect this to be computed more efficiently, maybe just once?

Here is the modified demo, which also contains the Chrome profiler trace. (here is where to modify to change rowMode).

We currently employ an ugly hack where we pass the names of columns from the query caller in order to avoid the cost and still build column: value objects. We would like to avoid this if possible.

Thank you!

(2) By Stephan Beal (stephan) on 2024-04-04 21:43:32 in reply to 1 [link] [source]

Obviously the object rowMode has the additional cost of extracting the names of the columns, but I would expect this to be computed more efficiently, maybe just once?

My assumption is that you're talking about calculating/converting (from native to JS) the column names only a single time, rather than performing the conversion on each row.

As a workaround, you can use a combination of the rowMode:'array' and columnNames:[] options to exec(), which will cause it to return the result rows as arrays but also include the column names (computed a single time), in their proper order, in the given columnNames array. That can then be used to construct the objects out of that information when the query returns. Despite the extra level of conversion from array to object, that might (might!) be faster than converting the column names from native to JS for each row.

The current code structure, because of how generic rowMode is, cannot accommodate that with a quick fix. The callback which converts the stmt object's data into the desired rowMode type does not have enough information to do that and doesn't know whether it will be called 1 time or 1000000 times.

i will look into how that callback might be rewritten from its current one-line implementation into something which can cache and reuse the column names, but the only option which currently looks API-feasible would be an internal transformation from array to object very much like the one described above, so it might (might) not be much faster. Only experimentation will answer that.

i have your demo running locally and will report back, probably sometime in the next couple of days.

(3) By Stephan Beal (stephan) on 2024-04-04 21:54:05 in reply to 2 [link] [source]

As a workaround, you can use a combination of the rowMode:'array' and columnNames:[] options to exec(),

That workaround looks like (using your demo as the basis):

    // replace with 'object' to see difference in performance
    const rowMode = 'array';
    const columnNames = [];
    const result = db.exec({
      sql: 'SELECT * FROM t;',
      rowMode, columnNames,
      returnValue: 'resultRows'
    });
    if( true && 'array'===rowMode ){ // enable/disable workaround
      for( const i in result ){
        const obj = Object.create(null);
        for( const c in columnNames ) obj[columnNames[c]] = result[i][c];
        result[i] = obj;
      }
    }

preliminary tests show it to be negligibly slower than the initial rowMode='array' case and roughly twice as fast as the rowMode='object' case.

Timings on my desktop machine:

  • rowMode=array: 460ms
  • rowMode=array, converted to objects as shown above: 475ms
  • rowMode=object: 950ms

My recommendation is, for the time being, to use that approach to fetch the results as objects.

For other folks following along: unless you're using large data sets, the performance difference falls into the level of background noise, so the above is not a blanket recommendation for all clients.

(4) By franztesca on 2024-04-04 22:01:46 in reply to 2 [link] [source]

My assumption is that you're talking about calculating/converting (from native to JS) the column names only a single time, rather than performing the conversion on each row.

Yes.

As a workaround, you can use a combination of the rowMode:'array' and columnNames:[] options to exec()

This is actually a very good solution. It seems that id adds just a few milliseconds, including the construction of the objects from the column names and rows (we are around 155-60ms, compared to 300ms of rowMode: object, and 150 of rowMode: array). We'll use this for now.

Thanks a lot again!

(5) By Stephan Beal (stephan) on 2024-04-04 22:59:20 in reply to 2 [link] [source]

i have your demo running locally and will report back, probably sometime in the next couple of days.

This wasn't quite as invasive as it initially appeared. It is now in the trunk but, because it's not a bug fix (per se), won't be back-ported to the 3.45 branch.

The pre-release snapshot has been updated. In order to get it to run with your demo i had to copy over all of the JS/WASM files, including sqlite3.wasm, because the Emscripten-provided symbols differ between your 3.44 build and what Emscripten built today.

Thank you for the feedback and the easy-to-test reproduction case.