Why is this query so slow, but only in SQLite?

(1) By treeman22 on 2022-01-21 23:58:37 [source]

Hello! I'm trying to run the following query on a database that we recently transitioned from Access format to SQLite format. It's meant to be a validation rule in that the code running this will consider it a PASS only if no results are returned:

SELECT groups.grouper_name, groups.synonym_id, groups.code, groups.grouper_id, groups.country_id, groups.synonym_name, groups.code_name, groups.version 
FROM groups 
WHERE (
    (groups.grouper_name) IN (
        SELECT [grouper_name]
        FROM [groups] AS Tmp
        GROUP BY [grouper_name],[synonym_id],[code]
            HAVING Count(*) > 1
                AND [synonym_id] = [groups].[synonym_id]
                AND [code] = [groups].[code]
    )
);

This runs in about a second on the Access version of the database, but takes over an hour when running on the SQLite version. Why it that, and how can I optimize this?

The "groups" table has about 60,000 entries. Running "EXPLAIN QUERY PLAN" on the above gives the following:

id	parent	notused	detail
2	0	0	SCAN TABLE groups
5	0	0	CORRELATED LIST SUBQUERY 1
11	5	0	SCAN TABLE groups AS Tmp
13	5	0	USE TEMP B-TREE FOR GROUP BY

I know just enough about SQLite to know that B-Trees can be pretty slow compared to indices, so I tried creating one for the "groups" table before running it again but it is still taking forever. Any ideas would be much appreciated as I'm very new to SQLite, thank you!

(2) By Keith Medcalf (kmedcalf) on 2022-01-22 00:57:39 in reply to 1 [link] [source]

Unless I am misunderstanding your query (which is entirely possible), the following should work better (ie, contains less useless convolution).

SELECT grouper_name,
       synonym_id,
       code,
       grouper_id,
       country_id,
       synonym_name,
       code_name,
       version
  FROM groups as o
 WHERE (
        select sum(cnt)
          from (
                select 1 as cnt
                  from groups
                 where grouper_name == o.grouper_name
                   and synonym_id == o.synonym_id
                   and code == o.code
                 limit 2
                )
       ) > 1
;

You will need an index on groups (grouper_name, synonym_id, code).

(3) By Keith Medcalf (kmedcalf) on 2022-01-22 01:11:01 in reply to 2 [link] [source]

Note that this should get the same results. The index will need to contain the noted colums as the first three columns in the index (in any order).

The original query executes the correlated subquery for each row of the outer table and generates a plustitude of output, most of which is being uselessly generated and then checked using the IN operator -- this process would be highly inefficient.

Also of note is that 'Microsoft Access' is not a database system. It is a hooey-gui interface builder that speaks to a database engine. One can conclude by your reference to 'Microsoft Access' that you likely mean JET, which is one of the backend datastores that can be utilized by 'Microsoft Access'.

(4) By Simon Slavin (slavin) on 2022-01-22 12:14:00 in reply to 1 [link] [source]

You have not created any indexes to help speed up the query. And SQLite won't create permanent indexes itself because it doesn't know how much storage space you have to spare. Read this section:

https://sqlite.org/cli.html#index_recommendations_sqlite_expert_

Create the indexes it suggests, and run your query again.

By the way, if you haven't already used the Command Line Shell for SQLite (which is what that page is about) I recommend it. It's a vital tool for exploring your database, and learning SQLite without having to write your own program every time.

(5) By treeman22 on 2022-01-24 16:38:55 in reply to 3 [link] [source]

You're a life saver mate, thank you so much for the help and explanation!

(6) By treeman22 on 2022-01-24 16:39:17 in reply to 4 [link] [source]

Great tips, thank you!