SQLite Forum

JOINs with FTS5 virtual tables are very slow
Login
I had submitted the following question a few days ago to which Dan Kennedy very kindly replied. But the problem wasn't really resolved so I am resubmitting it. This time I am trying to be less clever and not submitting pseudo-code. Instead, these are actual queries that I ran about 5 mins ago. Here goes.

The following query took 170s and returned 10 rows. Let's call this Query 1

## Query 1

```sql
SELECT 
	collectionCode, 
	Count(collectionCode) AS c 
FROM 
	materialsCitations
	JOIN treatments ON materialsCitations.treatmentId = treatments.treatmentId
	JOIN vtreatments ON treatments.treatmentId = vtreatments.treatmentId
WHERE 
	collectionCode != ''
	AND materialsCitations.deleted = 0 
	AND treatments.deleted = 0
	AND vtreatments MATCH "carabus"
GROUP 
	BY collectionCode;

-- QUERY PLAN
|--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_collectionCode (deleted=? AND deleted=?)
|--SEARCH TABLE treatments USING COVERING INDEX ix_treatments_treatmentId (deleted=? AND treatmentId=?)
`--USE TEMP B-TREE FOR GROUP BY
```

Let's break it down into sub-queries. The following takes 668ms

## Query 2

```sql
SELECT 
	collectionCode, 
	Count(collectionCode) AS c 
FROM 
	materialsCitations
    JOIN treatments ON materialsCitations.treatmentId = treatments.treatmentId
WHERE 
	collectionCode != ''
	AND materialsCitations.deleted = 0 
    AND treatments.deleted = 0
GROUP 
	BY collectionCode;

-- QUERY PLAN
|--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_collectionCode (deleted=? AND deleted=?)
`--SEARCH TABLE treatments USING COVERING INDEX ix_treatments_treatmentId (deleted=? AND treatmentId=?)
```

The following takes 37ms

## Query 3

```sql
SELECT treatmentId FROM vtreatments WHERE vtreatments MATCH "carabus";

-- QUERY PLAN
 SCAN TABLE vtreatments VIRTUAL TABLE INDEX 131073:
```


Now, let's put them together. The following query takes 439ms and also returns 10 rows. As one can see, **Query 1** is more than 380 times slower than **Query 4** even though producing identical results.

## Query 4

```sql
SELECT 
	collectionCode, 
	Count(collectionCode) AS c 
FROM 
	materialsCitations
	JOIN (SELECT * FROM treatments WHERE treatmentId IN 
            (SELECT treatmentId FROM vtreatments WHERE vtreatments MATCH "carabus")) t
            ON materialsCitations.treatmentId = t.treatmentId
WHERE 
 	collectionCode != ''
	AND materialsCitations.deleted = 0 
	AND t.deleted = 0
GROUP 
	BY collectionCode;

--QUERY PLAN
|--SEARCH TABLE materialsCitations USING INDEX ix_materialsCitations_collectionCode (deleted=? AND deleted=?)
|--LIST SUBQUERY 1
|  `--SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE treatments USING INDEX sqlite_autoindex_treatments_1 (treatmentId=?)
`--LIST SUBQUERY 1
   --SCAN TABLE vtreatments VIRTUAL TABLE INDEX 0:m
```

Consistently, my experience is that when I JOIN a virtual table to a normal table, the query is not longer performant. Is that expected or can I do something about it to make such a JOIN efficient? If yes, then I will try to change my program so it creates queries of the type **Query 4**, although given the unpredictability of what params the users might submit, it is going to be very tricky.

A related observation is that querying a virtual table is usually very fast *unless* the query finds a lot of results. I don't have a cut-off number, but a query finding fewer than 100 rows is blindingly fast but a query finding 80K rows is very slow. And, I don't really mean "returning", I really mean "finding". That is, just returning the `Count(*)` of matching rows is very low when the count is very high. I would expect that such a count would be returned from some kind of term frequency index.