SQLite Forum

Speed optimization for big single tabel searches
Login

Speed optimization for big single tabel searches

(1) By andy (andytolle) on 2020-04-14 14:42:08 [source]

I have a very simple table 'logs' with just two fields: 
login: text
log: text

(no auto increment id, no indexes yet)

The database is used by one person, just to do searches. It's loaded with new information in big batches, so it's not a lot of small writes, so on the writing part speed isn't a big concern.
The database is quite big: 522 million records and about 24 gigs. 
Given the file size and the need to be portable, it's located on an external USB drive.

I selected sqlite for it's simplicity and because the database itself is so simple. Also, the docs say it should be able to handle databases of about a terra byte quite fine and speed wasn't my biggest concern... however... 

Currently running a query like 
"SELECT * FROM logs WHERE login like "%andy%";" 
takes about 4 minutes to complete.
I have played around with indexed, but most I seem to be able to shave of is a few seconds from the queries, so I guess I'm doing something wrong... 

The question is: what can I do to improve speed of such queries? And what are realistic speeds to expect from a database this size?

All help is appreciated! 

Thanks in advance

(2) By David Raymond (dvdraymond) on 2020-04-14 15:02:33 in reply to 1 [link] [source]

If your searches were looking for a login that started with "andy", then you could make the column collate nocase, and then use a normal index on it so it could be used with like. But if you are looking for something in the middle of the field, then normal indexes aren't going to cut it, and it will always need a full table scan.

You might be able to make use of the FTS extension(s) if you want to search on words in the field. But if you're possibly looking for any random substring, then I don't think that's something that can reasonably be indexed.

(3) By andy (andytolle) on 2020-04-14 17:56:15 in reply to 2 [link] [source]

David,

Thank you very much for the clarification. It now makes sense why the indexing I tried didn't make a difference. I'm very glad I understand now. Thank you very much!

Regards, Andy

(4) By ddevienne on 2020-04-15 08:16:25 in reply to 1 [link] [source]

Given

SELECT * FROM logs WHERE login like '%andy%'

something you may not know is LIKE is case-insensitive by default.

So when you index login, without David's suggestion to collate nocase
that index cannot be used. An alternative is to force LIKE to be case-sensitive
(there's a PRAGMA for that), or use GLOB (see below).

Also as David mentioned, indexes can be used if what you search for
is equal or a prefix of what is indexed. So use login = 'andy'
or login GLOB 'andy*' (GLOB and LIKE are similar, just using different
syntax, and GLOB is case-sensitive) can use the index, while
login LIKE 'andy%' or login GLOB '%andy%' cannot.

PS: Don't use double-quotes for string literal, use single-quotes instead.