Performance Issue: How can I increase a performance?
(1) By Artur (aaukhatov) on 2020-10-07 06:26:30 [link]
Hi guys! ## Context Little introduce to a context of the issue. I have about **33 million rows** on a table. The table structure looks like that: `(id, date_time, list_id, item_id, modified_date_time, title, json)` index: `list_id, modified_date_time` ## Issue I did select query by `WHERE clause with list_id=? AND item_id=? AND modified_date_time=?` And I've gotten a result for **150.7ms** on average, stdev: 3.59 ms, stderr: 0.804, ci 95%: 0.352. For me it's too long. I want to select a row of the table for less than **70ms**. Is it possible? Does any body help me? What I should to do?
(2) By John Dennis (jdennis) on 2020-10-07 08:36:33 in reply to 1 [link]
How many of the 33 million rows are returned by the query which took 150.7ms? Have you tried adding item_id to the index, which would save having to go back to the base table to further restrict the results. For example, there might be 10 million rows matching list_id and modified_date_time, but only a tiny percent might also match item_id.
(3) By Artur (aaukhatov) on 2020-10-07 10:03:17 in reply to 2 [link]
In my case only one row
(4) By Artur (aaukhatov) on 2020-10-07 10:05:24 in reply to 2 [link]
Yes John, I've done experiments with several indexes such as index(list_id), index(modified_date_time), index(item_id) and index(list_id, modified, item_id). And the result of the experiment the same.
(5) By Keith Medcalf (kmedcalf) on 2020-10-07 10:27:09 in reply to 4 [link]
Are there "other things" in the query in question? Have you run ANALYZE?
(9) By Artur (aaukhatov) on 2020-10-07 11:19:16 in reply to 5 [link]
No, there are not. No, I have not run ANALYZE.
(8) By John Dennis (jdennis) on 2020-10-07 10:37:13 in reply to 4 [link]
If this still took an awfully long time with an index on three conditions in the query, then that sounds a bit odd. ANALYZE would be interesting? The RDBMS I am most familiar with used a statistical based query optimizer, but there were still odd queries which didn't make sense. What is the data distribution of the three columns in the condition? What happens if you try the index(list_id, modified, item_id) with the columns in different order?
(10) By Artur (aaukhatov) on 2020-10-07 11:21:15 in reply to 8 [link]
I can create the index by different order of these fields. Are you sure can have some impact?
(22) By Simon Slavin (slavin) on 2020-10-07 13:40:07 in reply to 4 [link]
Assuming you have tying errors, the index <code>(list_id, modified, item_id)</code> should make the <code>SELECT</code> far faster. The difference between having it and not having it should be *very* noticeable. If this is not happening for you, something is wrong with your database. Using the sqlite shell tool do this: Delete all indexes. Use <code>.timer on</code>. Run the query without any indexes. Create that index. Then run <code>ANALYZE</code>. It's just a SQLite command you execute like <code>CREATE INDEX</code>. Then submit your query again and check whether the times have changed. Post the timings for us to check.
(24) By Donald Griggs (dfgriggs) on 2020-10-07 18:02:18 in reply to 22 [link]
Regarding: " It's just a SQLite command you execute like CREATE INDEX. " I feel sure Simon was thinking of "EXPLAIN". For the sqlite analyzer visit the download page: <code> <https://www.sqlite.org/download.html></code> and download the "bundle of command-line tools" for your machine, or compile from source as needed. The analyzer is a separate program you'll run from your commandline ("terminal") as follows: <code> sqlite3_analyzer filespecOfYourDatabase >MyAnalyzeOutput.txt</code> Then post here the contents of MyAnalyzeOutput.txt, or link to it from somewhere like pastebin.com. Unrelated question: The field "attributes" is labeled JSON. Is this field's contents always small?
(25) By Adrian Ho (lexfiend) on 2020-10-08 06:21:38 in reply to 24 [link]
I'm pretty sure he literally meant `ANALYZE`. From https://sqlite.org/lang_analyze.html : > The ANALYZE command gathers statistics about tables and indices and stores the collected information in internal tables of the database where the query optimizer can access the information and use it to help make better query planning choices. `sqlite_analyzer` only tells you how bloated your database is, while `ANALYZE` actually works to improve performance.
(6) By Richard Hipp (drh) on 2020-10-07 10:29:16 in reply to 1 [link]
We want to see: 1. Your complete schema - the output of the ".fullschema" command from the CLI 2. The exact text of your query. 3. The result of running your query with "EXPLAIN QUERY PLAN" prepended. 4. The output of running "sqlite3_analyzer.exe" on your database
(11.2) By Artur (aaukhatov) on 2020-10-07 12:12:06 edited from 11.1 in reply to 6 [link]
1. CREATE TABLE item ( date_time TIMESTAMP NOT NULL, item_id INTEGER NOT NULL, list_id VARCHAR NOT NULL, unique_id VARCHAR(36), object_type INTEGER, author_id INTEGER, guid VARCHAR(36), editor_id INTEGER, created TIMESTAMP, folder VARCHAR(255), type VARCHAR(128), leaf_ref VARCHAR(64), uri VARCHAR(255), content_type VARCHAR(255), file_ref VARCHAR(255), title VARCHAR(128), modified TIMESTAMP, tag INTEGER, related VARCHAR(64), redirect VARCHAR(64), parent_guid VARCHAR(36), relocated INTEGER(1) DEFAULT 0, attributes JSON, keyword VARCHAR, deleted_date_time TIMESTAMP ) 2. I've showed that 3. ``` 0|Trace|0|0|0||00| 1|String8|0|1|0|158a499e-4432-4073-a73b-9396cae17f42|00| 2|Integer|10209335|2|0||00| 3|Int64|0|3|0|1500007910000|00| 4|Goto|0|45|0||00| 5|OpenRead|0|2|0|25|00| 6|OpenRead|1|13080994|0|keyinfo(1,BINARY)|00| 7|SeekGe|1|42|1|1|00| 8|IdxGE|1|42|1|1|01| 9|IdxRowid|1|4|0||00| 10|Seek|0|4|0||00| 11|Column|0|1|5||00| 12|Ne|2|41|5|collseq(BINARY)|6c| 13|Column|0|16|6||00| 14|Ne|3|41|6|collseq(BINARY)|6b| 15|Column|0|0|8||00| 16|Column|0|1|9||00| 17|Column|1|0|10||00| 18|Column|0|3|11||00| 19|Column|0|4|12||00| 20|Column|0|5|13||00| 21|Column|0|6|14||00| 22|Column|0|7|15||00| 23|Column|0|8|16||00| 24|Column|0|9|17||00| 25|Column|0|10|18||00| 26|Column|0|11|19||00| 27|Column|0|12|20||00| 28|Column|0|13|21||00| 29|Column|0|14|22||00| 30|Column|0|15|23||00| 31|Column|0|16|24||00| 32|Column|0|17|25||00| 33|Column|0|18|26||00| 34|Column|0|19|27||00| 35|Column|0|20|28||00| 36|Column|0|21|29|0|00| 37|Column|0|22|30||00| 38|Column|0|23|31||00| 39|Column|0|24|32||00| 40|ResultRow|8|25|0||00| 41|Next|1|8|0||00| 42|Close|0|0|0||00| 43|Close|1|0|0||00| 44|Halt|0|0|0||00| 45|Transaction|0|0|0||00| 46|VerifyCookie|0|3|0||00| 47|TableLock|0|2|0|item|00| 48|Goto|0|5|0||00| ``` 4. I've run ANALYZE it didn't give some output
(7) By Wout Mertens (wmertens) on 2020-10-07 10:30:33 in reply to 1 [link]
* Did you run `ANALYZE`? * What is the result of adding `EXPLAIN QUERY PLAN ` in front of your select query? * What is the disk type, SSD I hope? * your index of `(list_id, item_id)` still means scanning for modified_date_time, how many rows does it need to scan? Did you try adding the index `(list_id, item_id, modified_date_time)`?
(12) By Artur (aaukhatov) on 2020-10-07 11:33:33 in reply to 7 [link]
- Please, help me. How I should run ANALYZE? I just do a query as ANALYZE? - I put it above - SSD - I'have 33 million rows of the table - Yes, I did. I've created the index (list_id, item_id, modified_date_time). Is the order of these fields important?
(13) By Richard Hipp (drh) on 2020-10-07 11:43:16 in reply to 12 [link]
> Is the order of these fields important? I don't know. That depends on your schema and data, which you apparently cannot show us.
(14) By Artur (aaukhatov) on 2020-10-07 12:09:19 in reply to 13 [link]
I've done an experiment with ordering these fields in INDEX. It doesn't matter.
(15) By Artur (aaukhatov) on 2020-10-07 12:12:39 in reply to 13 [link]
I have only one table ```sql CREATE TABLE item ( date_time TIMESTAMP NOT NULL, item_id INTEGER NOT NULL, list_id VARCHAR NOT NULL, unique_id VARCHAR(36), object_type INTEGER, author_id INTEGER, guid VARCHAR(36), editor_id INTEGER, created TIMESTAMP, folder VARCHAR(255), type VARCHAR(128), leaf_ref VARCHAR(64), uri VARCHAR(255), content_type VARCHAR(255), file_ref VARCHAR(255), title VARCHAR(128), modified TIMESTAMP, tag INTEGER, related VARCHAR(64), redirect VARCHAR(64), parent_guid VARCHAR(36), relocated INTEGER(1) DEFAULT 0, attributes JSON, keyword VARCHAR, deleted_date_time TIMESTAMP ) ```
(17) By Keith Medcalf (kmedcalf) on 2020-10-07 12:33:31 in reply to 15 [link]
``` SQLite version 3.34.0 2020-10-07 11:31:14 Enter ".help" for usage hints. Connected to a transient in-memory database. Use ".open FILENAME" to reopen on a persistent database. sqlite> CREATE TABLE item ( ...> date_time TIMESTAMP NOT NULL, ...> item_id INTEGER NOT NULL, ...> list_id VARCHAR NOT NULL, ...> unique_id VARCHAR(36), ...> object_type INTEGER, ...> author_id INTEGER, ...> guid VARCHAR(36), ...> editor_id INTEGER, ...> created TIMESTAMP, ...> folder VARCHAR(255), ...> type VARCHAR(128), ...> leaf_ref VARCHAR(64), ...> uri VARCHAR(255), ...> content_type VARCHAR(255), ...> file_ref VARCHAR(255), ...> title VARCHAR(128), ...> modified TIMESTAMP, ...> tag INTEGER, ...> related VARCHAR(64), ...> redirect VARCHAR(64), ...> parent_guid VARCHAR(36), ...> relocated INTEGER(1) DEFAULT 0, ...> attributes JSON, ...> keyword VARCHAR, ...> deleted_date_time TIMESTAMP ...> ); sqlite> create index item_index on item (list_id, item_id, modified); sqlite> .eqp full sqlite> select * from item where list_id='158a499e-4432-4073-a73b-9396cae17f42' and item_id=10209335 and modified=1500007910000; QUERY PLAN `--SEARCH TABLE item USING INDEX item_index (list_id=? AND item_id=? AND modified=?) (~8 rows) addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ------------- -- ------------- 0 Init 0 45 0 0 Start at 45 1 OpenRead 0 2 0 25 2 root=2 iDb=0; item 2 ColumnsUsed 0 0 0 33554431 0 3 OpenRead 1 3 0 k(4,,,,) 2 root=3 iDb=0; item_index 4 ColumnsUsed 1 0 0 7 0 5 Explain 5 0 0 SEARCH TABLE item USING INDEX item_index (list_id=? AND item_id=? AND modified=?) (~8 rows) 0 6 Noop 0 0 0 0 Begin WHERE-loop0: item 7 CursorHint 1 0 0 AND(AND(EQ(c0,'158a499e-4432-4073-a73b-9396cae17f42'),EQ(c1,10209335)),EQ(c2,63534568)) 0 8 String8 0 1 0 158a499e-4432-4073-a73b-9396cae17f42 0 r[1]='158a499e-4432-4073-a73b-9396cae17f42' 9 Integer 10209335 2 0 0 r[2]=10209335 10 Int64 0 3 0 1500007910000 0 r[3]=1500007910000 11 SeekGE 1 43 1 3 0 key=r[1..3] 12 IdxGT 1 43 1 3 0 key=r[1..3] 13 DeferredSeek 1 0 0 0 Move 0 to 1.rowid if needed 14 Noop 0 0 0 0 Begin WHERE-core 15 Column 0 0 4 0 r[4]=item.date_time 16 Column 1 1 5 0 r[5]=item.item_id 17 Column 1 0 6 0 r[6]=item.list_id 18 Column 0 3 7 0 r[7]=item.unique_id 19 Column 0 4 8 0 r[8]=item.object_type 20 Column 0 5 9 0 r[9]=item.author_id 21 Column 0 6 10 0 r[10]=item.guid 22 Column 0 7 11 0 r[11]=item.editor_id 23 Column 0 8 12 0 r[12]=item.created 24 Column 0 9 13 0 r[13]=item.folder 25 Column 0 10 14 0 r[14]=item.type 26 Column 0 11 15 0 r[15]=item.leaf_ref 27 Column 0 12 16 0 r[16]=item.uri 28 Column 0 13 17 0 r[17]=item.content_type 29 Column 0 14 18 0 r[18]=item.file_ref 30 Column 0 15 19 0 r[19]=item.title 31 Column 1 2 20 0 r[20]=item.modified 32 Column 0 17 21 0 r[21]=item.tag 33 Column 0 18 22 0 r[22]=item.related 34 Column 0 19 23 0 r[23]=item.redirect 35 Column 0 20 24 0 r[24]=item.parent_guid 36 Column 0 21 25 0 0 r[25]=item.relocated 37 Column 0 22 26 0 r[26]=item.attributes 38 Column 0 23 27 0 r[27]=item.keyword 39 Column 0 24 28 0 r[28]=item.deleted_date_time 40 ResultRow 4 25 0 0 output=r[4..28] 41 Noop 0 0 0 0 End WHERE-core 42 Next 1 12 1 0 43 Noop 0 0 0 0 End WHERE-loop0: item 44 Halt 0 0 0 0 45 Transaction 0 0 2 0 1 usesStmtJournal=0 46 Goto 0 1 0 0 sqlite> ```
(19) By Keith Medcalf (kmedcalf) on 2020-10-07 12:37:38 in reply to 15 [link]
What does the following return? PRAGMA index_list(item);
(20.1) By Artur (aaukhatov) on 2020-10-07 12:40:02 edited from 20.0 in reply to 19 [link]
``` sqlite> PRAGMA index_list(item); 0|idx_item_composite|0 1|idx_item_date_time|0 ```
(21) By Artur (aaukhatov) on 2020-10-07 12:41:08 in reply to 20.1 [link]
``` CREATE INDEX idx_item_composite ON item (item_id,modified) ```
(16) By Keith Medcalf (kmedcalf) on 2020-10-07 12:25:16 in reply to 13 [link]
The range scan appears to only be using 1 value and testing 2 values for candidate exclusion. Are we sure that the other indexes actually exist?
(18) By Artur (aaukhatov) on 2020-10-07 12:34:50 in reply to 16 [link]
I've changed my approach. I split my data to their own table by **list_id** field. And the result I don't use list_id in the SELECT query. Of course I've created `index(item_id, modified)` ## Explain ``` sqlite> explain select * from item where item_id=? and modified=?; 0|Trace|0|0|0||00| 1|Variable|1|1|0||00| 2|Variable|2|2|0||00| 3|Goto|0|45|0||00| 4|OpenRead|0|2|0|25|00| 5|OpenRead|1|11858854|0|keyinfo(2,BINARY,BINARY)|00| 6|SCopy|1|3|0||00| 7|IsNull|3|42|0||00| 8|SCopy|2|4|0||00| 9|IsNull|4|42|0||00| 10|Affinity|3|2|0|dc|00| 11|SeekGe|1|42|3|2|00| 12|IdxGE|1|42|3|2|01| 13|IdxRowid|1|5|0||00| 14|Seek|0|5|0||00| 15|Column|0|0|6||00| 16|Column|1|0|7||00| 17|Column|0|2|8||00| 18|Column|0|3|9||00| 19|Column|0|4|10||00| 20|Column|0|5|11||00| 21|Column|0|6|12||00| 22|Column|0|7|13||00| 23|Column|0|8|14||00| 24|Column|0|9|15||00| 25|Column|0|10|16||00| 26|Column|0|11|17||00| 27|Column|0|12|18||00| 28|Column|0|13|19||00| 29|Column|0|14|20||00| 30|Column|0|15|21||00| 31|Column|1|1|22||00| 32|Column|0|17|23||00| 33|Column|0|18|24||00| 34|Column|0|19|25||00| 35|Column|0|20|26||00| 36|Column|0|21|27|0|00| 37|Column|0|22|28||00| 38|Column|0|23|29||00| 39|Column|0|24|30||00| 40|ResultRow|6|25|0||00| 41|Next|1|12|0||00| 42|Close|0|0|0||00| 43|Close|1|0|0||00| 44|Halt|0|0|0||00| 45|Transaction|0|0|0||00| 46|VerifyCookie|0|6|0||00| 47|TableLock|0|2|0|item|00| 48|Goto|0|4|0||00| ```
(23) By Gunter Hick (gunter_hick) on 2020-10-07 14:28:39 in reply to 18 [link]
"I split my data to their own table by list_id field" Does this mean that there are now a whole bunch of tables with identical structure and with the previous list_id as table names? In that case, you would be searching SQLite's list of tables while preparing the statement, thus hiding much of the cost. Table names are NOT data. FWIW keeping the UUID in 16 byte blobs instead of the 36 byte canonical representation would cut the time required to compare them in half, with input and output handled by user defined functions
(26) By Artur (aaukhatov) on 2020-10-08 14:43:51 in reply to 23 [link]
Yes, it means. Thank you Gunter about the UUID in 16 byte. I think it's useful for me.
(27) By Gunter Hick (gunter_hick) on 2020-10-08 16:54:53 in reply to 26
Please do not do that. It will most probably cause lots of problems down the line, much worse than waiting 150ms for a record access. Please review your data type declarations; - there is no DATETIME type in SQLite, use a definition appropriate to your platform (you seem to be using a numeric type) - use BLOB for fields holding UUIDs and store them as 16byte integers - put heavily referenced fields (key fields, WHERE clause fields, and commonly selected fields in that order) toward the front of the record - put variable sized text and blob fields towards the end of the record - if item_id is a "record number" and you want to use it as a foreign key and don't particularly care about its numerical value, consider declaring it INTEGER PRIMARY KEY (read up about the magic involved)
(29) By Artur (aaukhatov) on 2020-10-09 07:02:15 in reply to 27 [link]
I've changed the table schema ## Table Definition ``` CREATE TABLE item ( date_time TIMESTAMP NOT NULL, item_id INTEGER NOT NULL, list_id BLOB(16) NOT NULL, modified TIMESTAMP, unique_id BLOB(16), guid BLOB(16), parent_guid BLOB(16), object_type INTEGER, author_id INTEGER, editor_id INTEGER, created TIMESTAMP, folder VARCHAR(128), type VARCHAR(128), file_leaf_ref VARCHAR(64), uri VARCHAR(255), content_type VARCHAR(128), file_ref VARCHAR(128), title VARCHAR(128), etag INTEGER, related_document VARCHAR(64), server_redirect VARCHAR(64), relocated INTEGER(1) DEFAULT 0, tax_keyword VARCHAR, deleted_date_time TIMESTAMP, unrecognized_attributes JSON ) ``` ## EXPLAIN ``` sqlite> SELECT * ...> FROM item ...> WHERE list_id='3ef58f02-5f41-43b6-890d-091afb4738eb' AND item_id=70 AND modified=1592085845000; QUERY PLAN `--SEARCH TABLE item USING INDEX idx_item_item_id_and_modified (item_id=? AND modified=?) addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ------------- -- ------------- 0 Init 0 38 0 00 Start at 38 1 OpenRead 0 2 0 25 00 root=2 iDb=0; item 2 OpenRead 1 5 0 k(3,,,) 02 root=5 iDb=0; idx_item_item_id_and_modified 3 Integer 70 1 0 00 r[1]=70 4 Int64 0 2 0 1592085845000 00 r[2]=1592085845000 5 SeekGE 1 37 1 2 00 key=r[1..2] 6 IdxGT 1 37 1 2 00 key=r[1..2] 7 DeferredSeek 1 0 0 00 Move 0 to 1.rowid if needed 8 Column 0 2 3 00 r[3]=item.list_id 9 Ne 4 36 3 (BINARY) 51 if r[3]!=r[4] goto 36 10 Column 0 0 5 00 r[5]=item.date_time 11 Column 1 0 6 00 r[6]=item.item_id 12 Column 0 2 7 00 r[7]=item.list_id 13 Column 1 1 8 00 r[8]=item.modified 14 Column 0 4 9 00 r[9]=item.unique_id 15 Column 0 5 10 00 r[10]=item.guid 16 Column 0 6 11 00 r[11]=item.parent_guid 17 Column 0 7 12 00 r[12]=item.object_type 18 Column 0 8 13 00 r[13]=item.author_id 19 Column 0 9 14 00 r[14]=item.editor_id 20 Column 0 10 15 00 r[15]=item.created 21 Column 0 11 16 00 r[16]=item.folder 22 Column 0 12 17 00 r[17]=item.type 23 Column 0 13 18 00 r[18]=item.file_leaf_ref 24 Column 0 14 19 00 r[19]=item.uri 25 Column 0 15 20 00 r[20]=item.content_type 26 Column 0 16 21 00 r[21]=item.file_ref 27 Column 0 17 22 00 r[22]=item.title 28 Column 0 18 23 00 r[23]=item.etag 29 Column 0 19 24 00 r[24]=item.related_document 30 Column 0 20 25 00 r[25]=item.server_redirect 31 Column 0 21 26 0 00 r[26]=item.relocated 32 Column 0 22 27 00 r[27]=item.tax_keyword 33 Column 0 23 28 00 r[28]=item.deleted_date_time 34 Column 0 24 29 00 r[29]=item.unrecognized_attributes 35 ResultRow 5 25 0 00 output=r[5..29] 36 Next 1 6 1 00 37 Halt 0 0 0 00 38 Transaction 0 0 4 0 01 usesStmtJournal=0 39 String8 0 4 0 3ef58f02-5f41-43b6-890d-091afb4738eb 00 r[4]='3ef58f02-5f41-43b6-890d-091afb4738eb' 40 Goto 0 1 0 00 ```
(30) By Gunter Hick (gunter_hick) on 2020-10-09 07:39:49 in reply to 29 [link]
All you have changed is the declared type of the list_id column, which only defines the preferred type for storing values. You have not changed the format of the data you store there, which is quite obvious from the "list_id='3ef58f02-5f41-43b6-890d-091afb4738eb' part of the where clause. You also still do not have a usable index that includes list_id, which is obvious from the "USING INDEX idx_item_item_id_and_modified (item_id=? AND modified=?)" part of the query plan. See https://sqlite.org/datatype3.html for a discussion of data types in SQLite. This means you are still looking through all the records with item_id = 70 and modified at a certain time to find out if your target list is included.
(31) By Kevin Youren (KevinYouren) on 2020-10-10 01:27:26 in reply to 29 [link]
Artur, I suggest you need more information about the number of list_id's, item_id's and modified's. Sometimes people use the term CARDINALITY for these counts. So, start with list_id, which is one form of a UUID. The UUID consists of the digits 0 thru 9 and the letters a thru f or A thru F. The hyphen, '-', is merely for human readability. select count(*) from (select distinct list_id from item) ; Repeat for "modified" and "item_ids".
(28) By Keith Medcalf (kmedcalf) on 2020-10-08 19:09:26 in reply to 18 [link]
Yes, as you can see from the VDBE code, that is using 2 columns for the index scan and doing no subsequent "culling" of the candidates obtained from the index scan. This is different from your original example where the VDBE code shows that only a ONE column index was being used for the index scan and that TWO constraints were being used to cull the candidates from the index scan. This would imply that although you thought you had a usable index on all three columns, that you did not and that in fact the only usable index was a single column index. Since we cannot see your schema which likely does not contain an index that is more useful than the one chosen, we are not able to help.