Hello everyone, I am working on an embedded equipment in Linux C++. It takes about 4.5 seconds to execute the SQL statement below. DELETE FROM points_11 WHERE uint64_ID NOT IN (SELECT uint64_pointIDX FROM values_11); There are 1440 records in table points_11, 46049 records in table values_11. The structure of these 2 talbes is as below. Is this normal? Which aspects I cloud check to speed up please? Thanks a lot. points_11 |_ uint64_ID INTEGER PRIMARY KEY |_ uint_UTC INTEGER |_ uint_OSC INTEGER |_ uint_storeUTC INTEGER |_ uint_storeOSC INTEGER |_ text_timesource TEXT |_ text_status INTEGER |_ uint_job INTEGER |_ blob_raw BLOB values_11 |_ uint64_ID INTEGER PRIMARY KEY |_ uint64_pointIDX INTEGER |_ text_name TEXT |_ text_value TEXT |_ text_unit TEXT |_ uint_UTC INTEGER |_ text_timesource TEXT |_ blob_signature BLOG |_ uint64_obisId INTEGER |_ uint_regValue INTEGER |_ uint_regFormat INTEGER
As phrased and assuming no indexes, and depending on the version of SQLite3 in use, then yes. Since you did not mention whether or not there are any indexes or the version of SQLite3 that would be an exercize called "20 Questions".
Create an index if there is not one on values_11 if not already one where the first column is uint64_pointIDX on that column and change the delete to:
delete from points_11 where not exists ( select * from values_11 where points_11.uint64_id == values_11.uint64_pointIDX );
so the number of operations is limited to be proportional to the number of rows in points_11 and a table scan of values_11 is not required.
Once the index is created, the optimizer in the upcoming 3.35.0 version will perform the NOT IN -> NOT EXISTS optimization and obtain more efficient code (by 10 bytecode instructions per row in points_11) in a proper circumstance.
Ooops. Maybe I have that backwards. Anyway, the creation of the missing index is the first step. Then EXPLAIN QUERY PLAN and/or .eqp on are your best friend.
Hello Keith, The SQLite version is 184.108.40.206, and there is no index created. I've tried to create a index on table values_11's uint64_pointIDX column, but the time exhausted didn't decrease. I'll check the usage of index later. Thanks. Will
The SQLite version is 220.127.116.11
That version of SQLite is from 2009-05-25. If you upgrade to a newer version (such as the soon-to-be-released 3.35.0) you will find that almost everything will run faster, often much faster. (See the micro-optimization article for more information.) The database files and C/C++ language interface is completely backwards compatible, so upgrading should be as simple as recompiling.
(9.1) By Keith Medcalf (kmedcalf) on 2021-03-11 17:12:23 edited from 9.0 in reply to 6 [link] [source]
Without an index there are two ways to solve the query:
(1) Scan the entire values_11 table and collect all the uint64_pointIDX column into a temporary table/b-tree, then scan the points_11 table and for each row look up into the ephemeral table and see if the reference does not exist to determine whether to delete the row.
(2) Scan the points_11 table and for each row, scan the entire points_11 table to see if the reference does not exist to determine whether to delete the appropriate row.
Clearly (2) requires one scan of the table values_11 for each row in points_11 so (1) is going to "cost less" (unless statistics indicate that values_11 has a very low number of rows relative to points_11) so this method will be chosen (which is, in fact, what SQLite3 chooses).
Once an index exists then another option can be considered:
(3) Scan the points_11 table and for each row perform an index lookup on the values_11 index to see the condition for row deletion is met to determine whether to delete the row.
If the index exists then (3) will be "faster" than (1) by the "cost" of scanning values_11 and building the ephemeral table since the index can be used directly rather than having to scan the table to build the ephemeral table. (This possibility will also be detected by the SQLite3 optimizer and chosen if an appropriate index exists). This is effectively "spreading the cost" of the scan and build of the ephemeral table amongst all the insert/delete/update operations performed on values_11.
If pre-defining the index did not significantly reduce the elapsed time that indicates that the actual cost of scanning the values_11 table and building the ephemeral lookup table is very low -- so there may not be much point in maintaining the index at all (that is, the cost of maintaining the index may outweigh the benefit derived therefrom) and that the bottleneck lies elsewhere (that is, it is inherent in the design and methodology being used and that perhaps foreign key enforcement needs to be performed so that points_11 rows are deleted at the time the values_11 rows are deleted thus spreading the "cost" over a large number of operations).
Alternately, since you have timestamp data, just keep track somewhere of when was the last time that you did this so that you do not need to "re-process" points_11 rows that you have already processed.
(3) By Richard Damon (RichardDamon) on 2021-03-11 04:14:45 in reply to 1 [link] [source]
Adding an index to value_11 on the column uint64_pointIDX should help. I think it needs to do a complete table scan of values_11 for each row in points_11 to see it the value is there.