efficient ways to select unique from a column in a multi-colume primary key

(1) By 6kEs4Majrd on 2022-03-01 01:28:19 [link] [source]

I have a primary key made of several columns in a table.

If I want to select unique entries in one of these columns efficiently. Do I need to set a separate index on this column? Or the default index on the primary key made of these columns will be sufficient?

Without dedicated indexes on each column, it seems that the SELECT from the 1st column in the primary key is faster than the SELECT from the 2nd column, which is, in turn, faster than the SELECT from the 3rd column, and so on. Is this observation consistent with the expected behavior of sqlite3? Thanks.

(2.1) By Keith Medcalf (kmedcalf) on 2022-03-01 02:07:04 edited from 2.0 in reply to 1 [source]

Do you mean that you have a schema like this:

create table x
(
  a integer not null,
  b integer not null,
  c integer not null,
  data,
  primary key(a,b,c)
);

? And comparing the execution of the following queries:

select distinct a
  from x
;
select distinct b
  from x
;
select distinct c
  from x
;

? (The first of which can use the index and the second two will do a separate sort operation after scanning the table/index)?

Does prefixing the query with EXPLAIN QUERY PLAN shed light?

(3) By 6kEs4Majrd on 2022-03-01 02:19:06 in reply to 2.0 [link] [source]

Your schema is almost the same as mine except I use TEXT instead of INTEGER. If I just have the primary key without individual column indexes, I got this. Does it mean that each time I perform the query, an index is created automatically? If I shutdown the sqlite3 session, then the auto index will have to be created again? But if the session is still on, next query will not create the auto index again?

In other words, is the auto index persistent within the same sqlite3 session? Or it will be automatically created whenever the query is run?

sqlite> EXPLAIN QUERY PLAN select distinct name1 from sqlar; QUERY PLAN --SCAN sqlar USING COVERING INDEX sqlite_autoindex_sqlar_1 sqlite> EXPLAIN QUERY PLAN select distinct name2 from sqlar; QUERY PLAN |--SCAN sqlar USING COVERING INDEX sqlite_autoindex_sqlar_1--USE TEMP B-TREE FOR DISTINCT sqlite> EXPLAIN QUERY PLAN select distinct name3 from sqlar; QUERY PLAN |--SCAN sqlar USING COVERING INDEX sqlite_autoindex_sqlar_1 `--USE TEMP B-TREE FOR DISTINCT

(4.2) By Keith Medcalf (kmedcalf) on 2022-03-01 04:01:55 edited from 4.1 in reply to 3 [link] [source]

Does it mean that each time I perform the query, an index is created automatically?

Maybe. If that is the most efficient (lowest cost) way to generate the result.

If I shutdown the sqlite3 session, then the auto index will have to be created again?

Automatic indexes are created and destroyed on a per-query basis. They are not maintained between statements at all. Ever.

However, the sqlite_autoindex_sqlar_1 is not an automatic index. It is the result of creating an index by that name on the table sqlar. If two indexes were created by the create table statement, then the second index would be called sqlite_autoindex_sqlar_2, the third sqlite_autoindex_sqlar_3 and so on and so forth.

In other words, this set of commands:

create table x
(
 a text not null,
 b text not null,
 primary key(a,b)
);

produces the exact same result as the following commands:

create table x
(
 a test not null,
 b text not null
);
create unique index sqlite_autoindex_x_1 on x (a,b);

Names of the form sqlite_autoindex_<tablename>_<number> are merely how the indexes specified in the create table <tablename> are named (since you didn't specify a name for the index).

Automatic indexes for query processing are a completely separate beast -- you never see those existing outside of temporary use while a particular statement is executing.

So the reason that it is taking longer to execute is the additional step USE TEMP B-TREE FOR DISTINCT which means that after the data is collected (by the scan step) that it must be sorted and made distinct before the results can be returned to you.

If there was an index on the column (ie, there was an index in which the first column of the index was the one you needed to sort by) then there would be no need for a separate sort step.

(5) By 6kEs4Majrd on 2022-03-01 05:43:40 in reply to 4.2 [link] [source]

create unique index sqlite_autoindex_x_1 on x (a,b);

sqlite> SELECT * from sqlite_master where type = 'index';
index|sqlite_autoindex_sqlar_1|sqlar|3|

I see sqlite_autoindex_sqlar_1 as above. Why is there no SQL code?

How to know that it is from the primary key?

(6) By Keith Medcalf (kmedcalf) on 2022-03-01 16:50:00 in reply to 5 [link] [source]

I see sqlite_autoindex_sqlar_1 as above. Why is there no SQL code?

Because there was no create index statement issued to create the index.

How to know that it is from the primary key?

Why do you care?

For your entertainment you could always ask the computer to give you details about the index by name:

pragma index_info(sqlite_autoindex_sqlar_1);