SQLite Forum

Primary Key or Unique Index
Login

Primary Key or Unique Index

(1) By neurosis (neurosis69) on 2022-01-28 15:30:32 [link] [source]

From what I have read and understood, in sqlite a PRIMARY KEY creates an implicit UNIQUE INDEX on the column(s) the key is created. Both have the exact same behaviour with NULL values and everything else is the same as well. Except for the common differences which come from the data model, like there can only be 1 PK and so on and the limitation, that a PK cannot be simply added the an existing table, ...

Are there actually any technical reasons to use a PRIMARY KEY over a UNIQUE INDEX? What did I oversee?

(2) By anonymous on 2022-01-28 15:43:56 in reply to 1 [link] [source]

INTEGER PRIMARY KEY has special meaning that INTEGER UNIQUE does not. And of course the situation is completely different for WITHOUT ROWID tables, which have (and require) a true primary key.

(3) By Ryan Smith (cuz) on 2022-01-28 16:07:47 in reply to 1 [link] [source]

Talking about SQL in general - No, there is no significant difference in the building, using and maintaining of a PK over any other Unique Index. There are database engines that require any table to have a Primary Key (SQlite is not one of them) but all databases can have only 1 PK and it seems that for most DBs that require a PK, the engine in some way "marries" the PK with the underlying row-data structure.

Talking about SQLite specifically - SQLite uses an implicit Key called the Row-id that is the "real" primary key and married to the row-data structure. In fact an SQLite table is itself a kind of "covering Index" with row_id at the root. Later versions of SQLite allows you to substitute the "row_id" mechanism for another defined "Primary Key" using the "WITHOUT ROWID" table-creation modifier provided that a suitable other Primary Key is specified in the table schema. Note that when a primary key is assigned in this way and serves in stead of a row_id, it has a more narrow rule-set to other, let's call them "spiritual guide" primary keys, in that it cannot contain NULL values.

In general, accessing a row via the primary key (where they are married together as mentioned above) is a slight bit faster since accessing it via nay other key typically only provides a reference to the primary key, which in turn is used to look up the real data. In SQLite the row_id performs that function (unless substituted), but other than that, there is no real technical reason to prefer a primary key over any unique index. Another consideration on DBs that require a PK for any table, is that using a different key when you already MUST have a PK is silly, just use a good PK and save on the number of keys you have.

Not sure if that answers the question, because I am not too sure what the question really tries to get to, but I hope that sheds some light.

(6) By neurosis (neurosis69) on 2022-01-28 21:04:25 in reply to 3 [link] [source]

I just tried to figure out if there are any reasons to not just replace all a primary key by an unique index. Having Oracle Databases as background, I'd usually consider that as bad practice and it even doesn't make any sense from any other figures there.

But sqlite seems to be different and less flexible.

A very simplified example:

When I want to migrate 40G of data from one table to another, due to some changes in the data model. There is no way to speed the migration up when I want to stick with the PRIMARY KEY.

CREATE TABLE test1(
    name text PRIMARY KEY, 
    value text);
CREATE TABLE new_test(
    name text PRIMARY KEY, 
    value blob);
INSERT INTO new_test SELECT * from test1;

But if there is no different (technical) behaviour between the automatically created index for a PRIMARY KEY and manually created UNIQUE INDEX, then I may just use an UNIQUE INDEX instead. (just ignoring reasons for PK in ER Models generally)

My simple example would then look like:

CREATE TABLE test1(
   name text PRIMARY KEY, 
   value text);
CREATE TABLE new_test(
   name text, 
   value blob);
INSERT INTO new_test SELECT * from test1;
CREATE UNIQUE INDEX u_name_new_test on new_test(name);

(8) By Ryan Smith (cuz) on 2022-01-28 23:44:49 in reply to 6 [link] [source]

But sqlite seems to be different and less flexible.

Surely you meant: more flexible? Else you are misunderstanding something.

Also, now that we fully understand what you want to do, the advice changes. SQLite always has a primary key on a table in terms of mechanism used, it just doesn't call it so when you do not specify one and simply refers to row_id. You gain nothing by NOT specifying a primary key, the insert mechanics remain.

In your case the best might be to use the primary key in stead of the row_id, like this:

CREATE TABLE new_test(
    name text PRIMARY KEY, 
    value blob
) WITHOUT ROWID;
INSERT INTO new_test SELECT * from test1;

Note that, as mentioned before, this kind of a Primary Key does not allow NULLs.
Also note that this would produce a smaller table, but you would need to test this for speed on the actual platform and hardware - gains may be anywhere from significant to nil due to many factors and the precise combination of optimizations in play, both from SQLite and the underlying OS/hardware.

(9.2) By Keith Medcalf (kmedcalf) on 2022-01-29 00:05:45 edited from 9.1 in reply to 8 [link] [source]

The insert should be done in-order for maximum efficiency. That means you should execute that as:

CREATE TABLE new_test(
    name text PRIMARY KEY, 
    value blob
) WITHOUT ROWID;
INSERT INTO new_test SELECT * from test1 ORDER BY name;

select * from test1 is free to return results in any order unless you specify a specific order and there is no guarantee that unless the order has been specified that the select will not produce rows in the most pathologically inefficient order.

Note that even if you are inserting into a rowid table, you should also make sure that the insert order is optimized since that will be used to build the index as rows are inserted.

Theoretically (assuming sufficient page cache) the two following sets of statements will take more or less the same overall elapsed wall time to execute:

create table new_test
(
  name text primary key,
  value blob
);
insert into new_test select * from test1 order by name;

and

create table new_test
(
  name text,
  value blob
);
insert into test select * from test1;
create index new_test_name on new_test1 (name);

(10) By neurosis (neurosis69) on 2022-01-31 15:23:28 in reply to 8 [link] [source]

Surely you meant: more flexible? Else you are misunderstanding something.

I was thinking of the possibility to enable/disable primary keys, add/drop columns, such things. But I really didn't want to discuss differences in DB engines here. Consider my questions as from an Oracle DBA with no sqlite experience at all, trying to adopt.

You gain nothing by NOT specifying a primary key, the insert mechanics remain.

Given my example there is a huge difference, imo. Because the table with the PRIMARY KEY will have an internal autoindex that needs to be maintained. For my complex real world table, the difference is a bulk load of 140m with PK vs 80m without.

But you also brougt up the solution I was looking for, namely WITHOUT ROWID tables. Creating the table as such and using the bulk load approach Keith Medcalf (9.2) brought up, this almost is as fast as without a PK.

(4) By Larry Brasfield (larrybr) on 2022-01-28 16:31:01 in reply to 1 [link] [source]

From what I have read and understood, in sqlite a PRIMARY KEY creates an implicit UNIQUE INDEX on the column(s) the key is created.

The index that SQLite creates for many primary keys is no more implicit than one you might create yourself. You might say it is implicitly created.

... and the limitation, that a PK cannot be simply added the an existing table

Yes, that is a limitation, and a good reason to get your actual primary key(s) in early. (I count this as a technical reason.)

Are there actually any technical reasons to use a PRIMARY KEY over a UNIQUE INDEX? What did I oversee?

Another technical reason, (in my view of what is "technical"a), is that saying a column is to serve as a primary key is a tremendous favor to readers of your code. Merely giving it the (strictly technical) properties that a primary key must have does not give readers that same clue until they study multiple table definitions.

And, (as anonymous intimates and Ryan almost gets at another way), an INTEGER PRIMARY KEY has very technical advantages. One is that the table itself serves the same function as an index on that column. Technically, this is because the B-tree representation is structured according to the values of that column; no separate index (having the same structure) is required.


a. I find it exceedingly artificial to separate things that a programming language requires or motivates be done from things that help the system, programmers+programs, function more efficiently and reliably. When examined closely, the limitations of human thinkers are "technical" even if not as clearly delineated as more seemingly deterministic systems.

(7) By neurosis (neurosis69) on 2022-01-28 21:16:58 in reply to 4 [link] [source]

I agree with your arguments about using a PK if it simply is a PK.

But as I tried to explain in my post (6), the performance gain in some situations may outweigh such decissions, imo.

By technical reasons I really only mean, for instance, different handling of NULL values, unexpected results, limited use of FKs, or other things someone, with sqlite experience of only a few weeks, wouldn't imagine.

(5) By Keith Medcalf (kmedcalf) on 2022-01-28 18:25:04 in reply to 1 [source]

Another difference is that when a child table references a parent-table, but that reference does not contain column name data, then the reference is assumed to be to the PRIMARY KEY of the parent. This saves having to type the column names in the parent key over again. This only works for columns in the parent declared as PRIMARY KEY, not for those declared UNIQUE; although, a UNIQUE index on the parent column is all that is required (though then you have to specify the columns) in the references clause.