SQLite User Forum

Does index slow down insertion

(1) By 6kEs4Majrd on 2022-03-01 18:05:17

I have a table with a primary key. When I insert, I ensure that entries whose primary keys are already in the table are not inserted again. So do I have to use the primary key for this?

It seems that the more entries the table has, the slower insertion becomes.

Given these considerations, what is the best strategy to speed up insertion so that the total table-population time is linear with respect to the table size? (For example: remove the primary key, allow duplicates during insertion, remove the duplicates afterward, then add a unique index on the columns that were previously the primary key.)

(2) By Keith Medcalf (kmedcalf) on 2022-03-01 18:20:51 in reply to 1

Yes.

I have a table with a primary key. When I insert, I ensure that entries whose primary keys are already in the table are not inserted again.

You can choose to do this, but why? You simply cannot insert a tuple with the same primary key. You can check manually, or just not bother to check and let the INSERT processing check for you -- it will anyway, whether you like it or not, if there is a primary key defined.
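For illustration, letting the INSERT processing do the duplicate check could look like this (a minimal Python/sqlite3 sketch; the table and column names here are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")

# INSERT OR IGNORE silently skips rows whose primary key already
# exists, so no manual existence check is needed.
con.execute("INSERT OR IGNORE INTO t VALUES (1, 'first')")
con.execute("INSERT OR IGNORE INTO t VALUES (1, 'duplicate')")  # skipped

print(con.execute("SELECT COUNT(*), MAX(val) FROM t").fetchone())  # (1, 'first')
```

`INSERT OR INSERT ... ON CONFLICT DO NOTHING` behaves the same way for this case; either form pays only the constraint check the engine would perform anyway.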

So do I have to use the primary key for this?

What do you mean by this?

Given these considerations, what is the best strategy to speed up insertion so that the total table-population time is linear with respect to the table size? (For example: remove the primary key, allow duplicates during insertion, remove the duplicates afterward, then add a unique index on the columns that were previously the primary key.)

There are many methods of doing that; you have stated perhaps one of them. The other, of course, is to perform your inserts in key order so as to avoid b-tree rebalances (and make sure your cache is large enough).

You will pay the cost of maintaining and using the index if you have an index. How you choose to pay the cost is up to you. But you will pay.
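A sketch of the insert-in-order approach in Python's sqlite3 (table name, cache size, and data are illustrative assumptions, not prescriptions):

```python
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA cache_size = -64000")  # negative value = KiB, so ~64 MB page cache
con.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v TEXT)")

rows = [(k, f"v{k}") for k in random.sample(range(100_000), 10_000)]
# Insert in key order so new rows append to the right edge of the
# b-tree instead of splitting pages at random positions.
rows.sort()

with con:  # one transaction for the whole batch
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)

print(con.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 10000
```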

(3) By 6kEs4Majrd on 2022-03-01 19:55:03 in reply to 2

You can choose to do this, but why?

You misunderstood what I meant. The second sentence is to explain why I use the primary key in the 1st sentence.

What do you mean by this?

I just want to ensure those columns are unique. With the least programming effort (e.g., measured by SQL code complexity, or by the time it takes to program), is the primary-key solution the way to go?

is to perform your inserts in key order

In my case, the first column is random, but for a given value of the first column, the entries are sorted by the second and third columns. Is there any way to speed up insertion while still keeping the (first, second, third) column tuple unique in this case?

(4) By Roman (moskvich412) on 2022-03-04 20:20:10 in reply to 3

To ensure uniqueness, each new row must be tested against the existing ones. The more rows exist, the slower that comparison becomes during insertion. Therefore, there is no way to eliminate the cost. However,

As Keith wrote, if you have all records up front, you can sort them (on all three columns you need) before insertion. This helps the index because it does not have to jump all over and rebalance itself. Or you can insert first, then create the unique index and vacuum.
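The insert-first-then-index strategy could be sketched in Python's sqlite3 like this (table and index names are made up for illustration): bulk-load into an unconstrained table so inserts never pay for an index, deduplicate once, then build the unique index at the end.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# No constraint during the bulk load, so each insert is a plain append.
con.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")

rows = [(1, 2, 3), (1, 2, 3), (4, 5, 6)]  # contains a duplicate
with con:
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Remove duplicates, keeping one row per (a, b, c) tuple.
con.execute("""DELETE FROM t WHERE rowid NOT IN
               (SELECT MIN(rowid) FROM t GROUP BY a, b, c)""")
# Build the unique index once, over the already-deduplicated data.
con.execute("CREATE UNIQUE INDEX t_abc ON t(a, b, c)")
con.commit()            # VACUUM cannot run inside a transaction
con.execute("VACUUM")   # reclaim the space freed by the DELETE

print(con.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 2
```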

Roman

(5) By cj (sqlitening) on 2022-03-08 04:35:01 in reply to 1

Wrap multiple inserts within a transaction so the database is locked only once.
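In Python's sqlite3 that might look like the following sketch (table name and row count are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")

rows = [(i,) for i in range(1000)]

# "with con" commits all the inserts as a single transaction
# (and rolls back on error), so the lock is taken and the commit
# is paid once for the whole batch instead of once per row.
with con:
    con.executemany("INSERT INTO t VALUES (?)", rows)

print(con.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 1000
```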