PRAGMA INTEGRITY_CHECK takes very long

(1) By Cecil (CecilWesterhof) on 2020-05-03 10:34:51 [link] [source]

I have a very large DB (4,2 GB) and when I do a:  
    PRAGMA INTEGRITY_CHECK

on it, it takes more as twenty minutes.
Is this normal, or do I have a problem?

(2) By Simon Slavin (slavin) on 2020-05-03 13:17:26 in reply to 1 [link] [source]

Normal. It has to read every piece of data in the table and its indexes, and do a lot of cross-referencing.

There's a different check which reports some integrity problems, but doesn't test as many things:

https://sqlite.org/pragma.html#pragma_quick_check

It's faster and may do enough for your purposes.

(3) By Cecil (CecilWesterhof) on 2020-05-03 16:01:41 in reply to 2 [link] [source]

OK, then I do not have to worry.

The quick check is about two times as fast. ;-)

Time is not very important: I run this script only once a day with cron. And the database that took so long is a database I do not use: it was used several years ago for some tests. But I was just wondering.

For all my databases together (except this special one) checking the integrity takes (mostly) less as a minute. Especially in a batch that really is not a problem.


One other point:
I see quit big differences between runs. The fastest can be two times as fast as the slowest. Is this normal?

(4) By Warren Young (wyoung) on 2020-05-03 16:29:41 in reply to 3 [source]

Are you certain the system is otherwise idle when you run your tests?

I mean, if you just pack this into the cron %daily script, then you might be running in parallel with a locate(1) DB update, for instance.

(5) By Cecil (CecilWesterhof) on 2020-05-03 17:22:43 in reply to 4 [link] [source]

Not 100%, but reasonable sure.

This was seen when I ran them from the command-line to test.

(6) By Simon Slavin (slavin) on 2020-05-03 19:01:49 in reply to 5 [link] [source]

You might like to use the Unix time command, or whatever your operating system uses instead of it. Then instead of looking at the real time figure, look at user time and system time.

Real time can vary quite a lot for the reasons already pointed-out. But user time and system time should not.

(8) By Cecil (CecilWesterhof) on 2020-05-04 11:01:03 in reply to 6 [link] [source]

It is very strange indeed. Before I went to bed I started the command two times sequentially. The first gave:
real    52m49.51s
user    10m11.98s
sys     6m37.97s
perc    31.86

and the second:
real    17m7.79s
user    10m14.76s
sys     6m5.03s
perc    95.33

The first took three times as much real time, but the spend time is about the same.


So in the morning I thought lets do it multiple times with:
    for i in $(seq 7) ; do

After more as three hours it was still busy with the first one. So I cancelled it and started it anew and got:
real    20m48.56s
user    10m17.09s
sys     6m13.42s
perc    79.33


real    17m25.53s
user    11m8.20s
sys     6m15.04s
perc    99.78


real    17m26.89s
user    11m9.30s
sys     6m15.26s
perc    99.78


real    16m29.56s
user    10m23.10s
sys     6m4.68s
perc    99.82


real    15m40.67s
user    9m40.58s
sys     5m58.60s
perc    99.84


real    16m11.05s
user    10m5.58s
sys     5m55.47s
perc    98.97


real    16m19.01s
user    10m6.34s
sys     5m58.08s
perc    98.51


I am starting to wonder if there is a problem with my system. :'-(

(9) By Kees Nuyt (knu) on 2020-05-04 14:45:07 in reply to 8 [link] [source]

I am starting to wonder if there is a problem with my system.

There is nothing wrong with your system.

In all of those runs, the same amount of work has to be done, hence user and sys time will be approximately the same.

In the first run,. the database is not in the OS filesystem cache (AKA disk cache), so a lot of time is spent waiting for I/O (read from disk).

In all subsequent runs a significant part of the database will be in the cache, especially often used database pages.

You can experiment with PRAGMA cache_size=...; but this will not make a lot of difference (I tried), because it only moves part of the caching from the OS disk cache to the SQLite page cache.

You can stop worrying.

-- 
Regards,
Kees Nuyt

(7) By Keith Medcalf (kmedcalf) on 2020-05-03 19:38:10 in reply to 5 [link] [source]

Of course whether all or part of the database is in the system cache is also relevant, and if it is not in the cache then what has to be evicted to make room also will have an effect. After all, not performing physical I/O is the best way to speed up I/O ...