SQLite doesn't use indexes for bitwise clauses?

(1) By HowardKapustein (howardk) on 2021-04-18 01:35:55 [link] [source]

REPRO: Create a table with an index covering a not-null integer. EXPLAIN QUERY PLAN SELECT * FROM table WHERE... reports the index is used for WHERE n=? and WHERE n>? but not WHERE n & ? != 0

This is with SQLite 3.34.1

sqlite> .schema resource
CREATE TABLE Resource(_ResourceID INTEGER PRIMARY KEY NOT NULL,_Revision INTEGER NOT NULL DEFAULT 1,_WorkId INTEGER NOT NULL DEFAULT 0,Package INTEGER NOT NULL,"Index" INTEGER NOT NULL,Language TEXT NOT NULL COLLATE NOCASE);
CREATE UNIQUE INDEX IDX_Resource_Package_Index__WorkId ON Resource(Package, "Index", _WorkId);
CREATE INDEX IDX_Resource__WorkId ON Resource(_WorkId) WHERE _WorkId<>0;
sqlite>
sqlite>
sqlite>
sqlite> explain query plan select * from resource where package=?;
QUERY PLAN
`--SEARCH TABLE resource USING INDEX IDX_Resource_Package_Index__WorkId (Package=?)
sqlite>
sqlite>
sqlite> explain query plan select * from resource where package>?;
QUERY PLAN
`--SEARCH TABLE resource USING INDEX IDX_Resource_Package_Index__WorkId (Package>?)
sqlite>
sqlite>
sqlite> explain query plan select * from resource where package & ? != 0;
QUERY PLAN
`--SCAN TABLE resource
sqlite>
sqlite>
sqlite> explain query plan select * from resource where package & ? = 0;
QUERY PLAN
`--SCAN TABLE resource

We use bitwise logic in multiple places. Usually in the form x & ? != 0 or x & ? = 0, and less often testing against non-zero e.g. x & ? = 0x1408

I was little surprised the indexes aren't leveraged for bitwise comparisons. Is there a reason why not or is it just an oversight that can be addressed in vNext?

(2) By Richard Hipp (drh) on 2021-04-19 11:51:00 in reply to 1 [link] [source]

How would an index be helpful in processing a "x & ? = 0" WHERE clause term? What algorithm do you propose?

(3) By ddevienne on 2021-04-19 12:53:38 in reply to 2 [link] [source]

Let h be an integer with the high-bit (most significant) of ? set as it's only ON bit.
Only x values with that bit can be non-zero when ANDed with ?, which implies they are >= h.
Therefore x & ? = 0 implies x > h, which allows to start the scan traversal from somewhere
later than the start, thus avoiding a full scan? Just thinking aloud :). I suspect I'll soon be corrected!

(4) By Richard Hipp (drh) on 2021-04-19 13:20:37 in reply to 3 [link] [source]

What if the value is a negative number? What it if is something other than an integer, like a floating point number or string or blob?

(8) By ddevienne on 2021-04-19 14:17:28 in reply to 4 [link] [source]

Yes, I assumed bit expressions involved positive integers only indeed.
I thought the bitwise operators required that. That's my only use-case for them in fact.

Perhaps the OP want to look into expression-based indexes and/or partial indexes instead?
That does assume his (or her) ? values have very low cardinality.

Anyways, as I wrote, I was just thinking aloud.

(14) By HowardKapustein (howardk) on 2021-04-19 19:32:02 in reply to 8 [link] [source]

I only use bitwise operators on columns designed to be bitfields. Even though INTEGER is int64 I only use it as uint64 (when bitfields are in play).

That's also why I only need equality comparisons (often !=0, less often =0, rarely =0x1234 or other multi-bit pattern)

I use expression-based and partial indexes for other reasons but after some quick experimentation they don't seem to help here. I'd need to deeper analysis to see if I could make it work, but it's a complex puzzle so fiddling here can have unintended consequences. IOW like many things in life, it's complicated

(5) By curmudgeon on 2021-04-19 13:44:11 in reply to 3 [link] [source]

If it's an integer you could work out that x will be between a and b yourself and include that condition in the where such that the index will be used.

(6) By Richard Damon (RichardDamon) on 2021-04-19 13:48:30 in reply to 3 [link] [source]

The big issue is that since you used a ?, the processing can't assume it is something big, so can't search part way down. If the ? was replaced with a 0 or a 1, then the full scan would definitely be the fastest. Remember, if you use an index, then you need to do a second lookup for every hit you find, unless the index itself has all the information you need, so if you are going to need to actually scan the index, you might as well scan the original table.

Yes, perhaps you, knowing a lot about the structure of the table, can decide that some other method would be faster in this case, the planner has to work with what it knows, and wants to make the decision with somewhat simple logic to avoid slowing down the planning of simpler cases. Slightly sub-optimal decisions are not really bugs, particularly if it is a corner case.

In many cases, using bit encoding to try and squeeze a bit of efficiency is the wrong tactic, better to have a set of binary fields that you can build indexes on if you really want to be able to select on things like this efficiently. If you don't need efficiency, you shouldn't worry about it.

(12) By HowardKapustein (howardk) on 2021-04-19 18:55:28 in reply to 6 [link] [source]

In many cases, using bit encoding to try and squeeze a bit of efficiency is the wrong tactic, better to have a set of binary fields that you can build indexes on if you really want to be able to select on things like this efficiently

I do it for process efficiency, not runtime.

Schema changes are significant events (for non-technical reasons) whereas using a previously reserved bit is a trivial non-event. Using INTEGER as a logical BOOLEAN[64] avoids these non-technical issues.

There's also some runtime efficiency but that's not as significant to me.

(13) By Richard Damon (RichardDamon) on 2021-04-19 19:20:33 in reply to 12 [source]

Yes, there are reasons to use less efficient methods to get results. You just have to realize that doing so may cause performance issues.

Making the query have an ORDER BY clause might get the planner to use the index, and maybe at some point the scanner will be smart enough to think of skipping.

Adding functional indexes to the table (for the values of x & n that you use a lot) but this might not help if the n is provided as a binding to ? as that is too later for the planner to act on it.

(This assumes that adding indexes doesn't trigger as much of an issue as other Schema changes.) Adding indexes also will add overhead to every change to that table.

If efficiency isn't significant, don't worry about it.

(15) By HowardKapustein (howardk) on 2021-04-19 19:40:17 in reply to 13 [link] [source]

Sorry, when I said it's not significant I meant it is but it doesn't trump the process issues. That doesn't mean I don't get constant perf 'feedback' :P especially by folks who don't understand the process issues and don't worry about the (significant) cost to address those.

The goal is 'instantaneous' so anything over zero is undesirable. Necessary of course, but there's a strong recurring desire to be slimmer. I'm surprised bitwise expressions don't involve indexes(1) and was hoping SQLite could work more optimization magic

Adding indexes didn't help with some experimentation. I'd need to do a deeper redesign-think to see what options might exist, other than exploding INTEGER bitfield columns into multiple INTEGER boolean columns. But I'm really hoping I don't have to go there...

(1) we have a fixed system w/o adhoc queries so for perf reasons we aim for 100% index coverage

(16) By Richard Damon (RichardDamon) on 2021-04-19 20:22:15 in reply to 15 [link] [source]

As I said, I wouldn't expect indexes to help if the mask is done with a bound parameter, as the planner doesn't know which index (if any) would be of help. In effect, you have created a run time 'column select' which if it were legal to do directly would give the planner fits too.

That does become one question, if you can make the bit-mask a fixed value in the SQL Statement, the planner could choose a partial index to use (if one exists) and you could get a speedup.

(18) By HowardKapustein (howardk) on 2021-04-19 23:47:34 in reply to 16 [link] [source]

if you can make the bit-mask a fixed value in the SQL Statement

Generally not. It might be possible in a handful of cases but that's more an exception than the rule.

The database predates SQLite's bitwise support, but once that was available we starting leveraging it pretty regularly (most tables have at least 1 bitfield, some have multiple). Reworking the design can pose significant impact. I'll consider it (in part or whole), if necessary. But I'm hoping it won't be necesssary

(7) By David Raymond (dvdraymond) on 2021-04-19 14:14:21 in reply to 2 [link] [source]

How would an index be helpful in processing a "x & ? = 0" WHERE clause term? What algorithm do you propose?

Wouldn't you then only need to execute the test once per each unique value of x?

Go through the index in order, when x changes test the new x. Then you can either say "include every row until x changes" or "ignore every row until x changes"

On a simple test like this then it might not save much, but the more expensive the test, and the fewer distinct values of x then the more useful that would be.

(That or I'm missing something obvious)

(9) By Richard Damon (RichardDamon) on 2021-04-19 15:11:09 in reply to 7 [link] [source]

I don't know of any great algorithm to scan an index and get to the next unique value that is that much faster than a scan. Maybe if you could assume that the values have somewhat low cardinality and a very small percentage match the pattern (most patterns having low bit count). You only get real savings if you can skip reading a number of pages because you read pages before and after it that have the same value of x, and x doesn't match the pattern.

Then, for all the matches, you need to still go to the main table and find that entry unless the index is a covering index supplying all the data needed for the query. This need may easily make the table scan faster. Maybe adding an ORDER BY x clause would increase the chance of it using the index.

(11) By HowardKapustein (howardk) on 2021-04-19 18:51:30 in reply to 9 [link] [source]

In one case I'm fetching data from table A that meets criteria in table B

SELECT * FROM A
INNER JOIN B ON B.a=A.rowid
WHERE B.name=? AND B.bitfield & 0x0040 != 0;

Assume table B has more columns than bitfield. Why do I need a table scan of B if I can CREATE INDEX x ON B(name, bitfield);? Table B has other columns so the index has greater density e.g. if there's 2 records per page for table B but 20 per index page. Table scans are significantly less efficient than indexes.

I'm not using bitwise operators for relational comparisons. I'm using an INTEGER column as a bitfield and only interested in (not)equality. ORDER BY doesn't help.

(17) By Richard Damon (RichardDamon) on 2021-04-19 22:52:19 in reply to 11 [link] [source]

It night use the index to find name, but the only way to find values in bitfield that match, with just an index on the value, is to scan through all the values. It can't assume that it only wants to find the value 0x0040, as that isn't the condition. Yes, if the cycles were added to ALL index searches, it could possibly do some proofs and determine the lowest value it could be would be 0x0040 and skip over to find that. and when it gets to 0x007F skip to 0x00C0, but that is a lot of logic for a very special case.

Add a partial index on "B(name) where bitfield & 0x0040 != 0" might be good enough to get it to use the index. Of course, you will need to make a partial index for every bit combination you want to test.

Making it a partial index cuts down the work to keep them, but does say you can't look for the converse condition of WHERE B.name=? AND B.bitfield & 0x0040 == 0

For that, you would need to make an index on B(name, bitfield&0x0040), and for all other bits you need, which gets to be a lot of index data.

(19) By Bendo (BendorZx) on 2023-11-14 07:42:36 in reply to 17 [link] [source]

I'm also surprised that bit operations don't use indexes. I'm sure this worked on earlier versions. Bit fields allow you to reduce the number of columns and indexes used - instead of several dozen columns, you can use one bit column. Also, instead of creating dozens of indexes for different columns, you can create one index per bit column. But unfortunately it doesn’t work now and this is a big drawback :(

(20) By Gunter Hick (gunter_hick) on 2023-11-14 09:25:30 in reply to 19 [link] [source]

Which "earlier versions" specifically?

Maybe implement a bitmap index using https://sdm.lbl.gov/fastbit/ inside a virtual table.

(22) By Deon Brewis (deonb) on 2023-11-14 11:04:23 in reply to 17 [link] [source]

Scanning through all the values are fine, but it should scan through all the values of the index, which it doesn't do in this case. Instead it scans through all the values on the main table.

(10) By HowardKapustein (howardk) on 2021-04-19 18:42:51 in reply to 2 [link] [source]

Bitwise evaluation of the index value?

Functionally we only use bitwise operators when we use a column as a bitfield, and trying to do equivalent non-bitwise operations doesn't make much sense to me.

IF (x & 0x0040 != 0)

is equivalent to what without bitwise operators?

SQLite knows I specified a bitwise operator in the expression so the hope/expectation is it could do...something.

In theory I could specify some kind of hint e.g. *column == use bitwise index evaluation

SELECT * FROM table WHERE *x & ? != 0;

I could even live with a hint in the schema definition to indicate it supports bitwise operations (CREATE TABLE..N INTEGER NOT NULL HINT BITFIELD... or something in CREATE INDEX?). I'd rather see SQLite automagically handle it but the perf benefits should be significant enough I can justify additional markup on my part if necessary

There's always the new-type option (BITFIELD instead of INTEGER) but I expect you're even less of a fan than I am (and I'm not :P)

(21) By Deon Brewis (deonb) on 2023-11-14 10:55:45 in reply to 2 [link] [source]

The main benefit comes from the index being a fraction of the size of the main table. i.e. Scanning through fewer pages of data.

It kind'a does that already - I use indexed bitwise all the time and it works great... with one caveat: The index has to be covering.

i.e. If the OP uses:

explain query plan select _ResourceId from resource where package & ? != 0;

It will indeed use the index:

SCAN resource USING COVERING INDEX IDX_Resource_Package_Index__WorkId

It probably uses that query plan more based on the fact that it's covering rather than the optimizer explicitly thinking it's useful.

It would probably be helpful if it can still use that index even if you select a non-covering column.