How to efficiently query membership in IPv6 ranges

(1) By llimllib on 2024-10-27 20:38:01 [link] [source]

Following up on Julia Evans' post here: https://jvns.ca/blog/2024/10/27/asn-ip-address-memory/

I wanted to see how quickly I could query presence of an ipv6 address in IPv6 ranges, and unfortunately I wasn't able to get it to be very quick at all.

This gist contains my schema and the code I used to try and query it: https://gist.github.com/llimllib/e9e48bb7e1afde461b921c6c0ae90a6d

In summary, the data type didn't matter much for speed, but I tried three data types to store the IPv6 start and end addresses: TEXT, BLOB and two INTEGERs

The most likely answer I can guess is that there needs to be a better index; I wasn't able to get sqlite to use the indexes on both start and end:

sqlite> explain query plan SELECT asn FROM ipv6_ranges WHERE x'0500' BETWEEN start_ip_blob AND en
d_ip_blob;
QUERY PLAN
`--SEARCH ipv6_ranges USING INDEX idx_ipv6_ranges_end_ip_blob (start_ip_blob<?)

I tried splitting the query up into an intersection of two selects to get it to use both indexes:

sqlite> EXPLAIN QUERY PLAN SELECT asn FROM ipv6_ranges WHERE ROWID IN (
  SELECT ROWID FROM ipv6_ranges WHERE x'0500' > start_ip_blob 
  INTERSECT 
  SELECT ROWID FROM ipv6_ranges WHERE x'0500' < end_ip_blob);
QUERY PLAN
|--SEARCH ipv6_ranges USING INTEGER PRIMARY KEY (rowid=?)
`--LIST SUBQUERY 2
   `--COMPOUND QUERY
      |--LEFT-MOST SUBQUERY
      |  `--SEARCH ipv6_ranges USING COVERING INDEX idx_ipv6_ranges_end_ip_blob (start_ip_blob<?)
      `--INTERSECT USING TEMP B-TREE
         `--SEARCH ipv6_ranges USING COVERING INDEX idx_ipv6_ranges_start_ip_blob (end_ip_blob>?)

But that wasn't any faster.

Am I missing any easy way to speed up search for membership in an IPv6 range?

(2) By SeverKetor on 2024-10-27 21:11:59 in reply to 1 [link] [source]

Assuming non-overlapping ranges...

I haven't had to deal with IPv6, but for IPv4 CIDRs I stored the start of the range in decimal as an integer primary key, then did queries like SELECT * FROM (SELECT * FROM ip_ranges WHERE Start<=@IP ORDER BY Start DESC LIMIT 1) WHERE End>=@IP. I'm not in a spot to work out a good way of converting it to handle IPv6 right now though, but maybe this will be enough of a start for you.

As an aside, I did not store the end, instead just storing the CIDR length with the two most common lengths being replaced with 0s and 1s in order to save on database size (which also had a noticable impact on the write speed when updating the IP info since fewer bytes were written). I then used a generated column to make the range end easy to work with. I also used a few functions from a custom extension in order to speed up write performance.

(3) By Mark Lawrence (mark) on 2024-10-28 12:01:23 in reply to 1 [source]

Try this, which takes advantage of the SQLite optimisation for a single MIN()/MAX() value:

SELECT
    r.asn,
    r.start_ip_blob,
    MIN(r.end_ip_blob) as end_ip_blob
FROM ipv6_ranges r
WHERE r.start_ip_blob <= ? AND r.end_ip_blob >= ?

-- QUERY PLAN
-- `--SEARCH TABLE ipv6_ranges AS r
--       USING INDEX idx_ipv6_ranges_end_ip_blob (end_ip_blob<?)

The result appears to be quite an improvement:

text between: 11.4895 87.04 selects / sec
text lt/gt: 10.6409 93.98 selects / sec
blob between: 12.0124 83.25 selects / sec
blob lt/gt: 11.0210 90.74 selects / sec
blob intersect: 44.6270 22.41 selects / sec
blob MAX: 0.0604 16565.05 selects / sec  <-------- Added
int between: 5.5083 181.54 selects / sec
int lt/gt: 5.3409 187.23 selects / sec

But this comes with caveats:

The tests in your gist don't appear to confirm that the queries are actually returning results... Makes it a bit hard to confirm each test is accurate.
Your index names doesn't always match the columns - see above