how to output grouped values in a JSON array
(1) By punkish on 2024-07-27 17:57:35 [link] [source]
Is there an quick and efficient way to output JSON results of a GROUPed query where the grouped values are arrays? Consider the following
CREATE TEMP TABLE a (id INTEGER PRIMARY KEY, a_title TEXT);
INSERT INTO a (a_title)
VALUES
('foo'),
('bar'),
('baz'),
('qux');
CREATE TEMP TABLE b (id INTEGER PRIMARY KEY, url TEXT, a_id INTEGER);
INSERT INTO b (url, a_id)
VALUES
('http://server/path1', 1),
('http://server/path2', 1),
('http://server/path3', 2),
('http://server/path4', 3),
('http://server/path5', 4),
('http://server/path6', 4),
('http://server/path7', 4);
CREATE TEMP TABLE c (id INTEGER PRIMARY KEY, a_id INTEGER, lat REAL, lon REAL);
INSERT INTO c (a_id, lat, lon)
VALUES
(1, 20.0, 43.0),
(2, 17.0, -12.0),
(2, 17.0, -11.5),
(3, 120.0, 3.0),
(4, 26.0, 58.0),
(4, 26.2, 58.3),
(4, 26.5, 58.0),
(4, 26.0, 57.8);
A naive SELECT like so…
SELECT
b.id AS b_id, b.url,
a.id AS a_id, a.a_title,
c.id AS c_id, c.lat, c.lon
FROM b JOIN
a ON b.a_id = a.id JOIN
c ON b.a_id = c.a_id
ORDER BY b.id;
gives me
b_id url a_id a_title c_id lat lon
---- ------------------- ---- ------- ---- ----- -----
1 http://server/path1 1 foo 1 20.0 43.0
2 http://server/path2 1 foo 1 20.0 43.0
3 http://server/path3 2 bar 2 17.0 -12.0
3 http://server/path3 2 bar 3 17.0 -11.5
4 http://server/path4 3 baz 4 120.0 3.0
5 http://server/path5 4 qux 8 26.0 57.8
5 http://server/path5 4 qux 5 26.0 58.0
5 http://server/path5 4 qux 6 26.2 58.3
5 http://server/path5 4 qux 7 26.5 58.0
6 http://server/path6 4 qux 8 26.0 57.8
6 http://server/path6 4 qux 5 26.0 58.0
6 http://server/path6 4 qux 6 26.2 58.3
6 http://server/path6 4 qux 7 26.5 58.0
7 http://server/path7 4 qux 8 26.0 57.8
7 http://server/path7 4 qux 5 26.0 58.0
7 http://server/path7 4 qux 6 26.2 58.3
7 http://server/path7 4 qux 7 26.5 58.0
I would like to output JSON where the repeating lat,lon pairs for each record are grouped in an array like so
[
{
"b_id": 1,
"url": "http://server/path1",
"a_id": 1,
"a_title": "foo",
"c_id": 1,
"loc": [
{
"lat": 20,
"lon": 43
}
]
},
{
"b_id": 2,
"url": "http://server/path2",
"a_id": 1,
"a_title": "foo",
"c_id": 1,
"loc": [
{
"lat": 20,
"lon": 43
}
]
},
{
"b_id": 3,
"url": "http://server/path3",
"a_id": 2,
"a_title": "bar",
"c_id": 2,
"loc": [
{
"lat": 17,
"lon": -12
},
{
"lat": 17,
"lon": -11.5
}
]
},
{
"b_id": 4,
"url": "http://server/path4",
"a_id": 3,
"a_title": "baz",
"c_id": 4,
"loc": [
{
"lat": 120,
"lon": 3
}
]
},
{
"b_id": 5,
"url": "http://server/path5",
"a_id": 4,
"a_title": "qux",
"c_id": 8,
"loc": [
{
"lat": 26,
"lon": 57.8
},
{
"lat": 26,
"lon": 58
},
{
"lat": 26.2,
"lon": 58.3
},
{
"lat": 26.5,
"lon": 58
}
]
}
]
Is there a way to get SQLite to output this directly? Fwiw, I tried the following, but that didn't get me what I wanted.
SELECT json_group_array(
json_object(
'b_id', b.id,
'url', b.url,
'a_id', a.id,
'a_title', a_title,
'c_id', c.id,
'lat', c.lat,
'lon', c.lon
)
)
FROM b JOIN
a ON b.a_id = a.id JOIN
c ON b.a_id = c.a_id
GROUP BY c.id
ORDER BY b.id;
(2.3) By SeverKetor on 2024-07-27 19:03:34 edited from 2.2 in reply to 1 [link] [source]
For reasons I don't understand, you can't just use loc
, you have to use json(loc)
, otherwise it gets treated as a regular string.
I assumed you specifically want the highest c_id based on the last one where it was 8.
WITH cte AS (
SELECT
b.id AS b_id,
b.url,
a.id AS a_id,
a_title,
(SELECT MAX(id) FROM c WHERE a_id=b.a_id) AS c_id,
(SELECT json_group_array(json_object('lat', lat, 'lon', lon)) AS loc FROM c WHERE a_id=b.a_id GROUP BY a_id) AS loc
FROM b
JOIN a ON b.a_id = a.id
GROUP BY loc
ORDER BY b.id
)
SELECT json_group_array(
json_object(
'b_id', b_id,
'url', url,
'a_id', a_id,
'a_title', a_title,
'c_id', c_id,
'loc', json(loc)
)) FROM cte;
(3.1) By punkish on 2024-07-28 16:19:48 edited from 3.0 in reply to 2.3 [link] [source]
thank you so much for this… it almost does what I want, but the reason for "almost" is my fault. I described the problem slightly incorrectly. Every c_id
has a location (a lat,lon
pair). So, what I really want is the following --
[
{
"b_id": 1,
"url": "http://server/path1",
"a_id": 1,
"a_title": "foo",
"loc": [
{
"c_id": 1,
"lat": 20,
"lon": 43
}
]
},
{
"b_id": 2,
"url": "http://server/path2",
"a_id": 1,
"a_title": "foo",
"loc": [
{
"c_id": 1,
"lat": 20,
"lon": 43
}
]
},
{
"b_id": 3,
"url": "http://server/path3",
"a_id": 2,
"a_title": "bar",
"loc": [
{
"c_id": 2,
"lat": 17,
"lon": -12
},
{
"c_id": 3,
"lat": 17,
"lon": -11.5
}
]
},
{
"b_id": 4,
"url": "http://server/path4",
"a_id": 3,
"a_title": "baz",
"loc": [
{
"c_id": 4,
"lat": 120,
"lon": 3
}
]
},
{
"b_id": 5,
"url": "http://server/path5",
"a_id": 4,
"a_title": "qux",
"loc": [
{
"c_id": 8,
"lat": 26,
"lon": 57.8
},
{
"c_id": 5,
"lat": 26,
"lon": 58
},
{
"c_id": 6,
"lat": 26.2,
"lon": 58.3
},
{
"c_id": 7,
"lat": 26.5,
"lon": 58
}
]
}
]
I will study your solution and see if I can modify it to come up with what I want, but first I have to understand what is going on. If you have time to elucidate, please do.
My related query is that obviously the data I've provided is for the sake of this post only. The real data is much, much bigger (almost 200K rows in table b, almost 800K rows in table a, and more than a million rows in table c). Not only do I want the results fast, I want to return 30 rows of b_id
where each one of those b_id
may have zero or more locations. In the non-JSON world, I accomplish that with a LIMIT 30 OFFSET ?
. But here I can't do that because 30 rows with location might result in only 6 or 10 or 25 rows of b_id
(fewer than 30, in any case). Any suggestions?
(4.2) By SeverKetor on 2024-07-28 17:42:12 edited from 4.1 in reply to 3.1 [source]
I think this is what you want?
Searching for lat/lon will probably hurt unless you create an index on c(a_id).
Depending on the performance from using LIMIT + OFFSET, you may also want to add WHERE b.id > ?
to the CTE and remove OFFSET, then just plug in the greatest b.id you have gotten so far as a parameter.
WITH cte AS (
SELECT
b.id AS b_id,
b.url,
a.id AS a_id,
a_title,
( SELECT
json_group_array(json_object('c_id', c.id, 'lat', lat, 'lon', lon))
FROM c
WHERE a_id=b.a_id
) AS loc
FROM b
JOIN a ON b.a_id = a.id
ORDER BY b.id
LIMIT 30 OFFSET 0
)
SELECT json_group_array(
json_object(
'b_id', b_id,
'url', url,
'a_id', a_id,
'a_title', a_title,
'loc', json(loc)
)) FROM cte;