transposing columns to rows and getting counts
(1.1) By punkish on 2023-02-06 10:09:52 edited from 1.0 [link] [source]
Given a table like so
sqlite> .mode box
sqlite> .headers on
sqlite> CREATE TABLE t (id INTEGER PRIMARY KEY, j, y);
sqlite> INSERT INTO t (j, y) VALUES
('foo', 1990),
('bar', 1990),
('foo', 1991),
('baz', 1990),
('foo', 1991),
('bar', 1990),
('foo', 1992),
('baz', 1992),
('foo', 1994),
('bar', 1993),
('foo', 1991),
('baz', 1991);
sqlite> SELECT * FROM t;
┌────┬─────┬──────┐
│ id │ j │ y │
├────┼─────┼──────┤
│ 1 │ foo │ 1990 │
│ 2 │ bar │ 1990 │
│ 3 │ foo │ 1991 │
│ 4 │ baz │ 1990 │
│ 5 │ foo │ 1991 │
│ 6 │ bar │ 1990 │
│ 7 │ foo │ 1992 │
│ 8 │ baz │ 1992 │
│ 9 │ foo │ 1994 │
│ 10 │ bar │ 1993 │
│ 11 │ foo │ 1991 │
│ 12 │ baz │ 1991 │
└────┴─────┴──────┘
sqlite> SELECT j, y, Count(*) c FROM t GROUP BY j, y;
┌─────┬──────┬───┐
│ j │ y │ c │
├─────┼──────┼───┤
│ bar │ 1990 │ 2 │
│ bar │ 1993 │ 1 │
│ baz │ 1990 │ 1 │
│ baz │ 1991 │ 1 │
│ baz │ 1992 │ 1 │
│ foo │ 1990 │ 1 │
│ foo │ 1991 │ 3 │
│ foo │ 1992 │ 1 │
│ foo │ 1994 │ 1 │
└─────┴──────┴───┘
But I want a result set that shows the count of j
by y
and looks like so
┌─────┬──────┬──────┬──────┬──────┬──────┐
│ j │ 1990 │ 1991 │ 1992 │ 1993 │ 1994 │
├─────┼──────┼──────┼──────┼──────┼──────┤
│ foo │ 1 │ 3 │ 1 │ 0 │ 1 │
├─────┼──────┼──────┼──────┼──────┼──────┤
│ bar │ 2 │ 0 │ 0 │ 1 │ 0 │
├─────┼──────┼──────┼──────┼──────┼──────┤
│ baz │ 0 │ 1 │ 1 │ 0 │ 0 │
└─────┴──────┴──────┴──────┴──────┴──────┘
Fwiw, my table has the following counts
sqlite> select count(distinct j) j from t;
┌─────┐
│ j │
├─────┤
│ 768 │
└─────┘
sqlite> select count(distinct y) y from t;
┌─────┐
│ y │
├─────┤
│ 173 │
└─────┘
sqlite> select count(*) c from t;
┌──────────┐
│ c │
├──────────┤
│ 738762 │
└──────────┘
From my limited knowledge, I believe I need to do a pivot of columns to rows and calculate frequency of j / y. There is a pivot vtab extension out there that I could possibly use, but my understanding stops here. Would appreciate any help.
(2) By punkish on 2023-02-06 16:06:35 in reply to 1.1 [source]
with Anton's (github/@nalgeon) help, I was able to install the pivot_vtab
extension on my M1 Pro MacBook, and was able to create a virtual table that kinda did what I wanted. But, as Anton warned me, the extension is buggy, and when I tried to dump the data to a CSV file, I got a segmentation fault.
So, back to the drawing board. I would like to create the transposed view of my data, hopefully without involving any extension (unless I can get it to work flawlessly on both Mac -- development -- and Ubuntu -- production).
Once the pivot table is working, I would update it using triggers, as mentioned in my original post.
Many thanks in advance.
(4) By Keith Medcalf (kmedcalf) on 2023-02-06 20:55:10 in reply to 2 [link] [source]
The old version works fine for me, but the new version crashes on windows with an "oh so sorry screw you" dialog box.
sqlite> create virtual table temp.pvt using pivot_vtab((select j from t group by j order by j),(select y,y from t group by y order by y),(select count(*) from t where j=?1 and y=?2));
sqlite> .mode qbox
sqlite> select * from pvt;
┌───────┬──────┬──────┬──────┬──────┬──────┐
│ j │ 1990 │ 1991 │ 1992 │ 1993 │ 1994 │
├───────┼──────┼──────┼──────┼──────┼──────┤
│ 'bar' │ 2 │ 0 │ 0 │ 1 │ 0 │
│ 'baz' │ 1 │ 1 │ 1 │ 0 │ 0 │
│ 'foo' │ 1 │ 3 │ 1 │ 0 │ 1 │
└───────┴──────┴──────┴──────┴──────┴──────┘
sqlite> .mode csv
sqlite> select * from pvt;
j,1990,1991,1992,1993,1994
bar,2,0,0,1,0
baz,1,1,1,0,0
foo,1,3,1,0,1
sqlite>
note that this works also
sqlite> create virtual table temp.pvt using pivot_vtab((select distinct j from t order by j),(select distinct y,y from t order by y),(select count(*) from t where j=?1 and y=?2));
sqlite> .mode qbox
sqlite> select * from pvt;
┌───────┬──────┬──────┬──────┬──────┬──────┐
│ j │ 1990 │ 1991 │ 1992 │ 1993 │ 1994 │
├───────┼──────┼──────┼──────┼──────┼──────┤
│ 'bar' │ 2 │ 0 │ 0 │ 1 │ 0 │
│ 'baz' │ 1 │ 1 │ 1 │ 0 │ 0 │
│ 'foo' │ 1 │ 3 │ 1 │ 0 │ 1 │
└───────┴──────┴──────┴──────┴──────┴──────┘
sqlite> .mode csv
sqlite> select * from pvt;
j,1990,1991,1992,1993,1994
bar,2,0,0,1,0
baz,1,1,1,0,0
foo,1,3,1,0,1
sqlite>
(7) By Keith Medcalf (kmedcalf) on 2023-02-06 21:35:00 in reply to 4 [link] [source]
Note that if you change the columns you need to reparse the schema. pragma writable_schema=reset
will accomplish that. There is no need to drop and recreate the virtual table.
(8) By nalgeon on 2023-02-06 21:45:18 in reply to 4 [link] [source]
Yeah, it seems like the memory leak in pivot_vtab
was introduced in 2022. I switched to the 2021 version, and the tests passed. Thanks for noticing!
(3) By anonymous on 2023-02-06 18:54:56 in reply to 1.1 [link] [source]
Tables with varying numbers of columns isn't what SQL is good for, but if all you need is CSV, it's possible to hand-craft some.
.mode tabs
.separator ,
.headers off
with
js as
(select distinct j
from t
order by j),
ys as
(select distinct y
from t
order by y),
summary (j, y, n) as
(select j, y, count(id)
from js join ys natural left join t
group by j, y
order by j, y)
select j, n from
(select 0 as o,
'j\y' as j,
group_concat
(case when y glob '*[,"'||char(10,13)||']*' then
'"'||replace(y,'"','""')||'"'
else
y
end) as n
from ys
union all select 1,
case when j glob '*[,"'||char(10,13)||']*' then
'"'||replace(j,'"','""')||'"'
else
j
end,
group_concat(n)
from summary
group by j)
order by o;
(5) By jose isaias cabrera (jicman) on 2023-02-06 21:05:25 in reply to 3 [link] [source]
WOW! Very nice.
(6) By nalgeon on 2023-02-06 21:18:47 in reply to 1.1 [link] [source]
It's also possible to use an eval
function available as a separate extension or as a part of the define extension:
with years as (
select distinct(y) as year
from t
),
lines as (
select 'create view vt as ' as part
union all
select 'select j '
union all
select ',count(y) filter (where y = ' || year || ') as "' || year || '" '
from years
union all
select 'from t group by j;'
)
select eval(group_concat(part, ''))
from lines;
After that:
select * from vt;
┌─────┬──────┬──────┬──────┬──────┬──────┐
│ j │ 1990 │ 1991 │ 1992 │ 1994 │ 1993 │
├─────┼──────┼──────┼──────┼──────┼──────┤
│ bar │ 2 │ 0 │ 0 │ 0 │ 1 │
│ baz │ 1 │ 1 │ 1 │ 0 │ 0 │
│ foo │ 1 │ 3 │ 1 │ 1 │ 0 │
└─────┴──────┴──────┴──────┴──────┴──────┘
(9) By Keith Medcalf (kmedcalf) on 2023-02-06 22:07:28 in reply to 6 [link] [source]
Very nice! You could also put in a conditional drop of the view so that it is re-created when the query is run (you only need to do that if adding a column).
with years(year) as
(
select distinct y
from t
order by 1
),
lines(part) as
(
select 'drop view if exists vt; '
union all
select 'create view vt as '
union all
select 'select j'
union all
select ', count(y) filter (where y = ' || year || ') as "' || year || '"'
from years
union all
select ' from t group by j;'
)
select eval(group_concat(part, ''))
from lines
;
(10) By jose isaias cabrera (jicman) on 2023-02-07 00:08:48 in reply to 9 [link] [source]
WOW! You guys are geniuses.