transposing columns to rows and getting counts

(1.1) By punkish on 2023-02-06 10:09:52 edited from 1.0 [link] [source]

Given a table like so

sqlite> .mode box
sqlite> .headers on
sqlite> CREATE TABLE t (id INTEGER PRIMARY KEY, j, y);
sqlite> INSERT INTO t (j, y) VALUES 
    ('foo', 1990),
    ('bar', 1990),
    ('foo', 1991),
    ('baz', 1990),
    ('foo', 1991),
    ('bar', 1990),
    ('foo', 1992),
    ('baz', 1992),
    ('foo', 1994),
    ('bar', 1993),
    ('foo', 1991),
    ('baz', 1991);
sqlite> SELECT * FROM t;
┌────┬─────┬──────┐
│ id │  j  │  y   │
├────┼─────┼──────┤
│ 1  │ foo │ 1990 │
│ 2  │ bar │ 1990 │
│ 3  │ foo │ 1991 │
│ 4  │ baz │ 1990 │
│ 5  │ foo │ 1991 │
│ 6  │ bar │ 1990 │
│ 7  │ foo │ 1992 │
│ 8  │ baz │ 1992 │
│ 9  │ foo │ 1994 │
│ 10 │ bar │ 1993 │
│ 11 │ foo │ 1991 │
│ 12 │ baz │ 1991 │
└────┴─────┴──────┘
sqlite> SELECT j, y, Count(*) c FROM t GROUP BY j, y;
┌─────┬──────┬───┐
│  j  │  y   │ c │
├─────┼──────┼───┤
│ bar │ 1990 │ 2 │
│ bar │ 1993 │ 1 │
│ baz │ 1990 │ 1 │
│ baz │ 1991 │ 1 │
│ baz │ 1992 │ 1 │
│ foo │ 1990 │ 1 │
│ foo │ 1991 │ 3 │
│ foo │ 1992 │ 1 │
│ foo │ 1994 │ 1 │
└─────┴──────┴───┘

But I want a result set that shows the count of j by y and looks like so

┌─────┬──────┬──────┬──────┬──────┬──────┐
│  j  │ 1990 │ 1991 │ 1992 │ 1993 │ 1994 │
├─────┼──────┼──────┼──────┼──────┼──────┤
│ foo │  1   │  3   │  1   │  0   │  1   │
├─────┼──────┼──────┼──────┼──────┼──────┤
│ bar │  2   │  0   │  0   │  1   │  0   │
├─────┼──────┼──────┼──────┼──────┼──────┤
│ baz │  0   │  1   │  1   │  0   │  0   │
└─────┴──────┴──────┴──────┴──────┴──────┘

Fwiw, my table has the following counts

sqlite> select count(distinct j) j from t;
┌─────┐
│  j  │
├─────┤
│ 768 │
└─────┘
sqlite> select count(distinct y) y from t;
┌─────┐
│  y  │
├─────┤
│ 173 │
└─────┘
sqlite> select count(*) c from t;
┌──────────┐
│    c     │
├──────────┤
│ 738762   │
└──────────┘

From my limited knowledge, I believe I need to do a pivot of columns to rows and calculate frequency of j / y. There is a pivot vtab extension out there that I could possibly use, but my understanding stops here. Would appreciate any help.

(2) By punkish on 2023-02-06 16:06:35 in reply to 1.1 [source]

with Anton's (github/@nalgeon) help, I was able to install the pivot_vtab extension on my M1 Pro MacBook, and was able to create a virtual table that kinda did what I wanted. But, as Anton warned me, the extension is buggy, and when I tried to dump the data to a CSV file, I got a segmentation fault.

So, back to the drawing board. I would like to create the transposed view of my data, hopefully without involving any extension (unless I can get it to work flawlessly on both Mac -- development -- and Ubuntu -- production).

Once the pivot table is working, I would update it using triggers, as mentioned in my original post.

Many thanks in advance.

(4) By Keith Medcalf (kmedcalf) on 2023-02-06 20:55:10 in reply to 2 [link] [source]

The old version works fine for me, but the new version crashes on windows with an "oh so sorry screw you" dialog box.

sqlite> create virtual table temp.pvt using pivot_vtab((select j from t group by j order by j),(select y,y from t group by y order by y),(select count(*) from t where j=?1 and y=?2));
sqlite> .mode qbox
sqlite> select * from pvt;
┌───────┬──────┬──────┬──────┬──────┬──────┐
│   j   │ 1990 │ 1991 │ 1992 │ 1993 │ 1994 │
├───────┼──────┼──────┼──────┼──────┼──────┤
│ 'bar' │ 2    │ 0    │ 0    │ 1    │ 0    │
│ 'baz' │ 1    │ 1    │ 1    │ 0    │ 0    │
│ 'foo' │ 1    │ 3    │ 1    │ 0    │ 1    │
└───────┴──────┴──────┴──────┴──────┴──────┘
sqlite> .mode csv
sqlite> select * from pvt;
j,1990,1991,1992,1993,1994
bar,2,0,0,1,0
baz,1,1,1,0,0
foo,1,3,1,0,1
sqlite>

note that this works also

sqlite> create virtual table temp.pvt using pivot_vtab((select distinct j from t order by j),(select distinct y,y from t order by y),(select count(*) from t where j=?1 and y=?2));
sqlite> .mode qbox
sqlite> select * from pvt;
┌───────┬──────┬──────┬──────┬──────┬──────┐
│   j   │ 1990 │ 1991 │ 1992 │ 1993 │ 1994 │
├───────┼──────┼──────┼──────┼──────┼──────┤
│ 'bar' │ 2    │ 0    │ 0    │ 1    │ 0    │
│ 'baz' │ 1    │ 1    │ 1    │ 0    │ 0    │
│ 'foo' │ 1    │ 3    │ 1    │ 0    │ 1    │
└───────┴──────┴──────┴──────┴──────┴──────┘
sqlite> .mode csv
sqlite> select * from pvt;
j,1990,1991,1992,1993,1994
bar,2,0,0,1,0
baz,1,1,1,0,0
foo,1,3,1,0,1
sqlite>

(7) By Keith Medcalf (kmedcalf) on 2023-02-06 21:35:00 in reply to 4 [link] [source]

Note that if you change the columns you need to reparse the schema. pragma writable_schema=reset will accomplish that. There is no need to drop and recreate the virtual table.

(8) By nalgeon on 2023-02-06 21:45:18 in reply to 4 [link] [source]

Yeah, it seems like the memory leak in pivot_vtab was introduced in 2022. I switched to the 2021 version, and the tests passed. Thanks for noticing!

(3) By anonymous on 2023-02-06 18:54:56 in reply to 1.1 [link] [source]

Tables with varying numbers of columns isn't what SQL is good for, but if all you need is CSV, it's possible to hand-craft some.

.mode tabs
.separator ,
.headers off

with
    js as
        (select distinct j
            from t
            order by j),
    ys as
        (select distinct y
            from t
            order by y),
    summary (j, y, n) as
        (select j, y, count(id)
            from js join ys natural left join t
            group by j, y
            order by j, y)
select j, n from
    (select 0 as o,
            'j\y' as j,
            group_concat
                (case when y glob '*[,"'||char(10,13)||']*' then
                    '"'||replace(y,'"','""')||'"'
                else
                    y
                end) as n
        from ys
    union all select 1,
            case when j glob '*[,"'||char(10,13)||']*' then
               '"'||replace(j,'"','""')||'"'
            else
                j
            end,
            group_concat(n)
        from summary
        group by j)
    order by o;

(5) By jose isaias cabrera (jicman) on 2023-02-06 21:05:25 in reply to 3 [link] [source]

WOW! Very nice.

(6) By nalgeon on 2023-02-06 21:18:47 in reply to 1.1 [link] [source]

It's also possible to use an eval function available as a separate extension or as a part of the define extension:

with years as (
  select distinct(y) as year
  from t
),
lines as (
  select 'create view vt as ' as part
  union all
  select 'select j '
  union all
  select ',count(y) filter (where y = ' || year || ') as "' || year || '" '
  from years
  union all
  select 'from t group by j;'
)
select eval(group_concat(part, ''))
from lines;

After that:

select * from vt;

┌─────┬──────┬──────┬──────┬──────┬──────┐
│  j  │ 1990 │ 1991 │ 1992 │ 1994 │ 1993 │
├─────┼──────┼──────┼──────┼──────┼──────┤
│ bar │ 2    │ 0    │ 0    │ 0    │ 1    │
│ baz │ 1    │ 1    │ 1    │ 0    │ 0    │
│ foo │ 1    │ 3    │ 1    │ 1    │ 0    │
└─────┴──────┴──────┴──────┴──────┴──────┘

(9) By Keith Medcalf (kmedcalf) on 2023-02-06 22:07:28 in reply to 6 [link] [source]

Very nice! You could also put in a conditional drop of the view so that it is re-created when the query is run (you only need to do that if adding a column).

  with years(year) as
       (
           select distinct y
             from t
         order by 1
       ),
       lines(part) as
       (
           select 'drop view if exists vt; '
        union all
           select 'create view vt as '
        union all
           select 'select j'
        union all
           select ', count(y) filter (where y = ' || year || ') as "' || year || '"'
             from years
        union all
           select ' from t group by j;'
       )
select eval(group_concat(part, ''))
  from lines
;

(10) By jose isaias cabrera (jicman) on 2023-02-07 00:08:48 in reply to 9 [link] [source]

WOW! You guys are geniuses.