Shell .mode column width auto adjustment

(1) By Tony Papadimitriou (tonyp) on 2020-05-13 18:48:34 updated by 1.1 [source]

I don't know if this issue has come up before but here it goes.

Currently, (for `.mode column` at least) there seems to be a very simple 'heuristic' calculation of a column's width based on the first row's contents.

This has at least two problems:

  1.  If the first row's column widths are not representative of the majority of rows (i.e., either much shorter or much longer than average width), the 'wrong' widths are used.

  2.  An identical columns query result will end up with different column widths if only ordered differently (for example).

With ad-hoc queries, normally one adjusts column widths with the `.width` shell command, manually.  (I don't consider the script case here as there is no problem there to have pre-configured `.width` commands all over the script.)

The problem with it is that for each different query a new adjusted `.width` command has to be issued.

To illustrate:

```
.mode column

create table t(s varchar(30));
insert into t values('A longer value ending here!');        -- length 27
insert into t values('Short value!');                       -- length 12

select * from t order by length(s) desc;
-- s
-- ---------------------------
-- A longer value ending here!
-- Short value!

select * from t order by length(s);
-- s
-- ------------
-- Short value!
-- A longer val
```

As you can see the second result has truncated much of the column's content.

For comparison, MySQL produces this:

```
+-----------------------------+
| s                           |
+-----------------------------+
| A longer value ending here! |
| Short value!                |
+-----------------------------+
2 rows in set (0.01 sec)

+-----------------------------+
| s                           |
+-----------------------------+
| Short value!                |
| A longer value ending here! |
+-----------------------------+
2 rows in set (0.13 sec)
```

_MySQL seems to have a solution that works quite well.  I don't know the internals of how they do it so I'm purely guessing here that it keeps track of each column's max content length (not the definition but the actual content) as rows are fetched during query processing._

Would this (not the actual method of doing it but the effect) be something that SQLite3 library could eventually support?

Otherwise, a couple of suggestions for poor man's alternatives:

1.  If a length is specified for a column (such as `varchar(10)` or `int(5)` which is ignored anyway by SQLite3 so there cannot be any compatibility issues), this could become a hint for the expected width of the column overriding the currently otherwise heuristically assumed width.

    I think this way it will become very simple to get visually consistent query results for the same columns (at least for simply queries).

    I understand there are cases where it becomes more involved (like `a  `\|\|` b` where widths could probably be added), or derived view columns, which makes the 'count-as-you-go' method the only [?] possibility.  But, at least wherever the heuristic can work the results will be more visually pleasing than the current way.

2. Buffer a few rows (e.g., 10) at the shell level before displaying the results and calculate the columns widths based on their content average/max length.

Thanks for reading this far!

Shell .mode column width auto adjustment

(1.1) By Tony Papadimitriou (tonyp) on 2020-05-13 18:51:29 edited from 1.0 updated by 1.2 [link] [source]

I don't know if this issue has come up before but here it goes.

Currently, (for `.mode column` at least) there seems to be a very simple 'heuristic' calculation of a column's width based on the first row's contents.

This has at least two problems:

  1.  If the first row's column widths are not representative of the majority of rows (i.e., either much shorter or much longer than average width), the 'wrong' widths are used.

  2.  An identical columns query result will end up with different column widths if only ordered differently (for example).

With ad-hoc queries, normally one adjusts column widths with the `.width` shell command, manually.  (I don't consider the script case here as there is no problem there to have pre-configured `.width` commands all over the script.)

The problem with it is that for each different query a new adjusted `.width` command has to be issued.

To illustrate:

```
.mode column

create table t(s varchar(30));
insert into t values('A longer value ending here!');        -- length 27
insert into t values('Short value!');                       -- length 12

select * from t order by length(s) desc;
-- s
-- ---------------------------
-- A longer value ending here!
-- Short value!

select * from t order by length(s);
-- s
-- ------------
-- Short value!
-- A longer val
```

As you can see the second result has truncated much of the column's content.

For comparison, MySQL produces this:

```
+-----------------------------+
| s                           |
+-----------------------------+
| A longer value ending here! |
| Short value!                |
+-----------------------------+
2 rows in set (0.01 sec)

+-----------------------------+
| s                           |
+-----------------------------+
| Short value!                |
| A longer value ending here! |
+-----------------------------+
2 rows in set (0.13 sec)
```

_MySQL seems to have a solution that works quite well.  I don't know the internals of how they do it so I'm purely guessing here that it keeps track of each column's max content length (not the definition but the actual content) as rows are fetched during query processing._

Would this (not the actual method of doing it but the effect) be something that SQLite3 library could eventually support?

Otherwise, a couple of suggestions for poor man's alternatives:

1.  If a length is specified for a column (such as `varchar(10)` or `int(5)` which is ignored anyway by SQLite3 so there cannot be any compatibility issues), this could become a hint for the expected width of the column overriding the currently otherwise heuristically assumed width.

    I think this way it will become very simple to get visually consistent query results for the same columns (at least for simple queries) without the need for `.width`.

    I understand there are cases where it becomes more involved (like `a  `\|\|` b` where widths could probably be added), or derived view columns, which makes the 'count-as-you-go' method the only [?] possibility.  But, at least wherever the heuristic can work the results will be more visually pleasing than the current way.

2. Buffer a few rows (e.g., 10) at the shell level before displaying the results and calculate the columns widths based on their content average/max length.

Thanks for reading this far!

Shell .mode column width auto adjustment

(1.2) By Tony Papadimitriou (tonyp) on 2020-05-13 18:52:31 edited from 1.1 updated by 1.3 [link] [source]

I don't know if this issue has come up before but here it goes.

Currently, (for `.mode column` at least) there seems to be a very simple 'heuristic' calculation of a column's width based on the first row's contents.

This has at least two problems:

  1.  If the first row's column widths are not representative of the majority of rows (i.e., either much shorter or much longer than average width), the 'wrong' widths are used.

  2.  An identical columns query result will end up with different column widths if only ordered differently (for example).

With ad-hoc queries, normally one adjusts column widths with the `.width` shell command, manually.  (I don't consider the script case here as there is no problem there to have pre-configured `.width` commands all over the script.)

The problem with it is that for each different query a new adjusted `.width` command has to be issued.

To illustrate:

```
.mode column

create table t(s varchar(30));
insert into t values('A longer value ending here!');        -- length 27
insert into t values('Short value!');                       -- length 12

select * from t order by length(s) desc;
-- s
-- ---------------------------
-- A longer value ending here!
-- Short value!

select * from t order by length(s);
-- s
-- ------------
-- Short value!
-- A longer val
```

As you can see the second result has truncated much of the column's content.

For comparison, MySQL produces this:

```
+-----------------------------+
| s                           |
+-----------------------------+
| A longer value ending here! |
| Short value!                |
+-----------------------------+
2 rows in set (0.01 sec)

+-----------------------------+
| s                           |
+-----------------------------+
| Short value!                |
| A longer value ending here! |
+-----------------------------+
2 rows in set (0.13 sec)
```

_MySQL seems to have a solution that works quite well.  I don't know the internals of how they do it so I'm purely guessing here that it keeps track of each column's max content length (not the definition but the actual content) as rows are fetched during query processing._

Would this (not the actual method of doing it but the effect) be something that SQLite3 library could eventually support?

Otherwise, a couple of suggestions for poor man's alternatives:

1.  If a length is specified for a column (such as `varchar(10)` or `int(5)` which is ignored anyway by SQLite3 so there cannot be any compatibility issues), this could become a hint for the expected width of the column overriding the currently otherwise heuristically assumed width.

    I think this way it will become very simple to get visually consistent query results for the same columns (at least for simple queries) without the need for `.width`.

    I understand there are cases where it becomes more involved (like `a  `\|\|` b` where widths could probably be added), or derived view columns, which makes the 'count-as-you-go' method the only [?] possibility.  But, at least wherever the heuristic can work the results will be more visually pleasing than the current way.

2. Buffer a few rows (e.g., 10) at the shell level before displaying the results and calculate the columns' widths based on their content average/max length.

Thanks for reading this far!

Shell .mode column width auto adjustment

(1.3) By Tony Papadimitriou (tonyp) on 2020-05-13 18:55:15 edited from 1.2 [link] [source]

I don't know if this issue has come up before but here it goes.

Currently, (for .mode column at least) there seems to be a very simple 'heuristic' calculation of a column's width based on the first row's contents.

This has at least two problems:

If the first row's column widths are not representative of the majority of rows (i.e., either much shorter or much longer than average width), the 'wrong' widths are used.
An identical columns query result will end up with different column widths if only ordered differently (for example).

With ad-hoc queries, normally one adjusts column widths with the .width shell command, manually. (I don't consider the script case here as there is no problem there to have pre-configured .width commands all over the script.)

The problem with it is that for each different query a new adjusted .width command has to be issued.

To illustrate:

.mode column

create table t(s varchar(30));
insert into t values('A longer value ending here!');        -- length 27
insert into t values('Short value!');                       -- length 12

select * from t order by length(s) desc;
-- s
-- ---------------------------
-- A longer value ending here!
-- Short value!

select * from t order by length(s);
-- s
-- ------------
-- Short value!
-- A longer val

As you can see the second result has truncated much of the column's content.

For comparison, MySQL produces this:

+-----------------------------+
| s                           |
+-----------------------------+
| A longer value ending here! |
| Short value!                |
+-----------------------------+
2 rows in set (0.01 sec)

+-----------------------------+
| s                           |
+-----------------------------+
| Short value!                |
| A longer value ending here! |
+-----------------------------+
2 rows in set (0.13 sec)

MySQL seems to have a solution that works quite well. I don't know the internals of how they do it so I'm purely guessing here that it keeps track of each column's max content length (not the definition but the actual content) as rows are fetched/generated during query processing.

Would this (not the actual method of doing it but the effect) be something that SQLite3 library could eventually support?

Otherwise, a couple of suggestions for poor man's alternatives:

If a length is specified for a column (such as varchar(10) or int(5) which is ignored anyway by SQLite3 so there cannot be any compatibility issues), this could become a hint for the expected width of the column overriding the currently otherwise heuristically assumed width.

I think this way it will become very simple to get visually consistent query results for the same columns (at least for simple queries) without the need for .width.

I understand there are cases where it becomes more involved (like a||b where widths could probably be added), or derived view columns, which makes the 'count-as-you-go' method the only [?] possibility. But, at least wherever the heuristic can work the results will be more visually pleasing than the current way.
Buffer a few rows (e.g., 10) at the shell level before displaying the results and calculate the columns' widths based on their content average/max length.

Thanks for reading this far!