SQLite User Forum

Interesting paper from Google on pipe syntax on SQL
Login

Interesting paper from Google on pipe syntax on SQL

(1) By Sebastian LaVine (smlavine) on 2024-08-24 17:24:49 [link] [source]

(2) By Bo Lindbergh (_blgl_) on 2024-08-25 00:20:46 in reply to 1 [link] [source]

Deliberately not improving insert, update, and delete is disappointing.

(3) By Dave Mausner (dmausner) on 2024-08-25 13:55:52 in reply to 1 [link] [source]

It's an interesting paper. Serious academic papers don't describe themselves as "great!", but consider the source.

I observe that programmers have always understood that compilers convert the syntax language into an internal semantic language for execution. This does not invalidate their solution.

The problem of successive table result processing syntax can be reduced by the use of cascading (that is, nested) views which pass all the needed column expressions down the line. This reduces syntax complexity, too, letting sqlite do the ugly work.

That's what computers do. I don't feel sorry for compilers.

(4) By Dave Mausner (dmausner) on 2024-08-25 14:04:43 in reply to 3 [link] [source]

Commenting upon myself, I readily admit that it's kewl to take a table result and pipe it to PIVOT, then down the lane.

(5) By Adrian Ho (lexfiend) on 2024-08-25 16:32:55 in reply to 1 [link] [source]

I think the main selling point of pipe syntax is improved ability to reason about complex queries. Having semantic evaluation order match syntactic order is already a significant improvement, and I imagine the resulting dataflow-like mental paradigm sits better with most folks.

(6) By Richard Hipp (drh) on 2024-08-25 20:11:58 in reply to 1 [link] [source]

I do not like the pipe operator (|>). Apparently, I am not the first to offer this criticism. The authors list 11 reasons why they think the pipe operator is useful in section 4.1.4. I remain unconvinced.

(7) By Alex Garcia (alexgarciaxyz) on 2024-08-26 00:33:35 in reply to 6 [link] [source]

I also don't like the pipe syntax very much, but I think there is some value in an optional re-order of SELECT and FROM clauses. Like having the SELECT clause come after FROM, like:

FROM users
SELECT first_name, last_name
WHERE id = 123;

Or even have the SELECT be entirely optional, like FROM users WHERE id = 123 (where SELECT * becomes the default).

I think this paper and other "compile-to-SQL" languages like PRQL and Logica and Malloy offer some syntax sugar that really don't have sense in pre-existing SQL databases like SQLite. But all of them have a "FROM-first" mentality, which fits a lot of developer's mental models of how SQL queries are ran.

DuckDB offers FROM-first in their SQL, though not sure if other SQL databases also do this.

(8) By Richard Hipp (drh) on 2024-08-26 02:58:33 in reply to 7 [link] [source]

Yes, Figures 1 and 2 on page 2 nicely illustrate how FROM-first better matches the semantics of the statement. It also illustrates how WHERE can serve as a WHERE or a HAVING depending on whether it occurs before or after the AGGREGATE step, respectively, which is nice. Section 5.3 points out that FROM-first queries have the "prefix property" that any prefix of the query is also a valid query. This allows you to test out your query incrementally.

(9) By Richard Hipp (drh) on 2024-08-26 14:26:10 in reply to 1 [source]

For a limited time only, there is a bootleg https://sqlite.org/fiddle on-line that supports some of the proposed "pipe" syntax outlined in the Google paper. This is but a quick prototype. You can expect to find bugs.

Differences from the syntax described in the paper:

  • The "|>" operator is optional, except before the AS pipe operator. Before AS, you have to include |> to avoid a syntax ambiguity. If (like me) you dislike the |> operator, you can say "VIEW <name>" instead of "AS <name>" to accomplish the same thing and the |> operator is not required before VIEW.

  • The following pipe operators are not implemented: EXTEND, SET, DROP, CALL, TABLESAMPLE, PIVOT, UNPIVOT.

(10) By Domingo (mingodad) on 2024-08-26 15:50:21 in reply to 9 [link] [source]

I've just added the grammar changes to allow a limited usage of PIPE operator that Richard did here https://mingodad.github.io/parsertl-playground/playground/ (select "SQLite3 PIPES modified parser (partially working)" from "Examples" then click `Parse` to see a parse tree for the contents in "Input source").

Also there there is the zetasql grammar with PIPES (select "ZetaSQL parser (be patient)(partially working)" from "Examples").

You can edit the grammar and click "Parse" to see the results, it's an in browser YACC/LEX interpreter via wasm.

(11) By Stéphane Aulery (saulery) on 2024-08-26 15:59:06 in reply to 1 [link] [source]

If you want to explore a better query language there is the Third Manifesto and Tutorial D by Chris Date.

(12) By anonymous on 2024-08-26 16:06:22 in reply to 11 [link] [source]

I thought about mentioning that one, since it was discussed in the ML a long time ago. Someone was doing a modern take on Tutorial D, if I remember right. But I couldn't find the reference in searching the forum, so I gave up... Perhaps because it's too old, pre-forum, and was on the ML only.

(13) By anonymous on 2024-08-26 21:27:34 in reply to 9 [link] [source]

Thanks for the fiddle.

notes while experimenting:

  1. While exploring the "pipe" syntax it might be useful to have some sort less verbose CAST() so that when enforcing affinity/datatypes (i.e. when expressions are in the pipes) the CAST() doesn't make reading the query more difficult. Because other SQL implementations don't use SQLite's flexible datatypes/affinity the Google folks won't necessarily raise the awkward CAST() in their paper.

  2. For EXTEND, "select *, cast([expr] as [type]) as [column]" was used.

  3. For SET, all columns are individually specified (except for the "SET" column), and the EXTEND substitution was used.

  4. For DROP, again, all columns individually specified except for the "DROP" column.

  5. There may be some push through optimisations (as opposed to the push down optimisations) such as dropping columns that end up not being used.

  6. operational note (operator error): When writing ad hoc SQL, missing a ; between statements will take some getting used to. For example several statements in the fiddle, perhaps meant to run individually will now pipe to each other rather than give an error. For example:

FROM (select 2 as id) /* without ; */ select 1 as id

All in all, IMAO (In My Anonymous Opinion) piping seems to worth a look.

(14) By Richard Hipp (drh) on 2024-08-26 22:58:53 in reply to 13 [link] [source]

Worth a look, maybe, but not worth landing on trunk.

My goal is to keep SQLite relevant and viable through the year 2050. That's a long time from now. If I knew that standard SQL was not going to change any between now and then, I'd go ahead and make non-standard extensions that allowed for FROM-clause-first queries, as that seems like a useful extension. The problem is that standard SQL will not remain static. Probably some future version of "standard SQL" will support some kind of FROM-clause-first query format. I need to ensure that whatever SQLite supports will be compatible with the standard, whenever it drops. And the only way to do that is to support nothing until after the standard appears.

When will that happen? A month? A year? Ten years? Who knows.

I'll probably take my cue from PostgreSQL. If PostgreSQL adds support for FROM-clause-first queries, then I'll do the same with SQLite, copying the PostgreSQL syntax. Until then, I'm afraid you are stuck with only traditional SELECT-first queries in SQLite.

I'm not overly impressed with the Google pipe syntax. They talk a good game in the paper. They sound really convincing. But once I started actually looking for use cases and messing with it, I, for one, lost interest in their design. I have hundreds, perhaps thousands, of queries in the Fossil implementation which I went trolling through, and I didn't find any that would be improved by Google pipe syntax over just having FROM-first queries. Maybe Google is more focused on analytical queries against big data. Maybe the pipe syntax works better for really gnarly analytical queries. It doesn't seem to help for the kinds of queries I do in Fossil using SQLite, though, at least not that I've seen.

Doing the prototype implementation and working through the issues was a useful exercise. I now know that SQLite can be easily converted to support Google pipe syntax and/or FORM-first queries, should the need arise, simply by adding a few new production rules to the LALR(1) grammar, and with no changes to the query planner/optimize. But the present need is insufficient to overcome the future incompatibility risk.

(15) By anonymous on 2024-09-01 21:50:41 in reply to 14 [link] [source]

I thought the most helpful part of the proposed syntax is removing sub queries which are annoying to read and write. Multiple aggregations just listed one after another is much easier to reason about them having each one inside a previous one.

From first seems more like icing than the main improvement IMO.

(16) By anonymous on 2024-09-01 23:48:51 in reply to 14 [link] [source]

Thanks, and appreciated the sound reasoning.

Yet the 2050 goal does highlight a mystery; 2050 sure seems farther away than it did just a few years ago :)

(17) By ingo on 2024-09-02 04:58:42 in reply to 7 [link] [source]

in process tech one often goes backwards through the proces(step). In step N one wants the result ABC. So the previous step(s) M has to deliver X,Y,Z. Create ABC with X,Y,Z, this is the goal. It's not we have X,Y,Z, what shall we make with it or let's make ABC with it? If you want to model multiple processes(/steps) use CTE's these come before the final one, but each step inside the CTE's follows the select from / create with rule.

(18) By anonymous on 2024-09-02 05:06:07 in reply to 15 [link] [source]

Hmm... I must disagree, I find sub-queries very useful, and a way to simplify the overall query into (almost) procedural steps.

But I like procedural languages, so admit to an underlying preference (if you agree with me - or bias if you don't)

(19) By Warren Young (wyoung) on 2024-09-02 14:30:46 in reply to 14 [link] [source]

I need to ensure that whatever SQLite supports will be compatible with the standard

As long as SQLite remains in a poor position to pioneer new syntax and ram it through the standards process, this is a smart position.

And I don't think we want your time spent playing ANSI/ISO games.

That said, if we can step away from practicalities and speak in idealities for a time…

It doesn't seem to help for the kinds of queries I do in Fossil using SQLite

I first encountered the pipe operator1 in the F# programming language for .NET about 8 years ago, and it's great for any case where you find yourself with chains of operations where the output of one is sent immediately to the input of the next. That makes it inherently an imperative or functional paradigm.

Yet, it does seem to fit well with SQL, a declarative language, for CTEs if nothing else. Any time you find yourself reading the code inside out or right to left, you may be reaching for a pipe operator without realizing it.

The canonical anti-pipe syntax is Lisp, where it's both inside-out and right-to-left.

The paper barely mentions F#, but this is one of the major language improvements it brought over OCaml, of which it is a spiritual derivative. The Python-like "light" syntax Microsoft made the default2 about a decade ago was one of the primary innovations over OCaml, to the point that OCaml added it soon after as an experimental feature, then put it in the standard language.

Since then it has appeared elsewhere.3 Surprisingly, one of these places is C++20, where the "ranges" feature has landed. Here's a beautiful example from cppreference.com[2]:

#include <iostream>
#include <ranges>
 
int main()
{
    auto const ints = {0, 1, 2, 3, 4, 5};
    auto even = [](int i) { return 0 == i % 2; };
    auto square = [](int i) { return i * i; };
 
    // the "pipe" syntax of composing the views:
    for (int i : ints | std::views::filter(even) | std::views::transform(square))
        std::cout << i << ' ';
 
    std::cout << '\n';
 
    // a traditional "functional" composing syntax:
    for (int i : std::views::transform(std::views::filter(ints, even), square))
        std::cout << i << ' ';
}

That's darn near F# syntax with noisy curly braces and semicolons sprinkled about.

Whatever you think about indents vs curlies for block scoping, though, consider the readability of the two examples. The first loop takes a list4 of integers, sends it through a filter to pick the ones desired, and operates on the result to make the list concrete. The second is a classic example of the right-to-left alternative I've derided above.

To bring this back on-topic for this forum, the shortest I can get to that core loop in SQLite is:

CREATE TABLE a(a);
INSERT INTO a VALUES (0), (1), (2), (3), (4), (5);
SELECT (a * a) FROM (
    SELECT a FROM a WHERE a % 2 == 0
);

I think it should be legal to get rid of the temporary table like so, but SQLite doesn't like the use of AS here:

SELECT (a * a) FROM (
    SELECT a FROM (
        VALUES (0), (1), (2), (3), (4), (5)
    ) AS a(a) WHERE a % 2 == 0
);

Regardless, C++20 is coming within shouting distance of SQL here for expressiveness and arguably offers better clarity for those of us who grew up reading left-to-right human languages.

If that doesn't impress, I don't know what will.


  1. ^ …with the same syntax, |>
  2. ^ …but added much earlier as an option
  3. ^ The paper mentions R, where they've made it ugly5 as they so often do. People deride JavaScript for being weird, but go learn R and learn what "weird" looks like!
  4. ^ …or more technically, a std::initializer_list<int>
  5. ^ %>%

(20) By ddevienne on 2024-09-02 15:44:02 in reply to 19 [link] [source]

Gleam is a recent functional language with the pipeline operator too.

import gleam/io
import gleam/string

pub fn main() {
  // Without the pipe operator
  io.debug(string.drop_left(string.drop_right("Hello, Joe!", 1), 7))

  // With the pipe operator
  "Hello, Mike!"
  |> string.drop_right(1)
  |> string.drop_left(7)
  |> io.debug

  // Changing order with function capturing
  "1"
  |> string.append("2")
  |> string.append("3", _)
  |> io.debug
}

which outputs:

"Joe"
"Mike"
"312"

(21) By Warren Young (wyoung) on 2024-09-02 16:47:55 in reply to 20 [link] [source]

F#-flavored Erlang; nice!

Too bad it has to run on the BEAM. What I want is a container-friendly statically-compiled variant of F# for systems programming. The closest I've found so far is Rust, which is pretty darn far from the mark.

From what I've seen, Swift 6 is technically poised to step into this role, but when I find zero courses for "Swift on Linux" on Udemy, I know the community isn't focusing on that.

To return to the topic, Swift also has a pipe operator spelled |>.

(22) By Rowan Worth (sqweek) on 2024-09-04 04:39:03 in reply to 21 [link] [source]

Julia also spells it |>, or .|> for element-wise application (as opposed to applying a function to the container type itself):

julia> ints=[0,1,2,3,4,5]
6-element Array{Int64,1}:
  ...
julia> even(x) = (x%2)==0
julia> square(x) = x*x
julia> ints |> x -> filter(even, x) .|> square
3-element Array{Int64,1}:
  0
  4
 16

filter doesn't plug in perfectly hence the lambda function (x -> «function-body»), and if you use |> instead of .|> on the last step you get a long and somewhat confusing method error because julia doesn't define an array multiplication operator. The . prefix is used consistently to express element-wise operations though, ie.:

# trying to multiply two arrays together
julia> ints * ints
ERROR: MethodError: no method matching *(::Array{Int64,1}, ::Array{Int64,1})
   ...

# multiply each element of array together
julia> ints .* ints
6-element Array{Int64,1}:
  0
  1
  4
  9
 16
 25

# it does define scalar multiplication with arrays, eg. multiply array by two
julia> 2 * ints
6-element Array{Int64,1}:
  0
  2
  4
  6
  8
 10

# or equivalently, multiply each array element by two
julia> 2 .* ints
6-element Array{Int64,1}:
  0
  2
  4
  6
  8
 10

Some collection types (eg. matrices) do define a multiplication operator -- in the matrix case the semantics are of course vastly different to element-wise multiplication.

I'm not sure which parts of statically-compiled are important to you; Julia does compile to native code but it does so at runtime rather than producing an independent binary ahead of time. Also it's focused on scientific applications... I mostly use it as an interactive calculator so not sure how it holds up in the realm of systems programming :)