Foreign key joins

(1) By Joel Jakobsson (purrimjau) on 2021-12-26 08:24:07 [link] [source]

Hi,

I have a suggestion on how to improve the way we write SQL JOINs, by making use of foreign keys, to avoid having to explicitly specify the referencing/referenced tables and the column(s) to join on.

Consider the example query below, taken from the Grafana project:

SELECT
    p.*,
    ? AS resource_id,
    ur.user_id AS user_id,
    u.login AS user_login,
    u.email AS user_email,
    tr.team_id AS team_id,
    t.name AS team,
    t.email AS team_email,
    r.name as role_name
FROM permission p
    LEFT JOIN role r ON p.role_id = r.id
    LEFT JOIN team_role tr ON r.id = tr.role_id
    LEFT JOIN team t ON tr.team_id = t.id
    LEFT JOIN user_role ur ON r.id = ur.role_id
    LEFT JOIN user u ON ur.user_id = u.id
WHERE p.id = ?

If the foreign keys had been given the same names as the referenced tables, the FROM clause of the query could then be rewritten like this:

    FROM permission p
    LEFT JOIN role r KEY p->role
    LEFT JOIN team_role tr KEY tr->role
    LEFT JOIN team t KEY tr->team
    LEFT JOIN user_role ur KEY ur->role
    LEFT JOIN "user" u KEY ur->user
    WHERE p.id = ?;

Possible benefits:

Eliminate risk of joining on the wrong columns Although probably an uncommon class of bugs, a join can be made on the wrong columns, which could go undetected if the desired row is included by coincidence, such as if the test environment might only contain a single row in some table, and the join condition happened to be always true. By joining using the foreign key, it can be verified at compile time, that the referenced_table_alias is actually an alias for the table referenced by the foreign key. If some other alias would be given, an error would be thrown, to avoid failure.
Conciser syntax In a traditional join, you have to explicitly state all columns for the referencing and referenced table. I think writing joins feels like you are repeating the same table aliases and column names over and over again, all the time. This is especially true for multiple-column joins. This is somewhat addressed by the USING join form, but USING has other drawbacks, why I tend to avoid it except for one-off queries. When having to use fully-qualified table aliases, that adds even further to the verboseness.
Makes abnormal joins stand out If joining on something else than foreign key columns, or some inequality expression, such joins will continue to be written in the traditional way, and will therefore stand out and be more visible, if all other foreign key-based joins are written using the new syntax. When reading SQL queries, I think this would be a great improvement, since the boring normal joins on foreign keys could be given less attention, and focus could instead be made on making sure you understand the more complex joins.
Explicit direction of the join In a traditional join on foreign key columns, it's not possible to derive if the join is a one-to-many or many-to-one join, by just looking at the SQL code itself. One must also know/inspect the data model or make assumptions based on the naming of columns and tables. This is perhaps the least interesting benefit though, since good naming makes the direction quite obvious anyway. But I think it at least reduces the total cognitive load of reading a SQL query.

Thoughts?

(2) By anonymous on 2021-12-26 14:19:37 in reply to 1 [link] [source]

I did not see -> listed as an operator in SQLite's documentation.

Where does it come from, and what do you intend it to mean?

(6) By Joel Jakobsson (purrimjau) on 2021-12-27 20:07:07 in reply to 2 [link] [source]

I agree -> is not a good choice. I will work on a new syntax proposal.

(3) By Keith Medcalf (kmedcalf) on 2021-12-26 19:46:13 in reply to 1 [link] [source]

Back in the olden days when dinosaurs roamed the earth database navigation worked the way you are describing. By navigating from one entity to another via a set there was no need to know which columns were "relational" -- because the database was not relational.

Relational databases are based on set algebra and projections based on data operations (comparisons). Foreign keys are constraints on contents, not descriptors of set (which do no exist in the relational model) equivalence.

If you want a navigable database (such as IMS, IDMS, MDBS, or db-Vista (to name just a few)) there is nothing preventing you from using such a database rather than a relational detabase.

I would also note that although you seem to think your mode of expression is succinct, I at least find it utterly confusing. It is massively complex and requires the consideration of facts not in evidence in order to decode.

There is, however, nothing preventing you from creating a transmorgifier which turns your conception into standard SQL which can be processed by every database implementing an SQL interface (no matter the underlying organization).

(8) By Joel Jakobsson (purrimjau) on 2021-12-27 20:39:47 in reply to 3 [link] [source]

Foreign Keys (FKs) were not part of the original SQL-86 standard, they were first added in SQL-89, and since then they have not yet been considered to be used for anything else than enforcing referential integrity.

SQL might be an old language, but it keeps evolving, so maybe we can innovate in this area.

I agree the syntax I proposed was not very nice, it was merely something to start the discussion.

I will work on a new proposal.

(4) By Holger J (holgerj) on 2021-12-27 12:28:36 in reply to 1 [link] [source]

Basically, that's what can be done by USING (columns).

The precondition is that columns with the same name have to have the same meaning. Unfortunately, some frameworks force you to name the primary key column "id" no matter what.

If you have a well written schema fulfilling the above mentioned precondition, the following would work:

    FROM permission p
    LEFT JOIN role r         USING(role_id)
    LEFT JOIN team_role tr   USING(role_id)
    LEFT JOIN team t         USING(team_id)
    LEFT JOIN user_role ur   USING(role_id)
    LEFT JOIN "user" u       USING(user_id)
    WHERE p.id = ?;

Btw, calling a column "user" which is a reserved keyword is a bad idea and forces you to use double quotes around the name.

(5) By Keith Medcalf (kmedcalf) on 2021-12-27 18:10:22 in reply to 4 [link] [source]

user is not a reserved keyword -- not only is it not reserved, it is not a keyword either.

(10) By Holger J (holgerj) on 2021-12-28 12:33:56 in reply to 5 [source]

It is, documented here

https://www.postgresql.org/docs/current/sql-keywords-appendix.html

Not only for PostgreSQL, but for the SQL standards since 1992 as well.

(7) By Joel Jakobsson (purrimjau) on 2021-12-27 20:21:16 in reply to 4 [link] [source]

Yes, USING is a bit similar, but it's less concise since you need to specify the column names. There are also serious problems with USING [1] why I personally think it should only be used for one-off queries, and never for important queries in production code. But that all depends on how paranoid you are.

I also agree "user" is a very poorly chosen name for a table. I didn't write the example query myself, it comes from the Grafana project. I just wanted some external query to avoid personal bias in the example given.

[1] http://www.databasesoup.com/2013/08/fancy-sql-monday-on-vs-natural-join-vs.html

(9) By Joel Jakobsson (purrimjau) on 2021-12-27 20:40:31 in reply to 1 [link] [source]

Here comes a new proposal:

SYNTAX

join_type JOIN KEY fk_table.fk_name [ [ AS ] alias ]

Follow foreign key on a table already in the join, and joins in the referenced table, on the columns as specified by the foreign key.

If no alias is given, fk_name is used as the alias name.

join_type JOIN fk_table [ [ AS ] alias ] KEY fk_name REF pk_table

Join in a new table on the columns as specified by the foreign key, against the existing table as specified by the pk_table alias.

EXAMPLE

sql SELECT * FROM permission p LEFT JOIN KEY p.role r LEFT JOIN team_role tr KEY role REF r LEFT JOIN KEY tr.team t LEFT JOIN user_role ur KEY role REF r LEFT JOIN KEY ur.user u WHERE p.id = 1;`

Thoughts?