cycle detection in CHECK constraint with recursive CTE
(1) By anonymous on 2021-07-16 18:02:52 [link]
I have a table (schema generated via an ORM) with a hierarchical child-has-parent relationship: ```sql CREATE TABLE IF NOT EXISTS "tag" ( "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "name" varchar(40) NOT NULL UNIQUE ); CREATE TABLE IF NOT EXISTS "tag_parents" ( "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "from_tag_id" integer NOT NULL REFERENCES "tag" ("id") DEFERRABLE INITIALLY DEFERRED, "to_tag_id" integer NOT NULL REFERENCES "tag" ("id") DEFERRABLE INITIALLY DEFERRED ); ``` I wanted to add a CHECK constraint that prevents cycles from being added to the database: ```sql CONSTRAINT no_cycles CHECK ( NOT EXISTS ( WITH RECURSIVE p(root_id, end_id) AS ( SELECT DISTINCT to_tag_id AS root_id, to_tag_id AS end_id FROM tag_parents WHERE NOT EXISTS (SELECT 1 FROM tag_parents AS s WHERE s.from_tag_id=to_tag_id) UNION ALL SELECT t.from_tag_id, t.to_tag_id FROM tag_parents AS t JOIN p ON t.to_tag_id = p.root_id) SELECT 1 FROM p WHERE t=from_tag_id AND f=to_tag_id ) ) ``` But predictably CREATE with the constraint added fails with `Error: no such table: tag_parents` because of the self reference. Is detecting a cycle in a constraint possible?
(2) By Domingo (mingodad) on 2021-07-17 08:51:16 in reply to 1 [link]
I think you can use "CHECK" for that: CREATE TABLE IF NOT EXISTS "tag_parents" ( "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "from_tag_id" integer NOT NULL REFERENCES "tag" ("id") DEFERRABLE INITIALLY DEFERRED, "to_tag_id" integer NOT NULL REFERENCES "tag" ("id") DEFERRABLE INITIALLY DEFERRED CHECK(from_tag_id <> to_tag_id) );
(3) By anonymous on 2021-07-17 13:07:37 in reply to 2 [link]
Hello Domingo, Unfortunately that won't work for cycles with size of more than 2 vertices.
(4) By Domingo (mingodad) on 2021-07-17 13:55:25 in reply to 3 [link]
You are right ! What about a trigger on insert/update ?
(5) By Domingo (mingodad) on 2021-07-17 13:59:55 in reply to 4
Here is stackoverflow question that seems to refer to your problem https://stackoverflow.com/questions/39555616/detect-cycles-using-sql
(6) By anonymous on 2021-07-17 14:46:33 in reply to 5 [link]
For some reason I did not think about triggers. If I get a working result I shall post it here. thank you!
(7) By anonymous on 2021-07-20 10:24:41 in reply to 5 [link]
Hello, here's is the trigger code I ended up using: ```sql CREATE TRIGGER cycle_check BEFORE INSERT ON tag_parents FOR EACH ROW BEGIN SELECT RAISE(ABORT, 'Cycle detected') WHERE EXISTS ( WITH RECURSIVE w(parent, last_visited, already_visited, cycle) AS ( SELECT DISTINCT to_tag_id AS parent, from_tag_id AS last_visited, to_tag_id AS already_visited, 0 AS cycle FROM tag_parents UNION ALL SELECT t.to_tag_id AS parent, t.from_tag_id AS last_visited, already_visited || ', ' || t.to_tag_id, already_visited LIKE '%'||t.to_tag_id||"%" FROM tag_parents AS t JOIN w ON w.last_visited = t.to_tag_id WHERE NOT cycle ) SELECT already_visited, cycle FROM w WHERE last_visited = NEW.to_tag_id AND already_visited LIKE '%'||NEW.from_tag_id||'%' ); END; ``` I hope it proves useful (and correct) for anyone with a similar problem. I made a small writeup [here](https://sic.nessuent.xyz/s/18/detecting-cycles-in-tag-parent-child-relationship-in-sqlite3/) with a tiny bit more detail if anyone is interested.
(8) By ddevienne on 2021-07-20 12:02:59 in reply to 7 [link]
FWIW, `WHERE EXISTS` often uses a `SELECT 1 FROM ... WHERE ...` query, because it does not matter what you select, it's the mere fact a single row exists that matters. So selecting anything else is just *wasting cycles* :). Not that it matters performance-wise, most likely, so just an FYI.
(9.1) By ddevienne on 2021-07-20 14:32:51 edited from 9.0 in reply to 1 [link]
I think your schema is to blame. If you want a DAG, then use a single table, with a self-referential parent. Most DBMS's implement ***immediate*** Foreign-Keys by default, <s>unlike SQLite.</s> (see correction in follow-up posts) So with a table such as: ``` CREATE TABLE IF NOT EXISTS "tag" ( "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "name" varchar(40) NOT NULL UNIQUE, "parent" integer REFERENCES "tag" ("id"), UNIQUE(parent, name) -- or UNIQUE(name), depends what you want ); ``` You cannot event insert a cycle with that schema, ensuring a DAG. With deferred FKs, <s>the only mode in SQLite</s> (see later posts), you then need to resort to a TRIGGER to detect cycles, but that trigger can be much faster, since starting from the leaves, instead of the roots like yours above. It only has to detect a duplicate ID when traversing from the one leaf (that fired the trigger) to its root, which is log(N). Maybe I missed something, and the above is wrong. I'll know soon I guess :) PS: AFAIK, a CHECK constraint can only access the local row, not do an arbitrary query, so that approach was flawed from the get go I believe. Happy to be corrected on that.
(10) By David Raymond (dvdraymond) on 2021-07-20 13:48:03 in reply to 9.0 [link]
> Most DBMS's implement immediate Foreign-Keys by default, unlike SQLite. > With deferred FKs, the only mode in SQLite... No, immediate is the default and you have to specify "deferrable initially deferred" to get deferred FKs. [Deferred Foreign Key Constraints](https://www.sqlite.org/foreignkeys.html#fk_deferred) (Or I'm just misunderstanding what you're saying)
(11.2) By ddevienne on 2021-07-20 14:31:25 edited from 11.1 in reply to 10 [link]
Well, this looks like deferred to me, no? ``` C:\Users\ddevienne>sqlite3 SQLite version 3.33.0 2020-08-14 13:23:32 Enter ".help" for usage hints. Connected to a transient in-memory database. Use ".open FILENAME" to reopen on a persistent database. sqlite> create table tag (id integer primary key, name text, parent integer references tag (id), UNIQUE(parent, name)); sqlite> begin; sqlite> insert into tag values (1, "leaf", 0); sqlite> insert into tag values (0, "root", null); sqlite> commit; sqlite> select * from tag order by id; 0|root| 1|leaf|0 sqlite> ``` Update: Rah, I always forget about `pragma foreign_keys = 1;` !!! So you are right indeed. My bad! ``` C:\Users\ddevienne>sqlite3 SQLite version 3.33.0 2020-08-14 13:23:32 Enter ".help" for usage hints. Connected to a transient in-memory database. Use ".open FILENAME" to reopen on a persistent database. sqlite> pragma foreign_keys = 1; sqlite> create table tag (id integer primary key, name text, parent integer references tag (id), UNIQUE(parent, name)); sqlite> begin; sqlite> insert into tag values (1, "leaf", 0); Error: FOREIGN KEY constraint failed ```
(12) By anonymous on 2021-07-20 15:36:44 in reply to 9.1 [link]
Thank you, you're completely right, this is the appropriate approach for what you suggest. However, I wanted multiple parents in my setup (many-to-many relationship). As for the schema itself, it was generated by django ORM. Perhaps in the future I can replace it with a handcrafted one.
(13) By anonymous on 2021-07-20 15:38:30 in reply to 8 [link]
True, that was leftover from a previous iteration where I was trying to show the cycle in the raise abort message. Selecting 1 would be the best, non-confusing choice.
(14) By ddevienne on 2021-07-20 16:02:35 in reply to 12 [link]
OK, sure. What I propose is a pure TREE, a simpler form of DAG indeed. But still, what you have will not scale well. At least index the from/to tag, and probably also have a `UNIQUE(from, to)` contraints too. That way the `WHERE from = ...` queries will range-scan that auto-index, and either the `WHERE to = ...` ones will skip-scan it using that same index, or create an explicit (non-UNIQUE) index on it. Look at the plans. You probably also want to ON DELETE CASCADE in your FKs, to auto-delete edges, when tags are removed. Unless your ORM does that in code instead? And you also don't need an `id` column for edges, but I suspect it's your ORM adding it, right? I'd make the edges table a `WITHOUT ROWID` table, and have the PK be `(from, to)`, effectively replacing the UNIQUE-index I proposed above. Still, I somehow feel there's a better way to do what you want, I just don't see it now. Maybe I'm just imagining things. PS: Also, don't forget to enable `pragma foreign_keys = 1`, unlike me!
(15) By anonymous on 2021-07-20 16:15:53 in reply to 14 [link]
> Still, I somehow feel there's a better way to do what you want, I just don't see it now. Maybe I'm just imagining things. If you think of something feel free to post it! :) I'd be glad to read it.