SQLite Forum

cycle detection in CHECK constraint with recursive CTE
Login

cycle detection in CHECK constraint with recursive CTE

(1) By anonymous on 2021-07-16 18:02:52 [link] [source]

I have a table (schema generated via an ORM) with a hierarchical child-has-parent relationship:

CREATE TABLE IF NOT EXISTS "tag" (
"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, 
"name" varchar(40) NOT NULL UNIQUE
);

CREATE TABLE IF NOT EXISTS "tag_parents" (
"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"from_tag_id" integer NOT NULL REFERENCES "tag" ("id") DEFERRABLE INITIALLY DEFERRED,
"to_tag_id" integer NOT NULL REFERENCES "tag" ("id") DEFERRABLE INITIALLY DEFERRED
);

I wanted to add a CHECK constraint that prevents cycles from being added to the database:

CONSTRAINT no_cycles CHECK (
   NOT EXISTS (
      WITH RECURSIVE p(root_id, end_id) AS (
          SELECT DISTINCT to_tag_id AS root_id, to_tag_id AS end_id FROM tag_parents WHERE NOT EXISTS (SELECT 1 FROM tag_parents AS s WHERE s.from_tag_id=to_tag_id)
          UNION ALL
          SELECT t.from_tag_id, t.to_tag_id FROM tag_parents AS t
          JOIN p ON t.to_tag_id = p.root_id)
      SELECT 1 FROM p WHERE t=from_tag_id AND f=to_tag_id
    )
)

But predictably CREATE with the constraint added fails with Error: no such table: tag_parents because of the self reference.

Is detecting a cycle in a constraint possible?

(2) By Domingo (mingodad) on 2021-07-17 08:51:16 in reply to 1 [link] [source]

I think you can use "CHECK" for that:

CREATE TABLE IF NOT EXISTS "tag_parents" (
"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"from_tag_id" integer NOT NULL REFERENCES "tag" ("id") DEFERRABLE INITIALLY DEFERRED,
"to_tag_id" integer NOT NULL REFERENCES "tag" ("id") DEFERRABLE INITIALLY DEFERRED CHECK(from_tag_id <> to_tag_id)
);

(3) By anonymous on 2021-07-17 13:07:37 in reply to 2 [link] [source]

Hello Domingo,

Unfortunately that won't work for cycles with size of more than 2 vertices.

(4) By Domingo (mingodad) on 2021-07-17 13:55:25 in reply to 3 [link] [source]

You are right !
What about a trigger on insert/update ?

(5) By Domingo (mingodad) on 2021-07-17 13:59:55 in reply to 4 [link] [source]

Here is stackoverflow question that seems to refer to your problem https://stackoverflow.com/questions/39555616/detect-cycles-using-sql

(6) By anonymous on 2021-07-17 14:46:33 in reply to 5 [link] [source]

For some reason I did not think about triggers. If I get a working result I shall post it here. thank you!

(7) By anonymous on 2021-07-20 10:24:41 in reply to 5 [link] [source]

Hello, here's is the trigger code I ended up using:

CREATE TRIGGER cycle_check
BEFORE INSERT ON tag_parents
FOR EACH ROW
BEGIN
    SELECT RAISE(ABORT, 'Cycle detected') WHERE EXISTS (
      WITH RECURSIVE w(parent, last_visited, already_visited, cycle) AS (
            SELECT DISTINCT to_tag_id AS parent, from_tag_id AS last_visited, to_tag_id AS already_visited, 0 AS cycle FROM tag_parents

            UNION ALL

            SELECT t.to_tag_id AS parent, t.from_tag_id AS last_visited, already_visited || ', ' || t.to_tag_id, already_visited LIKE '%'||t.to_tag_id||"%" FROM tag_parents AS t JOIN w ON w.last_visited = t.to_tag_id
            WHERE NOT cycle
      )
      SELECT already_visited, cycle FROM w WHERE last_visited = NEW.to_tag_id AND already_visited LIKE '%'||NEW.from_tag_id||'%'
    );
END;

I hope it proves useful (and correct) for anyone with a similar problem. I made a small writeup here with a tiny bit more detail if anyone is interested.

(8) By ddevienne on 2021-07-20 12:02:59 in reply to 7 [link] [source]

FWIW, WHERE EXISTS often uses a SELECT 1 FROM ... WHERE ... query, because it does not matter what you select,
it's the mere fact a single row exists that matters. So selecting anything else is just wasting cycles :).
Not that it matters performance-wise, most likely, so just an FYI.

(13) By anonymous on 2021-07-20 15:38:30 in reply to 8 [link] [source]

True, that was leftover from a previous iteration where I was trying to show the cycle in the raise abort message. Selecting 1 would be the best, non-confusing choice.

(9.1) By ddevienne on 2021-07-20 14:32:51 edited from 9.0 in reply to 1 [link] [source]

I think your schema is to blame. If you want a DAG, then use a single table, with a self-referential parent.

Most DBMS's implement immediate Foreign-Keys by default, unlike SQLite. (see correction in follow-up posts)
So with a table such as:

CREATE TABLE IF NOT EXISTS "tag" (
  "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, 
  "name" varchar(40) NOT NULL UNIQUE,
  "parent" integer REFERENCES "tag" ("id"),
  UNIQUE(parent, name) -- or UNIQUE(name), depends what you want
);
You cannot event insert a cycle with that schema, ensuring a DAG.

With deferred FKs, the only mode in SQLite (see later posts), you then need to resort
to a TRIGGER to detect cycles, but that trigger can be much faster,
since starting from the leaves, instead of the roots like yours above.
It only has to detect a duplicate ID when traversing from the one leaf
(that fired the trigger) to its root, which is log(N).

Maybe I missed something, and the above is wrong. I'll know soon I guess :)

PS: AFAIK, a CHECK constraint can only access the local row, not do an arbitrary query,
so that approach was flawed from the get go I believe. Happy to be corrected on that.

(10) By David Raymond (dvdraymond) on 2021-07-20 13:48:03 in reply to 9.0 [source]

Most DBMS's implement immediate Foreign-Keys by default, unlike SQLite.

With deferred FKs, the only mode in SQLite...

No, immediate is the default and you have to specify "deferrable initially deferred" to get deferred FKs.

Deferred Foreign Key Constraints

(Or I'm just misunderstanding what you're saying)

(11.2) By ddevienne on 2021-07-20 14:31:25 edited from 11.1 in reply to 10 [link] [source]

Well, this looks like deferred to me, no?

C:\Users\ddevienne>sqlite3
SQLite version 3.33.0 2020-08-14 13:23:32
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table tag (id integer primary key, name text, parent integer references tag (id), UNIQUE(parent, name));
sqlite> begin;
sqlite> insert into tag values (1, "leaf", 0);
sqlite> insert into tag values (0, "root", null);
sqlite> commit;
sqlite> select * from tag order by id;
0|root|
1|leaf|0
sqlite>

Update: Rah, I always forget about pragma foreign_keys = 1; !!! So you are right indeed. My bad!

C:\Users\ddevienne>sqlite3
SQLite version 3.33.0 2020-08-14 13:23:32
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> pragma foreign_keys = 1;
sqlite> create table tag (id integer primary key, name text, parent integer references tag (id), UNIQUE(parent, name));
sqlite> begin;
sqlite> insert into tag values (1, "leaf", 0);
Error: FOREIGN KEY constraint failed

(12) By anonymous on 2021-07-20 15:36:44 in reply to 9.1 [link] [source]

Thank you, you're completely right, this is the appropriate approach for what you suggest. However, I wanted multiple parents in my setup (many-to-many relationship).

As for the schema itself, it was generated by django ORM. Perhaps in the future I can replace it with a handcrafted one.

(14) By ddevienne on 2021-07-20 16:02:35 in reply to 12 [link] [source]

OK, sure. What I propose is a pure TREE, a simpler form of DAG indeed.
But still, what you have will not scale well. At least index the from/to tag,
and probably also have a UNIQUE(from, to) contraints too.

That way the WHERE from = ... queries will range-scan that auto-index,
and either the WHERE to = ... ones will skip-scan it using that same index,
or create an explicit (non-UNIQUE) index on it. Look at the plans.

You probably also want to ON DELETE CASCADE in your FKs, to auto-delete edges,
when tags are removed. Unless your ORM does that in code instead?

And you also don't need an id column for edges, but I suspect it's your ORM adding it, right?
I'd make the edges table a WITHOUT ROWID table, and have the PK be (from, to),
effectively replacing the UNIQUE-index I proposed above.

Still, I somehow feel there's a better way to do what you want, I just don't see it now. Maybe I'm just imagining things.

PS: Also, don't forget to enable pragma foreign_keys = 1, unlike me!

(15) By anonymous on 2021-07-20 16:15:53 in reply to 14 [link] [source]

Still, I somehow feel there's a better way to do what you want, I just don't see it now. Maybe I'm just imagining things.

If you think of something feel free to post it! :) I'd be glad to read it.