SQLite Forum

cycle detection in CHECK constraint with recursive CTE
Login

cycle detection in CHECK constraint with recursive CTE

(1) By anonymous on 2021-07-16 18:02:52 [link]

I have a table (schema generated via an ORM) with a hierarchical child-has-parent relationship:

```sql
CREATE TABLE IF NOT EXISTS "tag" (
"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, 
"name" varchar(40) NOT NULL UNIQUE
);

CREATE TABLE IF NOT EXISTS "tag_parents" (
"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"from_tag_id" integer NOT NULL REFERENCES "tag" ("id") DEFERRABLE INITIALLY DEFERRED,
"to_tag_id" integer NOT NULL REFERENCES "tag" ("id") DEFERRABLE INITIALLY DEFERRED
);
```

I wanted to add a CHECK constraint that prevents cycles from being added to the database:

```sql
CONSTRAINT no_cycles CHECK (
   NOT EXISTS (
      WITH RECURSIVE p(root_id, end_id) AS (
          SELECT DISTINCT to_tag_id AS root_id, to_tag_id AS end_id FROM tag_parents WHERE NOT EXISTS (SELECT 1 FROM tag_parents AS s WHERE s.from_tag_id=to_tag_id)
          UNION ALL
          SELECT t.from_tag_id, t.to_tag_id FROM tag_parents AS t
          JOIN p ON t.to_tag_id = p.root_id)
      SELECT 1 FROM p WHERE t=from_tag_id AND f=to_tag_id
    )
)
```

But predictably CREATE with the constraint added fails with `Error: no such table: tag_parents` because of the self reference.

Is detecting a cycle in a constraint possible?

(2) By Domingo (mingodad) on 2021-07-17 08:51:16 in reply to 1 [link]

I think you can use "CHECK" for that:

CREATE TABLE IF NOT EXISTS "tag_parents" (
"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"from_tag_id" integer NOT NULL REFERENCES "tag" ("id") DEFERRABLE INITIALLY DEFERRED,
"to_tag_id" integer NOT NULL REFERENCES "tag" ("id") DEFERRABLE INITIALLY DEFERRED CHECK(from_tag_id <> to_tag_id)
);

(3) By anonymous on 2021-07-17 13:07:37 in reply to 2 [link]

Hello Domingo,

Unfortunately that won't work for cycles with size of more than 2 vertices.

(4) By Domingo (mingodad) on 2021-07-17 13:55:25 in reply to 3 [link]

You are right !
What about a trigger on insert/update ?

(5) By Domingo (mingodad) on 2021-07-17 13:59:55 in reply to 4 [link]

Here is stackoverflow question that seems to refer to your problem https://stackoverflow.com/questions/39555616/detect-cycles-using-sql

(6) By anonymous on 2021-07-17 14:46:33 in reply to 5 [link]

For some reason I did not think about triggers. If I get a working result I shall post it here. thank you!

(7) By anonymous on 2021-07-20 10:24:41 in reply to 5 [link]

Hello, here's is the trigger code I ended up using:

```sql
CREATE TRIGGER cycle_check
BEFORE INSERT ON tag_parents
FOR EACH ROW
BEGIN
    SELECT RAISE(ABORT, 'Cycle detected') WHERE EXISTS (
      WITH RECURSIVE w(parent, last_visited, already_visited, cycle) AS (
            SELECT DISTINCT to_tag_id AS parent, from_tag_id AS last_visited, to_tag_id AS already_visited, 0 AS cycle FROM tag_parents

            UNION ALL

            SELECT t.to_tag_id AS parent, t.from_tag_id AS last_visited, already_visited || ', ' || t.to_tag_id, already_visited LIKE '%'||t.to_tag_id||"%" FROM tag_parents AS t JOIN w ON w.last_visited = t.to_tag_id
            WHERE NOT cycle
      )
      SELECT already_visited, cycle FROM w WHERE last_visited = NEW.to_tag_id AND already_visited LIKE '%'||NEW.from_tag_id||'%'
    );
END;
```

I hope it proves useful (and correct) for anyone with a similar problem. I made a small writeup [here](https://sic.nessuent.xyz/s/18/detecting-cycles-in-tag-parent-child-relationship-in-sqlite3/) with a tiny bit more detail if anyone is interested.

(8) By ddevienne on 2021-07-20 12:02:59 in reply to 7 [link]

FWIW, `WHERE EXISTS` often uses a `SELECT 1 FROM ... WHERE ...` query, because it does not matter what you select,  
it's the mere fact a single row exists that matters.  So selecting anything else is just *wasting cycles* :).  
Not that it matters performance-wise, most likely, so just an FYI.

(13) By anonymous on 2021-07-20 15:38:30 in reply to 8 [link]

True, that was leftover from a previous iteration where I was trying to show the cycle in the raise abort message. Selecting 1 would be the best, non-confusing choice.

(9.1) By ddevienne on 2021-07-20 14:32:51 edited from 9.0 in reply to 1 [link]

I think your schema is to blame. If you want a DAG, then use a single table, with a self-referential parent.

Most DBMS's implement ***immediate*** Foreign-Keys by default, <s>unlike SQLite.</s> (see correction in follow-up posts)  
So with a table such as:
```
CREATE TABLE IF NOT EXISTS "tag" (
  "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, 
  "name" varchar(40) NOT NULL UNIQUE,
  "parent" integer REFERENCES "tag" ("id"),
  UNIQUE(parent, name) -- or UNIQUE(name), depends what you want
);
```
You cannot event insert a cycle with that schema, ensuring a DAG.

With deferred FKs, <s>the only mode in SQLite</s> (see later posts), you then need to resort  
to a TRIGGER to detect cycles, but that trigger can be much faster,  
since starting from the leaves, instead of the roots like yours above.  
It only has to detect a duplicate ID when traversing from the one leaf  
(that fired the trigger) to its root, which is log(N).

Maybe I missed something, and the above is wrong. I'll know soon I guess :)

PS: AFAIK, a CHECK constraint can only access the local row, not do an arbitrary query,   
so that approach was flawed from the get go I believe. Happy to be corrected on that.

(10) By David Raymond (dvdraymond) on 2021-07-20 13:48:03 in reply to 9.0 [link]

> Most DBMS's implement immediate Foreign-Keys by default, unlike SQLite. 

> With deferred FKs, the only mode in SQLite...

No, immediate is the default and you have to specify "deferrable initially deferred" to get deferred FKs.

[Deferred Foreign Key Constraints](https://www.sqlite.org/foreignkeys.html#fk_deferred)

(Or I'm just misunderstanding what you're saying)

(11.2) By ddevienne on 2021-07-20 14:31:25 edited from 11.1 in reply to 10 [link]

Well, this looks like deferred to me, no?

```
C:\Users\ddevienne>sqlite3
SQLite version 3.33.0 2020-08-14 13:23:32
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table tag (id integer primary key, name text, parent integer references tag (id), UNIQUE(parent, name));
sqlite> begin;
sqlite> insert into tag values (1, "leaf", 0);
sqlite> insert into tag values (0, "root", null);
sqlite> commit;
sqlite> select * from tag order by id;
0|root|
1|leaf|0
sqlite>
```

Update: Rah, I always forget about `pragma foreign_keys = 1;` !!!
So you are right indeed. My bad!

```
C:\Users\ddevienne>sqlite3
SQLite version 3.33.0 2020-08-14 13:23:32
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> pragma foreign_keys = 1;
sqlite> create table tag (id integer primary key, name text, parent integer references tag (id), UNIQUE(parent, name));
sqlite> begin;
sqlite> insert into tag values (1, "leaf", 0);
Error: FOREIGN KEY constraint failed
```

(12) By anonymous on 2021-07-20 15:36:44 in reply to 9.1

Thank you, you're completely right, this is the appropriate approach for what you suggest. However, I wanted multiple parents in my setup (many-to-many relationship).

As for the schema itself, it was generated by django ORM. Perhaps in the future I can replace it with a handcrafted one.

(14) By ddevienne on 2021-07-20 16:02:35 in reply to 12 [link]

OK, sure. What I propose is a pure TREE, a simpler form of DAG indeed.  
But still, what you have will not scale well. At least index the from/to tag,  
and probably also have a `UNIQUE(from, to)` contraints too.

That way the `WHERE from = ...` queries will range-scan that auto-index,  
and either the `WHERE to = ...` ones will skip-scan it using that same index,  
or create an explicit (non-UNIQUE) index on it. Look at the plans.

You probably also want to ON DELETE CASCADE in your FKs, to auto-delete edges,  
when tags are removed. Unless your ORM does that in code instead?

And you also don't need an `id` column for edges, but I suspect it's your ORM adding it, right?  
I'd make the edges table a `WITHOUT ROWID` table, and have the PK be `(from, to)`,  
effectively replacing the UNIQUE-index I proposed above.

Still, I somehow feel there's a better way to do what you want, I just don't see it now. Maybe I'm just imagining things.

PS: Also, don't forget to enable `pragma foreign_keys = 1`, unlike me!

(15) By anonymous on 2021-07-20 16:15:53 in reply to 14 [link]

> Still, I somehow feel there's a better way to do what you want, I just don't see it now. Maybe I'm just imagining things.

If you think of something feel free to post it! :) I'd be glad to read it.