SQLite Forum

Main Topic and Non Main Topics (Was: Topic and subtopic)
Login

Main Topic and Non Main Topics (Was: Topic and subtopic)

(1.1) By Cecil (CecilWesterhof) on 2020-07-03 21:41:16 edited from 1.0 [link] [source]

I made a mess of it. :'-(
I explained it better in the post with id 10.

I want to use a table to define topics and subtopics.
In the records I will use a (NOT NULL) field topicID that points to the correct main topic.
I want to use a link table to link the subtopics to a record.
But it should not be possible to link the topic as a subtopic.
Can this be done in an elegant way?

When I have in topic table:
1 Work
2 Familie
3 Friend
4 Hobbies

When a record with id 17 has as topicID 1 then in the link table the following would be acceptable:
17 2
17 4

But what should not be acceptable is:
17 1

How can this best be done?

(2) By Larry Brasfield (LarryBrasfield) on 2020-07-03 15:28:10 in reply to 1.0 [link] [source]

I think before getting to a detailed solution, it would be good to know: Can a subtopic ever have subtopics? (Or is this a two-level hierarchy, forever?)

(3.1) By Keith Medcalf (kmedcalf) on 2020-07-03 15:41:31 edited from 3.0 in reply to 1.0 [link] [source]

You are putting the horse behind the buggy.

Ignoring the link table, how do you know which row in the topic table is a topic and which a subtopic and which subtopic is a subtopic of which parent topic?

You need the order Horse, Whip, Buggy. Nothing will work if you do not have your parts in the right order (that is, having the buggy in the front, followed by the whip and then the horse will not result in much of use, nor will putting the whip behind the buggy even if the horse is out front be of much use, because whipping the buggy will not make the horse go faster).

You need to deal with your topic/subtopic first, before you can even bother considering linkages to topic/subtopic. In other words you need to write the book before worrying about what the biographie that someone will use to refer to it will look like.

(4) By Larry Brasfield (LarryBrasfield) on 2020-07-03 15:58:03 in reply to 1.0 [source]

I think Keith and I are making closely related points. It really behooves software (or most other) designers to consider function carefully before going to implementation.

You probably want to create a directed, acyclic graph (aka "DAG"), with nodes represented in a single table with constraints or procedure to avoid cycles and a convention for absence of a predecessor. This is a common problem for which I am sure you can find many solutions, including database schemas.

Off-topic side-note on horse-transport:

From my observation of horses, and from many interesting family tales of their behavior and personalities, I know that if a horse wanted to get the buggy somewhere and had a place to push on it, there would be no need for a whip and the buggy would move toward that same place.

(5) By Richard Damon (RichardDamon) on 2020-07-03 16:23:56 in reply to 1.0 [link] [source]

Not having good names makes it harder to talk about such things, so let me define a few things.

A table Topic, with a Primary Key of topicID.

A second table Record, with a Primary Key of recordID, another field topicID which has (at least effectively) a Foreign Key constraint to Topic.topicID

A third table Link which has fields recordID and topicID, which are (effectively) Foreign Keys to the above table.

Your constraint is effectivly that you never want the topicID in a record of the Link Table to match the topicID in the matching row of Record as specified by recordID. This basically by definition requires doing a JOIN on the Link and Record tables to test, and I am not sure if there is any way to define a constraint that would automatically do this join to make the test, as it would be somewhat expensive to do in general.

(6.1) By Keith Medcalf (kmedcalf) on 2020-07-03 16:54:13 edited from 6.0 in reply to 5 [link] [source]

Sure there is.

You define a before trigger(s) that raise an error if you try to insert or update records such that the constraint is violated.

It would not be very efficient, but it would work.

However you still have to know HOW you define something as being valid or invalid, and we don't know that yet. Because by some magical method undisclosed there are "exclusions" at work that are not yet described (what is the mechanism by which we know that 17 3 is an invalid combination, in the example given, for example -- is it because it is a primary number? But 2 is a primary number. Is it because it is an odd number? If there were a topic 5, would that be permitted or denied? Why or why not?).

(10) By Cecil (CecilWesterhof) on 2020-07-03 21:46:38 in reply to 6.1 [link] [source]

I tried to clarify what I mend in reply with id 10.

(9) By Cecil (CecilWesterhof) on 2020-07-03 21:44:51 in reply to 5 [link] [source]

Yes, you understood what I so poorly/wrongly described.

(7) By Simon Slavin (slavin) on 2020-07-03 20:42:56 in reply to 1.0 [link] [source]

Suppose your link table has fields 'topic' and 'subtopic' and has entries

1 17 2 12 12 14

meaning that item 17 is a subtopic of item 1, etc.. Your question is about avoiding a new link of

17 1

This can checked for using a constraint on the table. But you also need to avoid each of the following for different reasons:

6 17 2 14 14 2

You can get eliminate two out of four inside SQLite just by picking a good primary key for the link table. But it's difficult to spot 14 2 inside SQLite without using a recursive WITH clause:

https://sqlite.org/lang_with.html

or by maintaining a list of subtopics inside the link table, both of which take programming. If I had to do this properly I'd do that programming inside my programming language, not inside SQLite.

(8) By Cecil (CecilWesterhof) on 2020-07-03 21:39:15 in reply to 1.0 [link] [source]

I made a mess of my description. :'-( Instead of topic and subtopcs I should have use Main Topic and Non Main Topics.

I have tasks. Every task has a Main Topic and optionally Non Main Topics. For example a task can be primarily connected to work, but also be related to Familie and Hobbies. So the link table contains the above records, but cannot contain: 17 1 because 1 is already filled in the Main Topic.

A record with id 18 can have a topicID of 4 and the link table contains: 18 1 because the Main Topic is Hobbies and there is also a Non Main Topic Work. But it cannot contain: 18 4, because 4 is the Main Topic and cannot be a Non Main Topic.

I hope I am clear enough now, but do not hesitate to ask.

(11) By Richard Damon (RichardDamon) on 2020-07-03 23:08:35 in reply to 8 [link] [source]

There is another organizational method which would do the constraint differently, and would ALWAYS add the main topic item into the list, and maybe remove it when you pull the list of 'non-main' topics for a given record.

This makes it easier to do the (what I presume a major purpose of the non-main topic table) search for all records that are at least somewhat (Mainly or non-mainly) connected to a given topic.

Unless there really is a need for a lot of searches for records ONLY non-mainly topically related, adding te Main Topic to the table may simplify the code.

Yes, there will likely be a need in the UI to define a record a desire to not list the Main topic, but that is easily removed with a join and a where clause.

Now the primary restriction might be a prohibition of not listing a given topicID twice for a given record, but that is a simple unique constraint on the column pair.

(13) By Cecil (CecilWesterhof) on 2020-07-04 16:45:59 in reply to 11 [link] [source]

I think I will be most often only be interested in the Main Topic. So I think the other way is better.

(15) By Richard Damon (RichardDamon) on 2020-07-04 17:07:38 in reply to 13 [link] [source]

IF you are only interested in the main topic, then this other related topics table doesn't matter. The question comes, when you DO want to use this table, are you normally going to want to include the main topic in the results or not. Generally for this sort of thing, most of the time you want the main topic included, and the cases where you want it excluded, you are also for the operation getting the main topic, so removing it from the other related topics isn't hard to do.

(12.2) By Keith Medcalf (kmedcalf) on 2020-07-04 00:48:23 edited from 12.1 in reply to 8 [link] [source]

Ah, ok. That makes more sense. So lets have tables that look like this:

create table Topic
(
  id     integer primary key,
  name   text collate nocase unique
);
create table Data
(
  id          integer primary key,
  MainTopicID integer not null references Topic(id)
);
create index DataTopic on Data(mainTopicID);
create table AdditionalTopics
(
  DataID     integer not null references Data(id),
  SubTopicID integer not null references Topic(id),
  PRIMARY KEY (DataID, SubTopicID),
  UNIQUE (SubTopicID, DataID)
) WITHOUT ROWID;

All these enforce the basics. You need to make sure foreign key enforcement is turned on for every connection by issuing the command pragma foreign_keys=1 to each and every connection without fail. If you do not do this, then your foreign key constraints will not be enforced and you can "muck up" your referential integrity (or you can compile your application with foreign keys turned on by default).

Then you need an "insert trigger" to stop you from inserting bad data, and update triggers to stop you from updating bad data, and delete triggers to stop you from creating inconsistencies.

create trigger ATI before insert on AdditionalTopics
begin
  select raise(ABORT, 'Cannot set MainTopic as SubTopic')
   where new.SubTopicID == (
                            select MainTopicID 
                              from Data 
                             where id == new.DataID
                           );
end;
create trigger ATU before update on AdditionalTopics
begin
  select raise(ABORT, 'Cannot set MainTopic as SubTopic')
   where new.SubTopicID == (
                            select MainTopicID 
                              from Data 
                             where id == new.DataID
                           );
end;
create trigger TD before delete on Topic
begin
  select raise(ABORT, 'Cannot Delete Topic that is in Use')
    where exists (
                  select 1
                    from AdditionalTopics
                   where SubTopicID == old.id
                 );
end;
create trigger TU before update of id on Topic
begin
  select raise(ABORT, 'Cannot Renumber Topic that is in Use')
    where exists (
                  select 1
                    from AdditionalTopics
                   where SubTopicID == old.id
                 );
end;

** Edited to fix typographic errors **
** Edited again because the update trigger should fire always **

(14) By Cecil (CecilWesterhof) on 2020-07-04 16:48:22 in reply to 12.2 [link] [source]

Looks very promising. I am going to play with it. I do not understand the following part: PRIMARY KEY (DataID, SubTopicID), UNIQUE (SubTopicID, DataID)

Is the UNIQUE part not already done by the PRIMARY KEY part?

(16.2) By Keith Medcalf (kmedcalf) on 2020-07-04 19:40:37 edited from 16.1 in reply to 14 [link] [source]

Because they are in the "other order".

You need to have an index on the foreign key columns. There can be "other stuff" in those indexes. The WITHOUT ROWID makes the table "just an index" and you need indexes on both columns and in both directions DataID <-> SubTopicID because (other than the requirement to have indexes of foreign keys) you will presumably eventually need to do lookups in BOTH directions (as in "find me all the data records where the main topic or a secondary topic is 'Work'):

Personally, I would put all the topics belonging to a Data in a link table and forget "denormalizing" one of them onto the Data record. It adds great complication later for no material benefit. And then you do not need triggers to enforce integrity.

create table Topic
(
  id     integer primary key,
  name   text collate nocase unique
);

create table Data
(
  id          integer primary key
  -- other data
);

create table DataTopics
(
  DataID     integer not null references Data(id),
  TopicID    integer not null references Topic(id),
  IsMain     integer not null check (IsMain in (0, 1)),
  PRIMARY KEY (DataID, TopicID, IsMain),
  UNIQUE (TopicID, DataID, IsMain)
) WITHOUT ROWID;
create unique index MainDataTopics on DataTopics(DataID) where TopicID == 1;

(17) By Cecil (CecilWesterhof) on 2020-07-04 20:34:11 in reply to 16.2 [link] [source]

Of-course. Why did I not think about that? :'-( Putting the UNIQUE back.

Should in the last line TopicID not be IsMain?

In this case we still need a trigger to enforce integrity, because we need to be sure that every data record has exactly one main record.

(18.3) By Keith Medcalf (kmedcalf) on 2020-07-04 23:10:45 edited from 18.2 in reply to 17 [link] [source]

Yes, you are correct. We want to make sure that the "DataID" which have IsMain == 1 are unique -- that is, there can only be 1 IsMain == 1 for each DataID.

This of course means that if you want to "change" which topic is the main topic, you have to either delete or update ismain to 0 before you can change a different one to 1 (or insert one).

You could add Insert and Update triggers that made sure that if you inserted or updated something with IsMain == 1, that the previous record for that DataID had its IsMain changed to 0. The whole process can be automated with triggers so that the IsMain more or less takes care of itself ...

create table Topic
(
  id     integer primary key,
  name   text collate nocase unique
);

create table Data
(
  id          integer primary key
  -- other data
);

create table DataTopics
(
  DataID     integer not null references Data(id),
  TopicID    integer not null references Topic(id),
  IsMain     integer not null on conflict replace default 0 check (IsMain in (0, 1)),
  PRIMARY KEY (DataID, TopicID, IsMain),
  UNIQUE (TopicID, DataID, IsMain)
) WITHOUT ROWID;
create unique index MainDataTopics on DataTopics(DataID) where IsMain == 1;

-- make sure no previous ismain == 1 if inserting ismain == 1
create trigger DataTopicsInsertB before insert on DataTopics when new.IsMain == 1
begin
  update DataTopics
     set IsMain = 0
   where DataID == new.DataID
     and IsMain == 1;
end;

-- if inserting ismain == 0 (default) change inserted record to ismain == 1 if there are no records for this dataid with ismain == 1
create trigger DataTopicsInsertA after insert on DataTopics when new.IsMain == 0
begin
  update DataTopics
     set IsMain == 1
    where DataID == new.DataID
      and TopicID == new.TopicID
      and (
           select count(*)
             from DataTopics
            where DataID == new.DataID
              and IsMain == 1
          ) == 0;
end;

-- when changing an ismain == 0 record to ismain == 1 make sure previous ismain == 1 are reset to 0
create trigger DataTopicsUpdate before update on DataTopics when old.IsMain == 0 and new.IsMain == 1
begin
  update DataTopics
     set IsMain = 0
   where DataID == new.DataID
     and IsMain == 1;
end;

This should allow you to forgo the extra conditional unique index, but I would keep it just in case anyway until I was sure that everything was working as I expected ...

Note the syntax of the definition of IsMain contains:

IsMain integer not null on conflict replace default 0 check (IsMain in (0, 1))

This means that the value should not be null, but if you try to insert a null (thus violating the constraint) the value is replaced with the default 0. The additional check constraint makes sure that the values that get stored are either 0 or 1.

This means that you can do this:

insert into DataTopics (DataID, TopicID) values (1, 1);

and the value of IsMain defaults to 0 if not specified.

Note that you can specify a conflict clause for a NOT NULL constraint but not for a CHECK constraint. https://sqlite.org/syntax/column-constraint.html

Adding additional triggers (after update, after delete) could ensure that there is always an IsMain == 1 for each DataID after the update, but how to choose which one to promote to IsMain == 1 is problematic (unless you maybe added a timestamp to the DataTopics table so you can choose the least recently added to promote, or something like that -- or even make it a rowid table so you could choose the one with the lowest rowid, which is basically the same thing).

(19.1) By Keith Medcalf (kmedcalf) on 2020-07-05 00:37:49 edited from 19.0 in reply to 17 [link] [source]

Ok, so here we are, somewhat simpler. Uses a rowid table for the DataTopics. It will ensure that there is always an IsMain==1 for every record. If you do something such that this becomes untrue, the "oldest" topic becomes the main topic. All the "management" of IsMain is "automagic" if you leave it out of insert statements, and if you specify it (as is specify IsMain = 1) then that will have "force" and previous IsMain for that Data will be reset.

create table Topic
(
  id     integer primary key,
  name   text collate nocase unique
);

create table Data
(
  id          integer primary key
  -- other data
);

create table DataTopics
(
  DataID     integer not null references Data(id),
  TopicID    integer not null references Topic(id),
  IsMain     integer not null on conflict replace default 0 check (IsMain in (0, 1)),
  PRIMARY KEY (DataID, TopicID),
  UNIQUE (TopicID, DataID)
);
create index MainDataTopics on DataTopics(DataID) where IsMain == 1;

-- if inserting ismain == 0 (default) change inserted record to ismain == 1 if there is no IsMain == 1 record
create trigger DataTopicsInsert0 after insert on DataTopics when new.IsMain == 0
begin
  update DataTopics
     set IsMain == 1
   where DataID == new.DataID
     and TopicID == new.TopicID
     and not exists (
                     select 1
                       from DataTopics
                      where DataID == new.DataID
                        and IsMain == 1
                    );
end;

-- if inserting ismain == 1 change previous ismain to 0
create trigger DataTopicsInsert1 after insert on DataTopics when new.IsMain == 1
begin
  update DataTopics
     set IsMain == 0
   where DataID == new.DataID
     and TopicID != new.TopicID
     and IsMain == 1;
end;

-- when changing to IsMain == 1 then reset any other IsMain == 1 to 0
create trigger DataTopicsUpdate1 after update on DataTopics when old.IsMain == 0 and new.IsMain == 1
begin
  update DataTopics
     set IsMain = 0
   where DataID == new.DataID
     and TopicID != new.TopicID
     and IsMain == 1;
end;

-- when changing to IsMain == 0 make sure we have an IsMain, if not use the oldest one
create trigger DataTopicUpdate0 after update on DataTopics when old.IsMain == 1 and new.IsMain == 0
begin
  update DataTopics
     set IsMain = 1
   where RowID == (
                      select min(rowid)
                        from DataTopics
                       where DataID == old.DataID
                    order by rowid
                       limit 1
                  )
     and not exists (
                     select 1
                       from DataTopics
                      where DataID == new.DataID
                        and IsMain == 1
                    );
end;

-- Make sure we always have IsMain record after a delete
create trigger DataTopicsDelete after delete on DataTopics when old.IsMain == 1
begin
  update DataTopics
     set IsMain = 1
   where RowID == (
                      select min(rowid)
                        from DataTopics
                       where DataID == old.DataID
                    order by rowid
                       limit 1
                  );
end;

Note that the previous versions may not work correctly because they included IsMain in the unique indexes and it should not have been. Sorry 'bout that. It also works with recursive triggers turned on.

(20) By Cecil (CecilWesterhof) on 2020-07-05 14:20:12 in reply to 19.1 [link] [source]

At the moment I am working on a solution with the Main Topic in the 'Data' table. I already started on it and I think it is better in my current case.
But I will look into your solution also. You never know if it is useful in another case and it never hurts to dabble.
(I once implemented Fibonacci Numbers with streams, just for the sake of it.)

I really do not like the last part, where after a delete the oldest becomes the new main.

I was wondering about the isMain in the UNIQUE indexes.


I will share my dabbles later on.

(21) By Cecil (CecilWesterhof) on 2020-07-05 22:05:26 in reply to 8 [link] [source]

I created three tables:
- topics
- toDos
- toDosTopics

For the moment I choose for the variant where the Main Topic is in my toDos table and the rest in the link table.
I only needed three triggers to keep my db consistent. (I think.)

Comment in my code:
# This is a prove of concept.
# I wanted a db with todo's. Todo's have a main topic and zero or more
# non main topics.
# I create the tables with triggers and execute a serie of checks.
# Necessary checks:
# - On toDos:
#   1.01 Insert with wrong topic
#   1.02 Insert duplicate todo
#   1.03 Update main topic to a non main topic
#   1.04 Update to wrong topic
#   1.05 Delete while still refered from toDosTopics
# - On topics:
#   2.01 Insert duplicate
#   2.02 Delete used in toDos
#   2.03 Delete used in toDosTopics
#   2.04 Update id used in toDos
#   2.05 Update id used in toDosTopics
# - On toDosTopics:
#   3.01 Insert with wrong toDoID
#   3.02 Insert with wrong topicID
#   3.03 Insert with main topic
#   3.04 Insert already used topic
#   3.05 Update used to wrong toDoID
#   3.06 Update used to wrong topicID
#   3.07 Update to main topic
#   3.08 Update to already used topic


Did I cover everything, or am I still missing things that I should check?

(22) By Cecil (CecilWesterhof) on 2020-07-06 11:03:55 in reply to 21 [link] [source]