SQLite Forum

How to implement a table working like a ring buffer

(1) By Heribert (derhexer) on 2020-10-21 18:04:53

ID  GroupID  Ack
 1      100    0
 2      200    1
 3      200    0
 4      300    0
 5      300    0
 6      200    0
 7      200    0

I want to implement a table that works like a ring buffer: when it reaches the maximum row count, I want to delete the "oldest row" and then insert the new row. A) ID grows from 1 to n.

But there are some conditions that define the "oldest row": 1) A row is not the "oldest row" if it is the last row among all rows with the same GroupID. 2) A row is not the "oldest row" if its Ack is 1. 3) A row is the "oldest row" if 1) and 2) hold and its ID is the lowest of all possible "oldest row" candidates resulting from grouping by GroupID. 4) If a delete is necessary, delete only the one "oldest row".

In the sample table above, the "oldest row" is ID = 3 and will be deleted if necessary.

To achieve this, I want to use a trigger (before insert) to do this deleting automatically. The condition for when a delete has to be done already works.

I've tried to solve this problem by myself, but now I'm stuck.

Is there anyone who can give me a hint?

Thx a lot for any help! heribert

(2) By Keith Medcalf (kmedcalf) on 2020-10-21 19:58:37 in reply to 1

ID is the lowest of all possible "oldest row" candidates which results from grouping by GroupId

does not make any sense. If it is actually relevant, please explain.

So, to delete only the one minimum row that is not excluded by the above when there are more than 6 rows in the table, and preferring performance over disk usage, the following would work:

create table t
(
    ID      integer primary key,
    GroupID integer not null,
    Ack     integer not null check (Ack in (0,1))
);
create unique index t_Deleteable_ID on t (ID, GroupID) where Ack != 1;
create index t_GroupID on t (GroupID);

create trigger t_bi_prune before insert on t
begin
  delete from t
   where ID = (
                 select ID
                   from t as o
                  where (select count(ID) from t where GroupID == o.GroupID) > 1
                    and Ack != 1
               order by ID
                  limit 1
              )
     and (
          select count(ID)
            from t
         ) >= 6;
end;

insert into t values (1, 100, 0), (2, 200, 1), (3, 200, 0), (4, 300, 0), (5, 300, 0), (6, 200, 0), (7, 200, 0);

from which you can see that the insertion of the last item resulted in the deletion of ID == 3.
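For anyone following along, the whole exchange can be reproduced end to end; a sketch using Python's built-in sqlite3 module, with the schema and trigger above unchanged:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table t
(
    ID      integer primary key,
    GroupID integer not null,
    Ack     integer not null check (Ack in (0,1))
);
create unique index t_Deleteable_ID on t (ID, GroupID) where Ack != 1;
create index t_GroupID on t (GroupID);

create trigger t_bi_prune before insert on t
begin
  delete from t
   where ID = (select ID
                 from t as o
                where (select count(ID) from t where GroupID == o.GroupID) > 1
                  and Ack != 1
             order by ID
                limit 1)
     and (select count(ID) from t) >= 6;
end;

insert into t values (1, 100, 0), (2, 200, 1), (3, 200, 0),
                     (4, 300, 0), (5, 300, 0), (6, 200, 0), (7, 200, 0);
""")

# The BEFORE INSERT trigger fires once per row; before the 7th row the
# table holds 6 rows, so ID 3 is pruned.
remaining = [row[0] for row in conn.execute("select ID from t order by ID")]
print(remaining)  # → [1, 2, 4, 5, 6, 7]
```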

Whether further performance optimization is required depends on the shape of your data.

(3) By Heribert (derhexer) on 2020-10-21 21:00:23 in reply to 2

You are right - grouping by GroupID makes no sense. That was my attempt at solving this problem... and the reason why I got stuck...

Thx a lot. It works perfectly!

Now I have to understand why it works :)

Thx again! You saved my day!

(4.1) By Heribert (derhexer) on 2020-10-21 22:19:58 edited from 4.0 in reply to 3

Deleted

(5) By Heribert (derhexer) on 2020-10-21 22:19:48 in reply to 4.0

I had made a mistake in my description.

ID  GroupID  Ack
 1      100    0
 2      200    1
 3      200    0
 4      300    0
 5      300    0
 6      400    0
 7      400    0

2.0) A row is not the "oldest row" if its Ack is 1.

2.1) A row is not the "oldest row" if the previous row of its group (same GroupID) has Ack == 1 and this row is the last one in the group with Ack == 0.

So: in the sample table above, the "oldest row" is ID = 4 and will be deleted if necessary. (Because ID=3 is part of GroupID 200, its previous group member (ID=2) has Ack==1, and ID=3 is the last of GroupID 200 with Ack==0, ID=3 is not the "oldest row".)

I have attempted to build this on top of Keith's solution, without any success.

(7) By Keith Medcalf (kmedcalf) on 2020-10-21 23:02:02 in reply to 5

You mean like this:

create table t
(
    ID      integer primary key,
    GroupID integer not null,
    Ack     integer not null check (Ack in (0,1))
);
create unique index t_Deleteable_ID on t (ID, GroupID) where Ack != 1;
create index t_GroupID on t (GroupID) where Ack != 1;

create trigger t_bi_prune before insert on t
begin
  delete from t
   where ID = (
                 select ID
                   from t as o
                  where (select count(ID) from t where GroupID == o.GroupID and Ack != 1) > 1
                    and Ack != 1
               order by ID
                  limit 1
              )
     and (
          select count(ID)
            from t
         ) >= 6;
end;

insert into t values (1, 100, 0), (2, 200, 1), (3, 200, 0), (4, 300, 0), (5, 300, 0), (6, 400, 0), (7, 400, 0);
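As a quick check (a sketch using Python's built-in sqlite3 module, with the schema and trigger above verbatim), the final insert now prunes ID 4 rather than ID 3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table t (ID integer primary key, GroupID integer not null,
                Ack integer not null check (Ack in (0,1)));
create unique index t_Deleteable_ID on t (ID, GroupID) where Ack != 1;
create index t_GroupID on t (GroupID) where Ack != 1;

create trigger t_bi_prune before insert on t
begin
  delete from t
   where ID = (select ID from t as o
                where (select count(ID) from t
                        where GroupID == o.GroupID and Ack != 1) > 1
                  and Ack != 1
             order by ID limit 1)
     and (select count(ID) from t) >= 6;
end;

insert into t values (1, 100, 0), (2, 200, 1), (3, 200, 0),
                     (4, 300, 0), (5, 300, 0), (6, 400, 0), (7, 400, 0);
""")
# Group 200 has only one Ack != 1 row (ID 3), so it is no longer eligible;
# the lowest eligible candidate is now ID 4 in group 300.
remaining = [r[0] for r in conn.execute("select ID from t order by ID")]
print(remaining)  # → [1, 2, 3, 5, 6, 7]
```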

The question is: if there is the following pattern:

1 100 1
2 100 0
3 100 0

which row do you want to delete, row 2 or row 3?

And what about the following:

1 100 0
2 100 1
3 100 0

which row should be deleted, row 1 or row 3?

And also for:

1 100 0
2 100 0
3 100 1

which row should be deleted, row 1 or row 2?

(8) By Keith Medcalf (kmedcalf) on 2020-10-22 00:56:08 in reply to 7

Or perhaps the following trigger:

create table t
(
    ID      integer primary key,
    GroupID integer not null,
    Ack     integer not null check (Ack in (0,1))
);
create unique index t_Deleteable_ID on t (ID, GroupID) where Ack != 1;
create index t_GroupID on t (GroupID, ID, Ack);

create trigger t_bi_prune before insert on t
begin
  delete from t
   where ID = (
                 select ID
                   from t as o
                  where (select count(ID) from t where GroupID == o.GroupID and Ack != 1) > 1
                    and (select Ack from t where GroupID == o.GroupID and ID < o.ID order by ID desc limit 1) IS NOT 1
                    and Ack != 1
               order by ID
                  limit 1
              )
     and (
          select count(ID)
            from t
         ) >= 6;
end;

Which does the following:

for each ID in ascending order where Ack != 1
where there is more than one row with Ack != 1 in the same GroupID
and the Ack of the previous row in the group is not 1
when you have found an ID and there are 6 or more rows in t, delete that ID

i.e., the answers to the questions are 3, 1, and 1, and for the following, none would be deleted:

1 100 1
2 100 0
3 100 1

1 100 1
2 100 0
3 100 1
4 100 0
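Those answers can be verified in isolation by running just the candidate-selecting subquery of the trigger against each pattern; a sketch using Python's built-in sqlite3 module, where `None` means nothing would be deleted:

```python
import sqlite3

# The inner SELECT of the trigger above, unchanged.
CANDIDATE = """
select ID from t as o
 where (select count(ID) from t where GroupID == o.GroupID and Ack != 1) > 1
   and (select Ack from t where GroupID == o.GroupID and ID < o.ID
        order by ID desc limit 1) IS NOT 1
   and Ack != 1
 order by ID limit 1
"""

def oldest(rows):
    """Return the ID the trigger would delete for this table content, or None."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (ID integer primary key, GroupID integer, Ack integer)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    hit = conn.execute(CANDIDATE).fetchone()
    return hit[0] if hit else None

answers = [
    oldest([(1, 100, 1), (2, 100, 0), (3, 100, 0)]),               # 3
    oldest([(1, 100, 0), (2, 100, 1), (3, 100, 0)]),               # 1
    oldest([(1, 100, 0), (2, 100, 0), (3, 100, 1)]),               # 1
    oldest([(1, 100, 1), (2, 100, 0), (3, 100, 1)]),               # None
    oldest([(1, 100, 1), (2, 100, 0), (3, 100, 1), (4, 100, 0)]),  # None
]
print(answers)  # → [3, 1, 1, None, None]
```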

(6) By Keith Medcalf (kmedcalf) on 2020-10-21 22:29:37 in reply to 3

If you are always providing the ID (that is, you do not depend on it being the RowID), then the following definition will be slightly more efficient:

create table t
(
    ID      integer primary key,
    GroupID integer not null,
    Ack     integer not null check (Ack in (0,1))
) without rowid;
create unique index t_Deleteable_ID on t (ID, GroupID) where Ack != 1;
create unique index t_GroupID on t (GroupID, ID);

Basically it works because the index t_Deleteable_ID contains all the IDs that may be deleted, together with a "covering" GroupID, so no access to the underlying table is required. The "select ID from t where Ack != 1 order by ID" simply scans this index, and for each eligible ID uses the GroupID from the index to run "select count(ID) from t" for that GroupID, eliminating the row if its group does not have more than one ID. The scan stops at the first passing ID (limit 1).

You could speed up the whole thing (again at the expense of disk space) if you kept track of all the information you need to manage the lists with other triggers and tables. For example:

create table t
(
    ID      integer primary key,
    GroupID integer not null,
    Ack     integer not null check (Ack in (0,1))
) without rowid;

create table t_group_count
(
    GroupID integer not null primary key,
    Count   integer not null
) without rowid;

create table t_count
(
    Rows    integer not null default 1
);
insert into t_count values (0);

create unique index t_Deleteable_ID on t (ID, GroupID) where Ack != 1;

create trigger t_insert_count after insert on t
begin
    insert into t_group_count values (new.GroupID, 1) on conflict (GroupID) do update set Count = Count + excluded.Count;
    update t_count set Rows = Rows + 1;
end;

create trigger t_delete_count after delete on t
begin
    insert into t_group_count values (old.GroupID, 1) on conflict (GroupID) do update set Count = Count - excluded.Count;
    update t_count set Rows = Rows - 1;
end;

create trigger t_bi_prune before insert on t
begin
  delete from t
   where ID = (
                 select ID
                   from t as o
                  where (select Count from t_group_count where GroupID == o.GroupID) > 1
                    and Ack != 1
               order by ID
                  limit 1
              )
     and (
          select Rows
            from t_count
         ) >= 6;
end;

insert into t values (1, 100, 0), (2, 200, 1), (3, 200, 0), (4, 300, 0), (5, 300, 0), (6, 200, 0), (7, 200, 0);

Which uses a couple of additional tables so that the counts are updated at insert/delete time, eliminating the overhead of counting all the rows in the group or in the table t.
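A quick way to verify the bookkeeping (a sketch using Python's built-in sqlite3 module; the upsert syntax needs SQLite 3.24 or later) is to run the whole script and check both the surviving IDs and the maintained row counter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table t (ID integer primary key, GroupID integer not null,
                Ack integer not null check (Ack in (0,1))) without rowid;
create table t_group_count (GroupID integer not null primary key,
                            Count integer not null) without rowid;
create table t_count (Rows integer not null default 1);
insert into t_count values (0);
create unique index t_Deleteable_ID on t (ID, GroupID) where Ack != 1;

create trigger t_insert_count after insert on t
begin
    insert into t_group_count values (new.GroupID, 1)
        on conflict (GroupID) do update set Count = Count + excluded.Count;
    update t_count set Rows = Rows + 1;
end;

create trigger t_delete_count after delete on t
begin
    insert into t_group_count values (old.GroupID, 1)
        on conflict (GroupID) do update set Count = Count - excluded.Count;
    update t_count set Rows = Rows - 1;
end;

create trigger t_bi_prune before insert on t
begin
  delete from t
   where ID = (select ID from t as o
                where (select Count from t_group_count
                        where GroupID == o.GroupID) > 1
                  and Ack != 1
             order by ID limit 1)
     and (select Rows from t_count) >= 6;
end;

insert into t values (1, 100, 0), (2, 200, 1), (3, 200, 0),
                     (4, 300, 0), (5, 300, 0), (6, 200, 0), (7, 200, 0);
""")
# ID 3 is pruned before the 7th insert; the counter triggers keep
# t_count in step (7 inserts, 1 delete => 6 rows).
remaining = [r[0] for r in conn.execute("select ID from t order by ID")]
total = conn.execute("select Rows from t_count").fetchone()[0]
print(remaining, total)  # → [1, 2, 4, 5, 6, 7] 6
```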

(9) By Heribert (derhexer) on 2020-10-22 08:56:13 in reply to 6

You are right. There are some conditions that I had not considered...

A row is not the "oldest row" if it is the last inserted row of a group (it is the current state of the group).

In general: do not delete rows with Ack=1, and do not delete the current state of a group (the last inserted row of a group).

The question is that if there is the following pattern:

1 100 1
2 100 0
3 100 0

*To delete: ID=2 (it is not the current state of the group)
After delete and insert:
*None can be deleted if the newly inserted row has a different GroupID

And what about the following:

1 100 0
2 100 1
3 100 0

which row should be deleted, row 1 or row 3?

*To delete: ID=1 (it is not the current state of the group)
After delete and insert:
*None can be deleted if the newly inserted row has a different GroupID

And also for:

1 100 0
2 100 0
3 100 1

which row should be deleted, row 1 or row 2?

*To delete: ID=1 (it is not the current state of the group)
After delete and insert:
*To delete: ID=2 (it is not the current state of the group)
After delete and insert:
*None can be deleted if the newly inserted row has a different GroupID


I think I solved it.
Based on Keith's helpful implementations, I added an exclusion of the "current group state" rows.

create trigger t_bi_prune before insert on t
begin
  delete from t
    where ID = (
      select ID from t as o
       where (select count(ID) from t where GroupID == o.GroupID) > 1
         and Ack != 1
      except
      select max(ID) from t group by GroupID
      limit 1);
end;

Big THX to Keith!

(10) By Heribert (derhexer) on 2020-10-22 09:33:31 in reply to 9

My solution works, but it's really slow.

create trigger t_bi_prune before insert on t
begin
  delete from t
    where ID = (
      select ID from t as o
       where (select count(ID) from t where GroupID == o.GroupID) > 1
         and Ack != 1
      except
      select max(ID) from t group by GroupID
      limit 1);
end;

4000 rows - inserting one row needs more than 1.3 sec.

I'm using the following index:
create unique index t_GroupID on t (GroupID, ID);

The query planner shows that the index is used.

I think max() with "group by" eats a lot of time.

Is there any way to speed this up?

(11) By Heribert (derhexer) on 2020-10-22 10:09:43 in reply to 10

I found a faster replacement for "select max(ID) from t group by GroupID LIMIT 1".


create trigger t_bi_prune before insert on t
begin
  delete from t
    where ID = (
      select ID from t as o
       where (select count(ID) from t where GroupID == o.GroupID) > 1
         and Ack != 1
      except
      select ID from t as o
       where ID = (select ID from t where GroupID = o.GroupID order by ID desc limit 1)
      order by ID limit 1);
end;

4000 rows - insert one row : needs around 0.002 sec

(12) By Keith Medcalf (kmedcalf) on 2020-10-22 10:18:37 in reply to 10

It takes a long time because you are using EXCEPT. This means that the select preceding the EXCEPT must be executed in its entirety, the select after the EXCEPT must also be run in its entirety, the EXCEPT applied, and only then is the LIMIT 1 applied.

If you want to exclude the max(ID) in each group then you need to execute the check for each candidate as it arises so that you can stop looking as soon as a valid ID is found rather than processing the entire query then only using the first result.

The difference is between examining the entire deck of cards to find the ones eligible for disposal and then using the first one in the resulting list compared to determining which one to discard by looking at each in turn and stopping (and discarding) as soon as you find the first one to discard.

create trigger t_bi_prune before insert on t
begin
  delete from t
    where ID = (
                   select ID
                     from t as o
                    where (select count(ID) from t where GroupID  == o.GroupID ) > 1
                      and (select ID from t where GroupID == o.GroupID and ID > o.ID order by ID limit 1) IS NOT NULL
                      and Ack != 1
                 order by ID
                    LIMIT 1
               );
end;
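Note that this version omits the row-count guard from the earlier triggers, so the prune runs on every insert once a candidate exists; on the data from post 5 the end result is nevertheless the same. A sketch using Python's built-in sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table t (ID integer primary key, GroupID integer not null,
                Ack integer not null check (Ack in (0,1)));

create trigger t_bi_prune before insert on t
begin
  delete from t
   where ID = (select ID from t as o
                where (select count(ID) from t where GroupID == o.GroupID) > 1
                  and (select ID from t where GroupID == o.GroupID and ID > o.ID
                       order by ID limit 1) IS NOT NULL
                  and Ack != 1
             order by ID limit 1);
end;
""")
# Inserting one row at a time: ID 4 is pruned just before ID 6 arrives
# (the first moment group 300 has a non-last, Ack != 1 member).
for row in [(1, 100, 0), (2, 200, 1), (3, 200, 0), (4, 300, 0),
            (5, 300, 0), (6, 400, 0), (7, 400, 0)]:
    conn.execute("insert into t values (?, ?, ?)", row)
remaining = [r[0] for r in conn.execute("select ID from t order by ID")]
print(remaining)  # → [1, 2, 3, 5, 6, 7]
```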

(13) By Keith Medcalf (kmedcalf) on 2020-10-22 10:33:13 in reply to 12

(select ID from t where GroupID == o.GroupID and ID > o.ID order by ID limit 1) IS NOT NULL

is the same as

exists (select ID from t where GroupID == o.GroupID and ID > o.ID order by ID limit 1)

(14) By Heribert (derhexer) on 2020-10-22 11:30:31 in reply to 13

Thx Keith.

Your solution is the fastest...

4000 rows - insert one row : needs 0.001 sec (maybe lesser :) )

(15) By anonymous on 2020-10-22 12:35:14 in reply to 13

I'd expect exists to short-circuit, so the inner limit 1 shouldn't be necessary.

And the order by ID shouldn't be necessary either.

And finally, I tend to use select 1 ... instead of any real column.

So just exists (select 1 from t where GroupID = o.GroupID and ID > o.ID)

(17.1) By Keith Medcalf (kmedcalf) on 2020-10-22 20:44:19 edited from 17.0 in reply to 15

Except that this depends on presumptions about ordering, the availability of indexes, and the preference for their use; so rather than "work exactly as intended, just very slowly", if the assumptions are not met you will get some other inscrutable result instead, which may or may not implement your intention.

In the end, if you use the more explicit spelling, you will get precisely the same generated code as the less explicit spelling produces when the presumptions the less explicit spelling requires happen to hold.

Also, with the more explicit spelling, you will not have the problem of someone (perhaps even you yourself) trying to remember the "meaning" tomorrow, or next week, or next year, or a decade hence.

Plus, of course, if someone changes the definition of the indexes or adds additional indexes, the explicit spelling will do exactly what you spelt, whereas the less explicit spelling depends for its correct operation on conditions which may or may not remain invariant.

This specifically refers to the use of an ORDER BY clause. Eliding the LIMIT 1 or what is selected -- and using a constant -- should not make any difference.

For example, the "LIMIT 1" is "transparently added" by SQLite3. This is, however, reliance on an "implementation detail". Even though it is unlikely to change, it is not portable and someone familiar with a different implementation may be "confused" as to why it works and an error is not thrown.

(19) By Keith Medcalf (kmedcalf) on 2020-10-23 23:07:00 in reply to 12

On subsequent examination, one can only conclude that the requirement that the entry to be deleted is "not the last one" already implies that there exists more than one entry in the group; hence the two correlated subqueries are redundant, and the query can be optimized by getting rid of the now-useless check for more than one member of the group.

create trigger t_bi_prune before insert on t
begin
  delete from t
    where ID = (
                   select ID
                     from t as o
                    where exists (select * from t where GroupID == o.GroupID and ID > o.ID order by ID)
                      and Ack != 1
                 order by ID
                    limit 1
               );
end;

This should make the trigger execute even faster.

(16) By doug (doug9forester) on 2020-10-22 19:17:47 in reply to 9

I've been watching this thread and thought I would offer an observation from the cheap seats. You seem to be shot-gunning your approach to this problem. Whenever I do that, I leave holes in unexpected places.

I would suggest that you devise a set of test cases, paying special attention to the boundary conditions (none, one, up to 7 if you are using hardcoded limits like 6) and the boundary values (less than, duplicates, greater than). It wouldn't take much effort to put a pretty exhaustive set of tests together, and it would pay back bundles of time and energy down the pike.

My $0.02. Doug

(18) By skiingyac on 2020-10-22 20:51:56 in reply to 1

While SQL can do a lot of things, it may be simpler to do this in code. Do you need a fixed size, or just an "approximately fixed" size? Let's say you want 100k rows and are willing to let the count fluctuate between 100k and 101k rows.

In whatever function inserts into this table, keep a static temp_count variable that counts the number of inserts since the last purge. Once temp_count exceeds 1000, run a DELETE statement in the same transaction with your desired filter and ordering logic, and clear temp_count when you commit. You're guaranteed that the size will never exceed 101k. This lets you amortize the cost of the delete statement, unless the unevenness in the speed of the operations is an issue.
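That amortized-purge idea can be sketched in a few lines (Python's built-in sqlite3 module; the 100-row target and 10-row slack are made-up stand-ins for the 100k/1k figures, and the purge keeps only the Ack filter, not the full per-group logic from the thread):

```python
import sqlite3

TARGET = 100  # steady-state row count (stand-in for the 100k in the post)
SLACK = 10    # allowed overshoot before purging (stand-in for the 1k)

conn = sqlite3.connect(":memory:")
conn.execute("""create table t (ID integer primary key,
                                GroupID integer not null,
                                Ack integer not null)""")

pending = 0  # inserts since the last purge (the post's temp_count)

def insert_row(group_id, ack):
    """Insert one row; purge down to TARGET once SLACK inserts accumulated."""
    global pending
    with conn:  # insert and (possibly) purge in one transaction
        conn.execute("insert into t (GroupID, Ack) values (?, ?)",
                     (group_id, ack))
        pending += 1
        if pending >= SLACK:
            # Delete the oldest Ack != 1 rows until only TARGET rows remain.
            conn.execute("""delete from t where ID in (
                                select ID from t where Ack != 1
                                order by ID
                                limit max(0, (select count(*) from t) - ?))""",
                         (TARGET,))
            pending = 0

for i in range(150):
    insert_row(100 + i % 5, 0)

count = conn.execute("select count(*) from t").fetchone()[0]
print(count)  # the row count never exceeds TARGET + SLACK
```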