history keeping

(1) By ingo on 2020-08-12 11:15:19 [link] [source]

While looking at keeping history for some tables one method often mentioned goes as follows "If data is modified or deleted, the original record is not updated; instead a new record is created that holds the latest information. The new (most recent) record becomes the live record (active), while all previous records form the history."

Is there a reason to not do the following? "If data is modified, a new record is created that holds the old information and references the original record. The original record is updated; The new (most recent) record becomes the historic record. Timestamps are set accordingly. If a record is deleted only the timestamps are set (soft delete).

The reason, ID's stay the same for the active record as they are referenced in several places. ID's can be kept unique.

(2) By Clemens Ladisch (cladisch) on 2020-08-12 11:54:40 in reply to 1 [link] [source]

You have multiple rows/versions of the same record in a table, so it is not possible to use the autoincrementing rowid.

And you want to be able to do queries on old versions of a record, so the actual primary key will be something like (ID, timestamp). So the ID values themselves do not change.

(3) By Richard Damon (RichardDamon) on 2020-08-12 12:30:22 in reply to 1 [link] [source]

As Clemens said, the typical use is that you want to be able to effectively roll back to a given point in time. Thus references to the record aren't to a specific rowid identifier but a sort of unique identifier (sort of unique, as it represents the record over all time, so more than one row has that id)

What I often do is add two fields to each record, one being the 'creation' stamp for this record, for when this version was created, and a second stamp for when it was deleted or updated. This allows joins to directly specify which historic record to find, or use a null (or infinite future depending on what the value used for the second time stamp for current is),

If the current record uses some unique rowid as a key, how do you mark the historical record that matches it?

Now there is an alternate method, similar to what you describe, where you set up a second history table, and you move the historical records to it, adding versioning information. This keeps the 'main' tables smaller and faster, but doesn't allow as easy of querying the history of a record,

(4) By Ryan Smith (cuz) on 2020-08-12 15:01:49 in reply to 1 [source]

We've lately been favouring a different scheme for tables-with-history with some advantages and disadvantages.

Basic Idea:

Have one table A (say "contacts") with the usual auto-inc primary ID (RowID alias in the case of SQLite), or any other defined primary key,
then a second table H of the form "history_xxx" ("history_contacts" for example) with it's own row_id or Primary Key
and one extra field for the original row-id/PK in A (but non-Unique in H).
The rest of the schema matches exactly table A.

An ON UPDATE Trigger on table A writes, whatever the current (before-update) row values (OLD.xxx) are, to the H table.

So at any point in time we have Table A with the current-most values, and Table H with the values as it were at that point in time.

Advantages:

Table A doesn't grow with loads of history, remains fast.
Table A doesn't need a combo-PK to account for Time - the base PK works just fine and doesn't change (especially useful for FK relations).
Table H can be joined to Table A to see what changed at any point till now.
We have automated code that can build the History table from any given table, and to update a History table schema, which makes things easier than having to plan how to use a single table with history.
We can thus run this automated code and "historify" even any old project in a jiffy where we didn't have history before, because we do not touch the original tables (excepting adding the ON UPDATE trigger).

Disadvantages:

There are now 2 tables and not just 1 for every table where history is required.
An update to table A's schema has to always be accompanied with an update to H's schema - even if we have an automated way of doing it, it requires at least some testing. (Depending on how often this happens, this is a strong consideration).

Note: Our tables ALWAYS contain "UpdateTime" fields which then works as-is in the history table and defines the time up-to-which the data in that row persisted in the live table A, but if you don't have that, the History table will need some timestamp column added to do that job.

Hope that offers another option in your quest.

Cheers, Ryan

(5) By ingo on 2020-08-12 15:21:11 in reply to 3 [link] [source]

By the answers I'm not sure whether my first post was all that clear.

Second attempt; The example below may not be flawless, just to show what I was thinking. The 'old' row is copied to a new test_id and references the old_test_id that is also in use in other tables. Now the 'old' row can be updated with new content. Query for an "id" in historic gives all previous versions, sort by ts_eol. For current version create an index (test_id, ts_eol is Null)

CREATE TABLE test(
    test_id INTEGER PRIMARY KEY NOT NULL
    content TEXT NOT NULL
    historic INTEGER DEFAULT NULL REFERENCES test(test_id)
    ts_from TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    ts_eol TEXT DEFAULT NULL,
);

CREATE TRIGGER IF NOT EXISTS update_test
BEFORE UPDATE OF content ON test
BEGIN  
    INSERT INTO test (content, historic, ts_from, ts_eol)
    SELECT old.content, old.test_id, old.ts_from, CURRENT_TIMESTAMP
    ;
END
;

(6) By ingo on 2020-08-12 15:33:59 in reply to 4 [link] [source]

'Ve been looking at that too Ryan. Speed is no issue here, nor reverting to a previous situation. If that needs to be done then manually so that it shows up in the time line, with proper comments. It will be used for tracking changes in processes and changes of state in processes. For example in an order management system, the change from offer to order to invoice. Or the change of specification of a batch or batchnumber of raw materials. Kind of mini 'time lines'.

(7) By Ryan Smith (cuz) on 2020-08-12 17:58:48 in reply to 5 [link] [source]

Not sure what the question is here?

Will it work? - Yes.

Is it best? - That depends really on how you will use (query) the table in future.

Essentially your method makes the original row the CURRENT row, and any previous historic data are kept in newly added rows. The current row's "historic" field presumably stays NULL. Seems fine.

Elsewhere I proposed a method for doing this but with historic rows going to a new table. This method of yours achieves the very same, also gaining the advantage of FK relations remaining good while avoiding the disadvantage listed for my method of now having two tables. You do miss out on the size-speed advantage, but from what you've said, that is not a problem at all in your case.

So yes, I would say (keeping in mind I'm not intimately familiar with your system architecture or intended history queries) that your method is not only workable, but plausibly the best one for your use case.

Does that answer the question?

(8) By ingo on 2020-08-13 05:58:14 in reply to 7 [link] [source]

That answers my question. Thanks.