SQLite User Forum

advice on SQLite DB structure
Login

advice on SQLite DB structure

(1) By frank10 on 2022-11-01 08:26:39 [link] [source]

Hi all, I am new to database designing...
I need to collect a list of books published on Amazon with some static attributes and other dynamic ones.
For the dynamic I need to store the date on which the change occurred.
Then I'll need to collect all info on each book in the actual date or in a range of dates, with all the eventual changes on the dynamic data.
For the static info I did a table:

    Book (bookID (P), title, author, subtitle, author, type of product (e-book, pb, audiobook), isColor )

Dynamic info like these:

    pages
    reviews + ratings  per market
    internal ranking per market
    price (that changes also other parameters I need: royalty perceived and ads parameters)
So, I need another table to put alla data changing on a book per market on a specific date.

But here I thought 2 approaches and I don't know which is better...

1) create several tables for each dynamic info, linked with FK:

   
Page (actualDate, bookID FK, page) Review (actualDate, bookID FK, market, review) Rating (actualDate, bookID FK, market, ratings) Ranking (actualDate, bookID FK, market, ranking) Price (actualDate, bookID FK, market, price, royalty, ACOS)
Here I insert a row only if that specific data changed.


2) collect all these info on a single table, apart Price like this:

    BookData (actualDate, bookID FK, market, priceID FK, page, review, rating, ranking) 
    Price    (priceID, price, royalty, ACOS)
Here I must insert a row even if one data only changed

i.e. all info changed on a date: date = Oct 24, BookID = 35, price = 13.90, royalty = 5.01, ACOS = 43, page = 245, review = 232, rating = 312, ranking = 23467
1)
    Page    -->     Oct 24,     35,         245
    Review  -->     Oct 24,     35,         "com",  232
    Rating  -->     Oct 24,     35,         "com",  312
    Ranking -->     Oct 24,     35,         "com",  23467
    Price   -->     Oct 24,     35,         "com",  1390,       501,    43

2)
    BookData-->     Oct 24,     35,         "com",  16,         245,     232,   312,   23467
    Price   -->     16,         1390,       501,    43

So, my questions:

A) repetitions: for each book I have 4x3 more repetitions in 1): [24 Oct, 35, "com"] compared to 2) one time only.

B) In 2) if I have only one info changed, like ranking, I must insert a new row with null values on other fields? Like:
    BookData-->     Oct 25,     35,         "com",  null,         null,     null,    23467
but when I need to retrieve all data on a specific date for bookID 35, how would I retrieve the previous info not changed on different dates?

C) In 1) could I put review and ratings in the same table because they are similar data...? Like: Review (actualDate, bookID FK, market, review, rating) But this way the query could be more complex to get actual review and rating situation because if I have review data, I have also ratings, instead if I have ratings I could not have review data, hence null So, on Oct 25 I could have review=232 and rating = 321 but info splitted in 2 date rows:
    Review  -->     Oct 24,     35,         "com",  232,312
    Review  -->     Oct 25,     35,         "com",  null, 321
In the end I would need all dynamic book info on a date or a range of dates. Which DB structure do you suggest for this case?
TIA

(2) By Gunter Hick (gunter_hick) on 2022-11-03 12:33:14 in reply to 1 [link] [source]

The schema depends on how far you want to go with normalizing.

Your current schema does not allow more than one author per book (unless your duplicating the author field was intentional; note that field names have to be unique and adding numerals to field names suggests insufficient normalizing). 

Also, querying all books written by the same author will require an index.

Your review and rating tables do not include an author, so you would be unable to attribute quotes.

Also "market" and "type of product" beg to be limited in their values (i.e. defined in their own tables and replaced by a FK in the book table).

For anything that changes over time, there are two approaches:

You could have a table just line your own example, or you could simply have a history table where a row gets copied to every time an attribute changes. Which is better depends on wether you are mostly interested in current data or in retrieving data from a certain point in time.

(3) By frank10 on 2022-11-04 11:04:59 in reply to 2 [source]

Thank you Gunter. Yes, author was duplicated by mistake... (Anyway I don't need to list eventually more than one author)
For review and rating I'm not interested in author: I only need to store how many there are and when they change.
Market & type: OK I'll create 2 more dedicated tables for them
Do you think could be a good idea creating also a Date table to avoid repeating a lot of time the same date (hundred times daily), using instead a FK of that specific date?


The part I'm more interested is exactly the last part of changes over time! I could need extracting all dynamic data of books on the most recent date or on a paste date or all values changed in a range of dates.

i.e.

These could be all the tables:
Book (bookID (P), title, author, subtitle, typeID (FK), isColor )
Type (typeID (P), type) ["K", "PB", "A"]
Market (marketID (P), marketName) ["com", "co.uk", "it" etc]
Date (dateID (P), actualDate)
BookData (bookData (P), dateID (FK), bookID (FK), marketID (FK), priceID (FK), page, review, rating, ranking) ]
Price    (priceID (P), price, royalty, ACOS)
and this example data:
Book        -->     35, "tit1", "aut1", "subtit1", 2, 0
Type        -->     1,  "K"
Type        -->     2,  "PB"
Market      -->     1,  "com"
Date        -->     1, 2022-10-24
Date        -->     2, 2022-10-25
Date        -->     3, 2022-10-28
Date        -->     4, 2022-10-30
Price       -->     16,         1390,       501,    43
Price       -->     17,         1199,       432,    38
So, using this schema to not waste space on DB file avoiding duplicated values:
BookData        (bookData (P), dateID (FK), bookID (FK), marketID (FK), priceID (FK), page, review, rating, ranking) ]
BookData    -->     1,          1,          35,             1,              16,       245,  232,    312,    23467
BookData    -->     2,          2,          35,             1,              null,     null, 233,    312,   57877
BookData    -->     3,          3,          35,             1,              17,       null, null,   314,    3432
BookData    -->     4,          4,          35,             1,              null,     null, null,   null,   8932
How should be a query with these data example to retrieve dynamic info for this book on 2022-10-29? The result should be like this:
(I have no data for 29, I have 30 but I discard it because it's greater than 29.
So I take the situation at 28, and because it has some null values, I must take page and review not null from the nearest date prior to 28, that is 2022-10-25 )

I tried:

SELECT d.actualDate, b.title, p.price, m.market, bd.page, bd.review, bd.rating, bd.ranking 
FROM Book b, Market m, BookData bd
inner JOIN Date d USING(dateID)
inner JOIN Price p USING(priceID)
WHERE b.bookID = (SELECT b.bookID From Book b WHERE b.title = "tit1") 
AND m.market = "com" AND d.actualDate <= "2022-10-29"
I get:
actualDate	title	price	market	page	review	rating	ranking
2022-10-24	tit1    1390	com	    245	    232	    312	    23467
2022-10-28	tit1    1199	com			            314	    3432
So, 2 problems:

  1. This has an error because Price didn't change on 2022-10-25, BUT review became 233. And this is a lost value...
    2) More, is there a way to get a single line result directly from SQL like this:
    title, market, price, page, review, rating, ranking
    "tit1", "com", 1199,    245, 233,   314,    3432
    
    or should I elaborate the multiline SQL result with my program?

(4) By anonymous on 2022-11-15 20:52:52 in reply to 3 [link] [source]

Hi,

Date

Do you think could be a good idea creating also a Date table to avoid repeating a lot of time the same date (hundred times daily), using instead a FK of that specific date?

I wouldn't do it for the date. If you want to save space for the date (not store the date in 10 characters), just use the Julian date or the unixepoch date. See the date and time function of sqlite. You would then ensure that all your date are easily sortable.

Fix

In your query, there is one conceptual mistake: BookData should be joined with Book using bookID and with Marked using marketID.

Then:

    WHERE b.bookID = (SELECT b.bookID From Book b WHERE b.title = "tit1") 

could be simplified into

    WHERE b.title = 'tit1'

View of dynamic results

How should be a query with these data example to retrieve dynamic info for this book on 2022-10-29?

I would suggest to create a view where, for each date, bookID, and marketID retrieve the information from the last records regarding each columns of interest (review, priceID, and rating).

Which should give:

CREATE VIEW vBookData AS
SELECT bookData, date, bookID, marketID,
    (SELECT bd0.page FROM bookData AS bd0 WHERE (bd0.bookID, bd0.marketID) = (bd.bookID, bd.marketID) AND bd0.date <= bd.date AND bd0.page IS NOT NULL ORDER by bd0.date DESC LIMIT 1) AS page,
    (SELECT bd0.review FROM bookData AS bd0 WHERE (bd0.bookID, bd0.marketID) = (bd.bookID, bd.marketID) AND bd0.date <= bd.date AND bd0.review IS NOT NULL ORDER by bd0.date DESC LIMIT 1) AS review,
    (SELECT bd0.priceID FROM bookData AS bd0 WHERE (bd0.bookID, bd0.marketID) = (bd.bookID, bd.marketID) AND bd0.date <= bd.date AND bd0.priceID IS NOT NULL ORDER by bd0.date DESC LIMIT 1) AS priceID,
    (SELECT bd0.rating FROM bookData AS bd0 WHERE (bd0.bookID, bd0.marketID) = (bd.bookID, bd.marketID) AND bd0.date <= bd.date AND bd0.rating IS NOT NULL ORDER by bd0.date DESC LIMIT 1) AS rating
FROM bookData AS bd;

Then, to answer your question, you can simply do:

select 
    b.title, m.marketName, 
    p.price, 
    v.page, v.review, v.rating, v.ranking 
from vBookData as v join Book AS b USING(bookID) JOIN Market AS m USING(marketID) JOIN Price AS p USING(priceID) where date = julianday('2022-10-28 12:00');

Which returns

title marketName price page review rating ranking
tit1 com 1199 245 233 314 3432

I put you below the full dump of my DB if you want to reproduce. Best regards, Christian.

SQL dump

CREATE TABLE BookData (bookData INTEGER PRIMARY KEY, date INTEGER, bookID INTEGER, marketID INTEGER, priceID INTEGER, page INTEGER, review INTEGER, rating INTEGER, ranking INTEGER);
INSERT INTO BookData VALUES(1,2459877,35,1,16,245,232,312,23467);
INSERT INTO BookData VALUES(2,2459878,35,1,NULL,NULL,233,312,57877);
INSERT INTO BookData VALUES(3,2459881,35,1,17,NULL,NULL,314,3432);
INSERT INTO BookData VALUES(4,2459883,35,1,NULL,NULL,NULL,NULL,8932);
CREATE TABLE Book (bookID INTEGER PRIMARY KEY, title TEXT, author TEXT, subtitle TEXT, typeID INTEGER, isColor INTEGER);
INSERT INTO Book VALUES(35,'tit1','auth1','subtit1',2,0);
CREATE TABLE IF NOT EXISTS "Type" (typeID INTEGER PRIMARY KEY, type TEXT);
INSERT INTO Type VALUES(1,'K');
INSERT INTO Type VALUES(2,'PB');
INSERT INTO Type VALUES(3,'A');
CREATE TABLE Market (marketID INTEGER PRIMARY KEY, marketName TEXT);
INSERT INTO Market VALUES(1,'com');
INSERT INTO Market VALUES(2,'co.uk');
INSERT INTO Market VALUES(3,'it');
CREATE TABLE price (priceID INTEGER PRIMARY KEY, price INTEGER, royalty INTEGER, ACOS INTEGER);
INSERT INTO price VALUES(16,390,501,43);
INSERT INTO price VALUES(17,1199,432,38);
CREATE VIEW vBookData AS
SELECT bookData, date, bookID, marketID,
    (SELECT bd0.page FROM bookData AS bd0 WHERE (bd0.bookID, bd0.marketID) = (bd.bookID, bd.marketID) AND bd0.date <= bd.date AND bd0.page IS NOT NULL ORDER by bd0.date DESC LIMIT 1) AS page,
    (SELECT bd0.review FROM bookData AS bd0 WHERE (bd0.bookID, bd0.marketID) = (bd.bookID, bd.marketID) AND bd0.date <= bd.date AND bd0.review IS NOT NULL ORDER by bd0.date DESC LIMIT 1) AS review,
    (SELECT bd0.priceID FROM bookData AS bd0 WHERE (bd0.bookID, bd0.marketID) = (bd.bookID, bd.marketID) AND bd0.date <= bd.date AND bd0.priceID IS NOT NULL ORDER by bd0.date DESC LIMIT 1) AS priceID,
    (SELECT bd0.rating FROM bookData AS bd0 WHERE (bd0.bookID, bd0.marketID) = (bd.bookID, bd.marketID) AND bd0.date <= bd.date AND bd0.rating IS NOT NULL ORDER by bd0.date DESC LIMIT 1) AS rating
FROM bookData AS bd;
CREATE INDEX book_maket_date on BookData(bookID, marketID, date);

(5) By frank10 on 2022-11-17 08:37:11 in reply to 4 [link] [source]

Thank you Christian.
You put me on the right track with view!