advice on SQLite DB structure
(1) By frank10 on 2022-11-01 08:26:39 [link] [source]
Hi all,
I am new to database designing...
I need to collect a list of books published on Amazon with some static attributes and other dynamic ones.
For the dynamic I need to store the date on which the change occurred.
Then I'll need to collect all info on each book in the actual date or in a range of dates, with all the eventual changes on the dynamic data.
For the static info I did a table:
Book (bookID (P), title, author, subtitle, author, type of product (e-book, pb, audiobook), isColor )
Dynamic info like these:
pages reviews + ratings per market internal ranking per market price (that changes also other parameters I need: royalty perceived and ads parameters)So, I need another table to put alla data changing on a book per market on a specific date.
But here I thought 2 approaches and I don't know which is better...
1) create several tables for each dynamic info, linked with FK:
Here I insert a row only if that specific data changed.
Page (actualDate, bookID FK, page) Review (actualDate, bookID FK, market, review) Rating (actualDate, bookID FK, market, ratings) Ranking (actualDate, bookID FK, market, ranking) Price (actualDate, bookID FK, market, price, royalty, ACOS)
2) collect all these info on a single table, apart Price like this:
BookData (actualDate, bookID FK, market, priceID FK, page, review, rating, ranking) Price (priceID, price, royalty, ACOS)Here I must insert a row even if one data only changed i.e. all info changed on a date: date = Oct 24, BookID = 35, price = 13.90, royalty = 5.01, ACOS = 43, page = 245, review = 232, rating = 312, ranking = 23467
1)
Page --> Oct 24, 35, 245 Review --> Oct 24, 35, "com", 232 Rating --> Oct 24, 35, "com", 312 Ranking --> Oct 24, 35, "com", 23467 Price --> Oct 24, 35, "com", 1390, 501, 43
2)
BookData--> Oct 24, 35, "com", 16, 245, 232, 312, 23467 Price --> 16, 1390, 501, 43
So, my questions:
A) repetitions: for each book I have 4x3 more repetitions in 1): [24 Oct, 35, "com"] compared to 2) one time only.
B) In 2) if I have only one info changed, like ranking, I must insert a new row with null values on other fields? Like:
BookData--> Oct 25, 35, "com", null, null, null, 23467but when I need to retrieve all data on a specific date for bookID 35, how would I retrieve the previous info not changed on different dates?
C) In 1) could I put review and ratings in the same table because they are similar data...? Like: Review (actualDate, bookID FK, market, review, rating) But this way the query could be more complex to get actual review and rating situation because if I have review data, I have also ratings, instead if I have ratings I could not have review data, hence null So, on Oct 25 I could have review=232 and rating = 321 but info splitted in 2 date rows:
Review --> Oct 24, 35, "com", 232,312 Review --> Oct 25, 35, "com", null, 321In the end I would need all dynamic book info on a date or a range of dates. Which DB structure do you suggest for this case?
TIA
(2) By Gunter Hick (gunter_hick) on 2022-11-03 12:33:14 in reply to 1 [link] [source]
The schema depends on how far you want to go with normalizing. Your current schema does not allow more than one author per book (unless your duplicating the author field was intentional; note that field names have to be unique and adding numerals to field names suggests insufficient normalizing). Also, querying all books written by the same author will require an index. Your review and rating tables do not include an author, so you would be unable to attribute quotes. Also "market" and "type of product" beg to be limited in their values (i.e. defined in their own tables and replaced by a FK in the book table). For anything that changes over time, there are two approaches: You could have a table just line your own example, or you could simply have a history table where a row gets copied to every time an attribute changes. Which is better depends on wether you are mostly interested in current data or in retrieving data from a certain point in time.
(3) By frank10 on 2022-11-04 11:04:59 in reply to 2 [source]
Thank you Gunter.
Yes, author was duplicated by mistake... (Anyway I don't need to list eventually more than one author)
For review and rating I'm not interested in author: I only need to store how many there are and when they change.
Market & type: OK I'll create 2 more dedicated tables for them
Do you think could be a good idea creating also a Date table to avoid repeating a lot of time the same date (hundred times daily), using instead a FK of that specific date?
The part I'm more interested is exactly the last part of changes over time!
I could need extracting all dynamic data of books on the most recent date or on a paste date or all values changed in a range of dates.
i.e.
These could be all the tables: Book (bookID (P), title, author, subtitle, typeID (FK), isColor ) Type (typeID (P), type) ["K", "PB", "A"] Market (marketID (P), marketName) ["com", "co.uk", "it" etc] Date (dateID (P), actualDate) BookData (bookData (P), dateID (FK), bookID (FK), marketID (FK), priceID (FK), page, review, rating, ranking) ] Price (priceID (P), price, royalty, ACOS) and this example data: Book --> 35, "tit1", "aut1", "subtit1", 2, 0 Type --> 1, "K" Type --> 2, "PB" Market --> 1, "com" Date --> 1, 2022-10-24 Date --> 2, 2022-10-25 Date --> 3, 2022-10-28 Date --> 4, 2022-10-30 Price --> 16, 1390, 501, 43 Price --> 17, 1199, 432, 38So, using this schema to not waste space on DB file avoiding duplicated values:
BookData (bookData (P), dateID (FK), bookID (FK), marketID (FK), priceID (FK), page, review, rating, ranking) ] BookData --> 1, 1, 35, 1, 16, 245, 232, 312, 23467 BookData --> 2, 2, 35, 1, null, null, 233, 312, 57877 BookData --> 3, 3, 35, 1, 17, null, null, 314, 3432 BookData --> 4, 4, 35, 1, null, null, null, null, 8932How should be a query with these data example to retrieve dynamic info for this book on 2022-10-29? The result should be like this:
(I have no data for 29, I have 30 but I discard it because it's greater than 29.
So I take the situation at 28, and because it has some null values, I must take page and review not null from the nearest date prior to 28, that is 2022-10-25 )
I tried:
SELECT d.actualDate, b.title, p.price, m.market, bd.page, bd.review, bd.rating, bd.ranking FROM Book b, Market m, BookData bd inner JOIN Date d USING(dateID) inner JOIN Price p USING(priceID) WHERE b.bookID = (SELECT b.bookID From Book b WHERE b.title = "tit1") AND m.market = "com" AND d.actualDate <= "2022-10-29" I get: actualDate title price market page review rating ranking 2022-10-24 tit1 1390 com 245 232 312 23467 2022-10-28 tit1 1199 com 314 3432So, 2 problems:
- This has an error because Price didn't change on 2022-10-25, BUT review became 233. And this is a lost value...
2) More, is there a way to get a single line result directly from SQL like this:title, market, price, page, review, rating, ranking "tit1", "com", 1199, 245, 233, 314, 3432
or should I elaborate the multiline SQL result with my program?
(4) By anonymous on 2022-11-15 20:52:52 in reply to 3 [link] [source]
Hi,
Date
Do you think could be a good idea creating also a Date table to avoid repeating a lot of time the same date (hundred times daily), using instead a FK of that specific date?
I wouldn't do it for the date. If you want to save space for the date (not store the date in 10 characters), just use the Julian date or the unixepoch date. See the date and time function of sqlite. You would then ensure that all your date are easily sortable.
Fix
In your query, there is one conceptual mistake: BookData should be joined with Book using bookID and with Marked using marketID.
Then:
WHERE b.bookID = (SELECT b.bookID From Book b WHERE b.title = "tit1")
could be simplified into
WHERE b.title = 'tit1'
View of dynamic results
How should be a query with these data example to retrieve dynamic info for this book on 2022-10-29?
I would suggest to create a view where, for each date, bookID, and marketID retrieve the information from the last records regarding each columns of interest (review, priceID, and rating).
Which should give:
CREATE VIEW vBookData AS
SELECT bookData, date, bookID, marketID,
(SELECT bd0.page FROM bookData AS bd0 WHERE (bd0.bookID, bd0.marketID) = (bd.bookID, bd.marketID) AND bd0.date <= bd.date AND bd0.page IS NOT NULL ORDER by bd0.date DESC LIMIT 1) AS page,
(SELECT bd0.review FROM bookData AS bd0 WHERE (bd0.bookID, bd0.marketID) = (bd.bookID, bd.marketID) AND bd0.date <= bd.date AND bd0.review IS NOT NULL ORDER by bd0.date DESC LIMIT 1) AS review,
(SELECT bd0.priceID FROM bookData AS bd0 WHERE (bd0.bookID, bd0.marketID) = (bd.bookID, bd.marketID) AND bd0.date <= bd.date AND bd0.priceID IS NOT NULL ORDER by bd0.date DESC LIMIT 1) AS priceID,
(SELECT bd0.rating FROM bookData AS bd0 WHERE (bd0.bookID, bd0.marketID) = (bd.bookID, bd.marketID) AND bd0.date <= bd.date AND bd0.rating IS NOT NULL ORDER by bd0.date DESC LIMIT 1) AS rating
FROM bookData AS bd;
Then, to answer your question, you can simply do:
select
b.title, m.marketName,
p.price,
v.page, v.review, v.rating, v.ranking
from vBookData as v join Book AS b USING(bookID) JOIN Market AS m USING(marketID) JOIN Price AS p USING(priceID) where date = julianday('2022-10-28 12:00');
Which returns
title | marketName | price | page | review | rating | ranking |
---|---|---|---|---|---|---|
tit1 | com | 1199 | 245 | 233 | 314 | 3432 |
I put you below the full dump of my DB if you want to reproduce. Best regards, Christian.
SQL dump
CREATE TABLE BookData (bookData INTEGER PRIMARY KEY, date INTEGER, bookID INTEGER, marketID INTEGER, priceID INTEGER, page INTEGER, review INTEGER, rating INTEGER, ranking INTEGER);
INSERT INTO BookData VALUES(1,2459877,35,1,16,245,232,312,23467);
INSERT INTO BookData VALUES(2,2459878,35,1,NULL,NULL,233,312,57877);
INSERT INTO BookData VALUES(3,2459881,35,1,17,NULL,NULL,314,3432);
INSERT INTO BookData VALUES(4,2459883,35,1,NULL,NULL,NULL,NULL,8932);
CREATE TABLE Book (bookID INTEGER PRIMARY KEY, title TEXT, author TEXT, subtitle TEXT, typeID INTEGER, isColor INTEGER);
INSERT INTO Book VALUES(35,'tit1','auth1','subtit1',2,0);
CREATE TABLE IF NOT EXISTS "Type" (typeID INTEGER PRIMARY KEY, type TEXT);
INSERT INTO Type VALUES(1,'K');
INSERT INTO Type VALUES(2,'PB');
INSERT INTO Type VALUES(3,'A');
CREATE TABLE Market (marketID INTEGER PRIMARY KEY, marketName TEXT);
INSERT INTO Market VALUES(1,'com');
INSERT INTO Market VALUES(2,'co.uk');
INSERT INTO Market VALUES(3,'it');
CREATE TABLE price (priceID INTEGER PRIMARY KEY, price INTEGER, royalty INTEGER, ACOS INTEGER);
INSERT INTO price VALUES(16,390,501,43);
INSERT INTO price VALUES(17,1199,432,38);
CREATE VIEW vBookData AS
SELECT bookData, date, bookID, marketID,
(SELECT bd0.page FROM bookData AS bd0 WHERE (bd0.bookID, bd0.marketID) = (bd.bookID, bd.marketID) AND bd0.date <= bd.date AND bd0.page IS NOT NULL ORDER by bd0.date DESC LIMIT 1) AS page,
(SELECT bd0.review FROM bookData AS bd0 WHERE (bd0.bookID, bd0.marketID) = (bd.bookID, bd.marketID) AND bd0.date <= bd.date AND bd0.review IS NOT NULL ORDER by bd0.date DESC LIMIT 1) AS review,
(SELECT bd0.priceID FROM bookData AS bd0 WHERE (bd0.bookID, bd0.marketID) = (bd.bookID, bd.marketID) AND bd0.date <= bd.date AND bd0.priceID IS NOT NULL ORDER by bd0.date DESC LIMIT 1) AS priceID,
(SELECT bd0.rating FROM bookData AS bd0 WHERE (bd0.bookID, bd0.marketID) = (bd.bookID, bd.marketID) AND bd0.date <= bd.date AND bd0.rating IS NOT NULL ORDER by bd0.date DESC LIMIT 1) AS rating
FROM bookData AS bd;
CREATE INDEX book_maket_date on BookData(bookID, marketID, date);
(5) By frank10 on 2022-11-17 08:37:11 in reply to 4 [link] [source]
Thank you Christian. You put me on the right track with view!