Keeping the index and delete the table

(1) By carlbauh on 2022-11-22 17:19:20 [link] [source]

Hello,

Yes I know the title is weird, but I try to explain: I have a table with more 100 million records like: CREATE TABLE "table"(value INT); So it has two columns, one ROWID and one value.

I want to speedup query like: SELECT * FROM table WHERE value=1312220;

So I made an index like: CREATE INDEX "indx" ON "table" ("value");

and my query is incredibly faster. The only flaw is that the size of the DB is now duplicated. As this table is read only, is there a way to only have the index to speedup or somehow tell the DB that it has the "value" as ordered so it can query it like an Index?

(2) By Richard Damon (RichardDamon) on 2022-11-22 17:31:02 in reply to 1 [link] [source]

Try:

CREATE TABLE "table" (value INTEGER PRIMARY KEY);

so the value IS the primary key.

The one limitation is that you won't be able to put duplicate, null, or non-integer values into the table.

(3) By carlbauh on 2022-11-22 17:38:48 in reply to 2 [link] [source]

Exactly my problem, it has duplicates and that is why I didn't do what you've said.

Any other way?

(4) By Richard Damon (RichardDamon) on 2022-11-22 18:29:56 in reply to 3 [link] [source]

If you have duplicates, do you just need to know how many? if so rather than having duplicates have.a field to say how many.

If you have other date attached to the record that might be different, then you need the index, but it won't fully double the size of your database, as that other data won't be duplicated.

(6) By anonymous on 2022-11-22 18:54:02 in reply to 4 [link] [source]

Agree, yet, unless the combination of value, date is unique, it will reduce to the same problem (will ultimately still need some sort of unique key on the table).

If the combination of columns is indeed unique, then that can be substituted for ROWID (i.e. he can use a compound unique primary key and WITHOUT ROWID) if minimum size is paramount importance.

If the count is what is important than perhaps an INTEGER PRIMARY KEY (ROWID uniqueness), and an insert/trigger to bump the count when appropriate.

My sense is the difficulty encountered by OP has exposed the need for extra thought on the data structure.

(7) By Richard Damon (RichardDamon) on 2022-11-22 19:10:48 in reply to 6 [link] [source]

The only way to replace the need for a separate index is for their to be a single integer value that is unique for every row, and if that column is defined exactly as INTEGER PRIMARY KEY, it will become the row identifier and not need an index.

Any other case, and SQLite will create a hidden row-id column to be the actual PRIMARY KEY of the database, and any other column (or column set) will become a UNIQUE index.

Alternatively, you can define the table to be "WITHOUT ROWID" and that primary key as the built-in index for the table, but if you need any other indexes on the table, they will get bigger as they will contain full copies of the primary key for each record.

And yes, it does sound like some thought about what is actually needed, or perhaps more is known but not put into the question.

(8) By anonymous on 2022-11-22 20:10:32 in reply to 7 [link] [source]

one small technical clarification:

Your description might be misinterpreted as meaning that anything other than an INTEGER PRIMARY KEY will duplicate index data (not exactly true for ALL cases, see below).

Referencing the example from https://www.sqlite.org/withoutrowid.html :
----------
...Because there is only a single B-Tree, the text of the "word" column is only stored once in the database. Furthermore, querying the "cnt" value for a specific "word" only involves a single binary search into the main B-Tree, since the "cnt" value can be retrieved directly from the record found by that first search and without the need to do a second binary search on the rowid.
-------------
This is why a WITHOUT ROWID table must have a UNIQUE PRIMARY KEY (even if it's a compound key).  The B-Tree requires something to traverse.

Expanding the OP question a bit:

There will be other characteristics (such as insert/delete/update performance ) in  having longer unique primary key (as noted in the reference docs cited).  Given that the OP data structure requires a bit more thought, they would be well advised to consider population and maintenance in their review as well.

(5) By anonymous on 2022-11-22 18:37:52 in reply to 3 [link] [source]

As explained in the docs here (https://www.sqlite.org/withoutrowid.html), you will need some sort of unique key on a table, whether it's ROWID, or your own unique primary key.

create table t(value int) strict; -- using strict to make sure we keep value an int
create index value_dups_on_t on t (value);
insert into t values (1), (2), (1);
select * from t;
┌───────┐
│ value │
├───────┤
│ 1     │
│ 2     │
│ 1     │
└───────┘

That creates about the smallest non unique integer values in a table structure for sqlite I am aware of for a unary relation with duplicates.

It's not clear of it's usefulness though, as a sorted list in memory would allow  a binary search without the overhead of sql, so if this is disconnected from any other data (a unary relation) it's a bit curious as to why duplicates are necessary.

If one had meaningful duplicates, how would you know if it's the first, or second, or nth value? and if that does not matter, why have duplicates (and then you can use value as integer primary key again, not to mention a small table)?

If this table is of the unary relation type (just looks up whether it's in the table) perhaps look at the insert operation of the table and filter out duplicates at that point?

Without knowing the rest of the parts it is a bit difficult to guess which way you should go with this.

(9) By carlbauh on 2022-11-23 08:32:46 in reply to 5 [link] [source]

I have a  "value", it is not unique, but it is the value that I do equal query and joins on it, a lot.
I know SQLite needs a unique key for its internal structure and it is no problem, ROWID was hidden in the table and that is okay. The extra size of ROWID is not my problem.

My problem is that, to find "value" faster, I need to make an index, and that index, duplicates the ROWID, value duo if I am not mistaken, at list duplicating the database size suggests that.
As far as I know, the index is an internal table, with ROWID and value, sorted by value, but not unique.

So my question is that if there is way I can have that internal table, with a sorted "value" so I can query it much faster (as SQLite uses binary search or similar on these sorted columns)?

I am aware that if you have 20 column table, duplicating ROWID and "value", is not much problem. But my table is hundreds of million of just these two values, and I naturally prefer too look for a way to tell SQLite: "Hey can you keep value as sorted so you can search it faster?"
And I guess it can not, but I wanted to make sure to ask.

(10) By Kees Nuyt (knu) on 2022-11-23 10:21:50 in reply to 9 [link] [source]

and I naturally prefer too look for a way to tell SQLite: "Hey can you keep value as sorted so you can search it faster?" And I guess it can not, but I wanted to make sure to ask.

I have a few cents left.

What you have is:

CREATE TABLE T (
	value INTEGER
);

As you already know, implicitly that is:

CREATE TABLE T (
	t_id INTEGER PRIMARY KEY NOT NULL,
	value INTEGER
);

The index on value is comparable to:

CREATE TABLE IDX (
	value INTEGER,
	t_id INTEGER REFERENCES T(t_id),
	PRIMARY KEY (value,t_id)
);

A relatively new feature of SQLite is that it possible to use the internal structure and code of the index as a table by itself, thusly:

CREATE TABLE Samples (
	value INTEGER,
	t_id INTEGER,
	PRIMARY KEY (value,t_id)
) WITHOUT ROWID;

This replaces table T, so we may have to obtain a value for t_id.

1) If you are not interested in when or how many times a value occored, you can simply omit t_id:

CREATE TABLE Samples (
	value INTEGER PRIMARY KEY NOT NULL
) WITHOUT ROWID;

which happens to be the same as

CREATE TABLE Samples (
	value INTEGER PRIMARY KEY NOT NULL -- alias for rowid
);

2) If you are only interested in the number of times a value occured, replace t_id with a hit count:

CREATE TABLE Samples (
	value INTEGER PRIMARY KEY NOT NULL,
	nrofhits INTEGER
) WITHOUT ROWID;

which can be simplified to

CREATE TABLE Samples (
	value INTEGER PRIMARY KEY NOT NULL, -- alias for rowid
	nrofhits INTEGER DEFAULT 1
);

where nrofhits can be maintained with a trigger or with the INSERT ... ON CONFLICT UPDATE ... construct.

3) If you would be interested in the time of each sample you can replace t_id with a timestamp, like so:

CREATE TABLE Samples (
	value INTEGER,
	timestamp INTEGER DEFAULT strftime('%s','now'), -- max 1 sample per second
	PRIMARY KEY (value,timestamp)
) WITHOUT ROWID;

4) If only the order of samples you can maintain a sequence column "seq" in a separate table, more or less like SQLite does for autoincrement:

CREATE TABLE Samples (
	value INTEGER
	seq   INTEGER,
	PRIMARY KEY (value,seq)
) WITHOUT ROWID;
CREATE TABLE my_seq (
	seq INTEGER PRIMARY KEY NOT NULL,	
);

And maintain that with triggers or application code. Make sure the updates/inserts on tables Samples and my_seq for a specific sample happen within a transaction.

(12) By anonymous on 2022-11-30 01:39:34 in reply to 9 [link] [source]

Is your table read-only? Or you expect changes? Because if it is indeed read only then there is a way to discard the table and keep the index and use it as a table, highly discouraged but possible

(13.1) By Keith Medcalf (kmedcalf) on 2022-11-30 01:48:39 edited from 13.0 in reply to 9 [link] [source]

As far as I know, the index is an internal table, with ROWID and value, sorted by value, but not unique.

Internally, all indexes are always unique. The only difference is whether the remaining primary key referents are part of the key (and the payload empty) or the remaining primary key referents are part of the payload. In either case, the key is unique. Whether the "user specified bits" are declared unique or not only determines where any unused primary key referents are stored (ie: in the kley or in the payload).

In the case of a ROWID table, the ROWID is the primary key. In the case of a WITHOUT ROWID table, the PRIMARY KEY fields are declared as part of the table declaration.

(17) By Keith Medcalf (kmedcalf) on 2022-12-06 00:13:43 in reply to 3 [link] [source]

Why does it have duplicates? Maybe you should stop trying to solve a problem that only exists because you have not normalized your database.

(11.1) By cj (sqlitening) on 2022-11-30 00:55:26 edited from 11.0 in reply to 1 [link] [source]

1. Insert value + delimiter + next highest record counter
2. Search using value + delimiter
Hope this helps.  Tested using explain query plan and index is used.

drop table if exists t1;
create table if not exists t1(value text primary key) without rowid;
insert into t1 values('1312220-1');
insert into t1 values('1312220-2');
insert into t1 values('1-3');

select * from t1 where value >= '1-' and value <= '1-'||'A'; 
explain query plan select * from t1 where value >= '1-' and value <= '1-'||'A';

select * from t1 where value >= '1312220-' and value <= '1312220-'||'A'; 
explain query plan select * from t1 where value >= '1312220-' and value <= '1312200-'||'A';

SEARCH t1 USING PRIMARY KEY (value>? AND value<?)

(14) By cj (sqlitening) on 2022-12-01 07:51:30 in reply to 11.1 [source]

Without rowid needed a table to store total record count.
Getting rid of the rowid saves about 596-million bytes
in these tests inserting 100-million records.

%InsertThisMany = 100000000

FUNCTION PBMAIN AS LONG
 WithRowID    '2,898,509,824 bytes
 WithoutRowID '2,302,803,968
 '             =============
 '               596,705,856

END FUNCTION

FUNCTION WithoutRowID AS LONG '2,302,803,968 bytes
 LOCAL str AS STRING
 LOCAL rows AS LONG
 LOCAL x    AS LONG
 LOCAL value AS LONG

 slopen "junk.db3","C"
 slexe  "create table if not exists t1(value text primary key) without rowid;
 slexe  "create table if not exists counttable(counter integer primary key)"

 'Get number of rows or init counter in countable to zero
 str = slSelStr("select counter from counttable")
 IF LEN(str) THEN 'returned a value
  rows = VAL(str) 'put into rows
 ELSE             'else
  slExe "insert into counttable values(0)"
  IF slGetChangeCount = 0 THEN ? "First insert to CountTable failed":EXIT FUNCTION
 END IF

 slexe "begin exclusive"
 FOR x = 1 TO %InsertThisMany
  value = 9999999
  INCR rows
  slexe "insert into t1 values('" + FORMAT$(value&)+ "-" + FORMAT$(rows) + "');"
 NEXT
 slexe "Update CountTable Set Counter=" + STR$(rows&)
 slexe "end" 'success

 MSGBOX slSelStr("select * from CountTable"),%MB_SYSTEMMODAL,"Counter"
END FUNCTION

FUNCTION WithRowID  AS LONG  '2,302,803,968 bytes
 LOCAL x,value AS LONG
 slopen "junk1.db3","C"
 slexe  "create table if not exists t1(value int)"
 slexe  "create index if not exists indx ON t1(value)"
 slexe "begin exclusive"
 FOR x = 1 TO %InsertThisMany
  value = 9999999
  slexe "insert into t1 values(" + FORMAT$(value) + ")"
 NEXT
 slexe "end"
 MSGBOX slSelStr("select count(*) from t1"),%MB_SYSTEMMODAL,"t1 count"
END FUNCTION

(15) By cj (sqlitening) on 2022-12-01 23:23:13 in reply to 14 [link] [source]

Could also use values like this:
1312220
0000001 or select printf('%07d', '1')

(16) By cj (sqlitening) on 2022-12-05 23:02:32 in reply to 1 [link] [source]

Save comma delimited pointers in second column of each unique value.
If there many duplicates this will greatly reduce the file size.

value  addr
A      1,4,7
B      2,5,8
C      3,6,9

FUNCTION PBMAIN AS LONG

 create table if not exists t1(value text primary key,addr text) without rowid
 create table if not exists counttable(counter integer)

 begin exclusive
 sCounter = FORMAT$(VAL(slSelStr("select counter from counttable limit 1"))+1)
 IF sCounter <> "1" THEN
  update counttable set counter=" + sCounter
 ELSE
  insert into counttable values(" + sCounter + ")"
 END IF
 char = 65
 sValue = CHR$(char)
 select * from t1 where value='"+sValue + "'"

 IF slGetRow THEN
  update t1 set addr ='" + slf(2) + "," + sCounter + "'  where value='"+sValue + "'"
 ELSE
  insert into t1 values('" + sValue +  "'," + sCounter + ")"
 END IF
 commit
END FUNCTION

(18) By cj (sqlitening) on 2022-12-08 23:08:30 in reply to 16 [link] [source]

Added binding so input keys can be phrases that include single quotes

#INCLUDE ONCE "sqlitening.inc"  'Anything.bas
MACRO bind(str) = slBuildbindDat(str,"T")

FUNCTION PBMAIN AS LONG
 LOCAL sWord,sdata AS STRING
 slOpen "words.db3","C"
 'slexe  "drop table if exists WordTable;drop table if exists PointerTable" 'start over
 slexe  "create table if not exists WordTable(Word text primary key,Pointers text) without rowid"
 slexe  "create table if not exists PointerTable(HighPointer integer)
 IF LEN(slSelStr("Select * from PointerTable"))=0 THEN slexe "insert into PointerTable values(0)"
 DO
  sdata = GetRecordSet("Select * from WordTable")
  sWord = INPUTBOX$(sdata,"Insert word",sWord)
  IF LEN(sWord) = 0 THEN EXIT DO
  slexe "begin immediate"
  AddWord sWord
  slexe "end"
 LOOP
END FUNCTION
'=====================================================================================
FUNCTION AddWord(sWord AS STRING) AS STRING
 LOCAL sHighPointer AS STRING

 'increment pointertable.highpointer
 sHighPointer = FORMAT$(VAL(slSelStr("Select HighPointer from PointerTable limit 1"))+1)
 slexebind  "update PointerTable set HighPointer=?",bind(sHighPointer)

 'search wordtable.word for sWord
 slSelBind  "Select Word,Pointers from WordTable where Word=?", bind(sWord)

 IF slGetRow THEN 'update wordtable pointers column
  slexebind "update WordTable set Pointers = ? where Word = ?",bind(slf(2)+ "," + sHighPointer) + bind(sWord)
 ELSE             'insert wordtable
  slexebind "insert into WordTable values(?,?)",bind(sWord)+bind(sHighPointer)
 END IF
END FUNCTION
'=====================================================================================
FUNCTION GetRecordSet(sql AS STRING) AS STRING
 LOCAL sArray() AS STRING
 slSelAry sql,sArray(),"Q9"
 FUNCTION = JOIN$(sArray(),$CRLF)
END FUNCTION