about "strict" mode

(1.1) By tom (younique) on 2021-08-20 22:52:54 edited from 1.0 [link] [source]

Dear team,

I am following the SQLite timeline and have seen your efforts about introducing a strict mode to SQLite. First of all, thanks for that.

I remember that some time ago (probably in the old mailing list) there's been a suggestion for strict mode using a pragma statement. Have you considered this suggestion? (As far as I understand, the implementation used now requires "strict" in the table definition.)

Imho, a pragma had some major advantages:

Those who like strict handling could easily enable it for every table in every database by just setting "pragma strict=true" when opening the connection. Existing databases/tables don't need to be changed. Adding the pragma to .sqliterc would make SQLite-CLI always operate in strict mode.
Foreign key enforcement could be included. Currently, it's not clear to me how "mixed" tables would behave: Do I need the referencing or the referenced table to be strict, or both?
The "create table" syntax wouldn't diverge from standard SQL. No need to take care when importing SQL dumps.
Enforcing strings to be declared in single quotes and identifiers in double quotes could be enabled as well; not only for the table declaration, but for any other statement (especially SELECT) as well.
No need to use the rather complicated ALTER TABLE algorithm for switching existing databases to strict mode. (And if it turns out for someone that he was better without strict mode, he could easily disable it again.)

As the feature hasn't been released yet, maybe one could reconsider the implementation. What do others think here?

(2) By Simon Slavin (slavin) on 2021-08-21 13:39:21 in reply to 1.1 [link] [source]

I'm not against your suggestion, but both the web page and you missed out a vital consideration, so I'm going to bring it up here.

There's a nasty question of whether the strict setting should be stored in each database file. Or has to be set as each connection is opened like the connection timeout. But that's beyond the scope of my post.

Consider an existing database which was created without strict mode. It has all the sorts of faults listed in https://sqlite.org/src/wiki?name=StrictMode: missing keys, table definitions and TRIGGER with the wrong quotes, NULL primary keys values, all that badness, and more that people haven't thought of yet. A new programmer takes over the project and uses their own code (not a custom utility like the CLI) to turn on strict mode. What happens ?

Should turning on strict mode involve a lengthy check for all these things ?

Running the checks when switching to strict mode: Could take minutes or, rarely, hours.
Not running the checks when switching to strict mode: All operations throughout all programs which access the database now have to deal with formerly unanticipated error reports from a lot of API calls.
A strict setting is stored in each database file, and is set like PRAGMA page_size. To convert an existing database to strict mode you must VACUUM it. The extra checks take place during the VACUUM. VACUUM is changed so that it can return syntax error and other error results. VACUUM is changed so that it will check the integrity of foreign keys if strict mode has just been turned on.

Given that the check for strict-compliance could already be lengthy, and relies on the integrity of the database, should it start off with an integrity_check ?

(3) By tom (younique) on 2021-08-21 14:27:30 in reply to 2 [link] [source]

I had already though about that point, too. But imho it is not solvable at all, regardless of whether strict mode applies to single tables, a whole database or a connection.

The reason is that the database will always be allowed to be opened in an older version of SQLite not knowing about strict mode. If it's in the table declaration, (I assume) older SQLite version will just ignore it. And so they will with a database property and a connection property as well.

I'd argue in favor of performing the check only on an explicit "pragma integrity_check" as is currently (as far as I know) the same with foreign key enforcement.

(4) By Richard Damon (RichardDamon) on 2021-08-21 15:02:45 in reply to 3 [link] [source]

The answer for older code opening the database is to have it make a change that prevents older versions that don't understand the option from opening the database.

This has happened with some other options.

It does say that for a while, until the newer versions are commonly deployed, using the option will be a significant decision about data portability.

(5) By Richard Hipp (drh) on 2021-08-21 19:25:17 in reply to 1.1 [link] [source]

The STRICT table-option does three things:

Enforce authoritarian type-checking when inserting new content into tables.
Restrict column data type names in CREATE TABLE statements to one of INT, INTEGER, REAL, BLOB, TEXT so that we always know exactly which datatype is allowed in that column.
Require that all fields of a PRIMARY KEY be NOT NULL.

A "PRAGMA strict=ON" statement could do (1) and (3). But as the CREATE TABLE statement might already exist in the schema prior to the PRAGMA, it cannot do (2). This leads to an interesting conundrum:

CREATE TABLE t1(a DATE, b JSON);
PRAGMA strict=ON;
INSERT INTO t1 VALUES(a,b) VALUES(?1,?2);

Since SQLite does not know anything about "DATE" and "JSON", what enforcement decisions does it impose on the INSERT? Does it treat an unknown datatype as "ANY"? If that is the case, then if you misspell "INTGER" or "INTIGER" does that mean you get to insert any value you want into your integer column?

Perhaps your intent was that "PRAGMA strict=ON" would rescan the schema and fail if any table anywhere in the schema has an unrecognized datatype on a column?

(6) By Domingo (mingodad) on 2021-08-21 20:01:56 in reply to 5 [link] [source]

And what about also somehow expose the changes shown here https://sqlite.org/forum/forumpost/692ae69237f2537b?t=h through a pragma ?

See the output whith some random expressions here https://sqlite.org/forum/forumpost/103d7294464fcde8?t=h

Something like: "PRAGMA pedantic"

(7.1) By Gerry Snyder (GSnyder) on 2021-08-21 20:06:57 edited from 7.0 in reply to 5 [link] [source]

I can see advantages to having the mode controllable by PRAGMA, and I for one would be willing for the results to situation (2) be that existing tables with nonstandard data types be allowed to exist and be used, but any data insertion into them be an error.

Gerry Snyder

(8) By Ryan Smith (cuz) on 2021-08-21 21:24:03 in reply to 5 [link] [source]

Perhaps the description of what strict-mode does above isn't complete, but I am missing these checks:

forcing/requiring the use of correct quoting: strings can only be in single-quotes,
identifiers only bare or in double-quotes)
column default values having to be of the specific type of the column (checked at schema creation time)

Any chance those would be included?

(9) By Ryan Smith (cuz) on 2021-08-21 21:30:22 in reply to 5 [link] [source]

My further 2c here:

Consider that strict mode won't ever be backward compatible, in the sense that the entire domain of existing SQLite databases were not subjected to strict mode upon creation, so opening an old DB in a strict=ON connection will very likely fail (unless the previous schema makers were very strict themselves).
The moment a programmer chooses to invoke "strict mode", I cannot see why they would ever want to turn back to the old ways, so being able to turn strict mode on and off at a whim seems completely unneeded for future use.
Any table/schema made with "strict = ON" would be perfectly able to open and be used happily by any previous version of SQLite, so there really is not ever any reason to embed strict-ness in the DB file itself, other than perhaps a flag that indicates the last schema write was done with a strict-mode connection (or not).
One might still have old DBs and have to maintain them, while using the newest build of SQLite, so that provides a clear case where being able to turn strict-mode OFF for a connection prior to opening the DB could be useful, but it must be PRIOR to opening the DB file, and if it is a pragma, then that switch-over must become a no-op if done after a DB is already opened, or like some other pragmas, perhaps require a rebuild (vacuum) to get enabled on the DB and then heartily FAIL hard if the schema fails the strictness check.

Disabling/Enabling strict-mode at build-level though might have a lot of advantages as I imagine one could get rid of a whole lot of nonsense-checking code that won't be needed in strict mode anymore.

We've been lobbying hard here for strict-mode and love that this is becoming a reality, but we should understand that it is a hard paradigm shift and should not be compromised. I would probably start by using two builds myself, perhaps using the old in my older projects (until I can adapt them) and the new strict build for anything recent and going forward.

PS: Nice to have tools would be: Some simple check to see if a DB already complies to "strict" or not, and perhaps something to tell you what things in the current DB file needs to change for it to comply to "strict" mode. Probably nice additions to add to the CLI (else we might put out such a tool, since we need to make one for our purposes anyway.)

Very excited to start using strict-mode! :)

(10) By tom (younique) on 2021-08-21 21:44:33 in reply to 9 [link] [source]

We've been lobbying hard here for strict-mode and love that this is becoming a reality, but we should understand that it is a hard paradigm shift and should not be compromised

I agree with you: strict mode should be very carefully designed because it cannot be undone or changed once published.

(11.1) By tom (younique) on 2021-08-21 21:50:10 edited from 11.0 in reply to 5 [link] [source]

Restrict column data type names in CREATE TABLE statements to one of INT, INTEGER, REAL, BLOB, TEXT so that we always know exactly which datatype is allowed in that column.

This is a very interesting point as it caused strict mode to be more restrictive than standard SQL. Is that really the goal? Shouldn't all SQL data types (especially DATE, TIME, DATETIME which often led to confusion or inquiries here) be allowed but just behave as declared. So inserting a well-formated date into a date field should be fine, imho.

Edit: what about foreign-key-enforcement? While following the discussions here, I notice that even experts seem to sometimes forget about "pragma foreign_keys=on".

(12) By anonymous on 2021-08-21 22:17:45 in reply to 11.1 [link] [source]

Shouldn't all SQL data types (especially DATE, TIME, DATETIME which often led to confusion or inquiries here) be allowed but just behave as declared. So inserting a well-formated date into a date field should be fine, imho.

What's a well-formatted date? Even if you can come up with a sensible answer, what the user enters may not be ideal for SQLite to store if it does one date support dates natively. Better for a strict mode to disallow types it does not actually understand so that when types are added there are no backward compatibility concerns complicating the implementation.

(13) By Keith Medcalf (kmedcalf) on 2021-08-22 02:06:21 in reply to 11.1 [link] [source]

There is no DATE/TIME type is SQLite3.

You can store an INTEGER epoch date/time in an INTEGER field, and INTEGER or REAL epoch date/time in a REAL field, and a TEXT format ISO-8601 subset datetime in a TEXT field.

A STRICT table merely requires that the datatype be an actual supported datatype, and changes the semantics of the "value container" from being the "cell" or "value" itself to encompassing the entire column and also making the PRIMARY KEY definition, if it is not an INTEGER PRIMARY KEY of a rowid-table, be a PRIMARY KEY constraint (cannot contain nulls) rather than merely syntactic sugar for a UNIQUE constraint.

This makes a table declared as STRICT follow the standard. It does not make SQLite3 more restrictive than the standard, it merely brings the table up to standard.

The other "missing" stuff can already be activated by setting the DQS, TRUSTED_SCHEMA and FOREIGN_KEYs defines when building the library.

Of course, it has always been possible to coerce SQLite3 to operate in strict compliance with the standard if one wanted to do so. Adding the STRICT keyword to the table definition is sufficient. If you wish to activate other features (DQS, Foreign Keys, Recursive Triggers, etc) you do that the same way you always have.

(14) By anonymous on 2021-08-22 03:13:27 in reply to 5 [link] [source]

Does it (or will in future) enable some optimizations only in strict mode (due to knowing for sure the data type for some kind of queries and not having to check)?

(15) By SeverKetor on 2021-08-22 06:09:26 in reply to 5 [link] [source]

2. Restrict column data type names in CREATE TABLE statements to one of INT, INTEGER, REAL, BLOB, TEXT so that we always know exactly which datatype is allowed in that column.

I wonder if it may be useful to add a way to define new datatypes which STRICT could check. As an example, the json1 extension could define JSON as being TEXT and use the json_valid function to check it. On regular tables it would have no impact (except maybe changing the type affinity of JSON columns to TEXT instead of NUMERIC?), but on STRICT tables it would enforce valid JSON as if the column had a CHECK constraint on it.

This would be handy for backwards compatibility (I for one have code that relies on using the JSON datatype) and arguably allow for more clarity in the schema, if columns can retain their previous datatype.

On the other hand, it would increase the complexity and amount of code, and can be replaced with CHECK constraints and comments on the table schema, descriptive column names, etc.. Given that, I'm not sure it should be done, but I wanted to mention it in case others thought the same (or had better ideas on why it shouldn't be done).

(16) By Simon Slavin (slavin) on 2021-08-22 06:55:15 in reply to 12 [link] [source]

SQL engines which do support DATE, TIME, or DATETIME don't store in the database the thing you put in the SQL command. They convert it to an integer, and store that. For the sake of argument, you could say that all DATEs are stored as days since 1 January 1970, and all TIMEs are stored as milliseconds after midnight. When you ask for a value from a DATE or TIME field it gets produced as a string in one specific format.

So SQLite could introduce a DATE datatype by accepting only a string in one specific format. A later version of SQLite could accept dates in another format too. It just won't convert into the second format when you read a DATE value from a database. It won't break compatibility, either backward or forward.

(17) By tom (younique) on 2021-08-22 10:59:19 in reply to 12 [link] [source]

What's a well-formatted date?

<datetime literal> ::=
                <date literal>
              | <time literal>
              | <timestamp literal>

         <date literal> ::=
              DATE <date string>

         <time literal> ::=
              TIME <time string>

         <timestamp literal> ::=
              TIMESTAMP <timestamp string>

         <date string> ::=
              <quote> <date value> <quote>

         <time string> ::=
              <quote> <time value> [ <time zone interval> ] <quote>

         <timestamp string> ::=
              <quote> <date value> <space> <time value> [ <time zone interval> ] <quote>


         <time zone interval> ::=

         90  Database Language SQL

 





                                                    X3H2-92-154/DBL CBR-002
                                                               5.3 <literal>


              <sign> <hours value> <colon> <minutes value>

         <date value> ::=
              <years value> <minus sign> <months value> <minus sign> <days value>


         <time value> ::=
              <hours value> <colon> <minutes value> <colon> <seconds value>


         <interval literal> ::=
              INTERVAL [ <sign> ] <interval string> <interval qualifier>

         <interval string> ::=
              <quote> { <year-month literal> | <day-time literal> } <quote>


         <year-month literal> ::=
                <years value>
              | [ <years value> <minus sign> ] <months value>

         <day-time literal> ::=
                <day-time interval>
              | <time interval>

         <day-time interval> ::=
              <days value>
                [ <space> <hours value> [ <colon> <minutes value> [ <colon> <seconds value> ] ] ]


         <time interval> ::=
                <hours value> [ <colon> <minutes value> [ <colon> <seconds value> ] ]

              | <minutes value> [ <colon> <seconds value> ]
              | <seconds value>

         <years value> ::= <datetime value>

         <months value> ::= <datetime value>

         <days value> ::= <datetime value>

         <hours value> ::= <datetime value>

         <minutes value> ::= <datetime value>

         <seconds value> ::=
                <seconds integer value> [ <period> [ <seconds fraction> ] ]


         <seconds integer value> ::= <unsigned integer>

         <seconds fraction> ::= <unsigned integer>

                                                       Lexical elements   91

 





          X3H2-92-154/DBL CBR-002
         5.3 <literal>


         <datetime value> ::= <unsigned integer>

(18) By Domingo (mingodad) on 2021-08-22 10:59:23 in reply to 6 [link] [source]

Here is a sample to test the "pedantic" patch:
====

select "23 % 3", 23 % 3;
select "12.3 % 3", 12.3 % 3;
select "12 % 2.5", 12 % 2.5;

select "23 / 3", 23 / 3;
select "12.3 / 3", 12.3 / 3;
select "12 / 2.5", 12 / 2.5;

create table ta(a text, b integer, c float);

select 'insert declared types == value types';
insert into ta(a,b,c) values('a', 1, 2.0);

select 'insert declared types != value types text';
insert into ta(a,b,c) values('b', '1', '2.0');

select 'insert declared types != value types';
insert into ta(a,b,c) values('c', 1.0, 2);

select 'update declared types == value types';
update ta set a = 'a' ,b = 1, c = 2.0 where a = 'a';

select 'update declared types != value types text';
update ta set a = 'a' ,b = '1', c = '2.0' where a = 'a';

select 'update declared types != value types';
update ta set a = 'a' ,b = 1.0, c = 2 where a = 'a';

select 'update one value declared types != value types';
update ta set b = 1.0 where a = 'a';

select 'update one value declared types != value types';
update ta set a = 49 where a = 'b';


====

Output default sqlite:

====

23 % 3|2
12.3 % 3|0.0
12 % 2.5|0.0
23 / 3|7
12.3 / 3|4.1
12 / 2.5|4.8
insert declared types == value types
insert declared types != value types text
insert declared types != value types
update declared types == value types
update declared types != value types text
update declared types != value types
update one value declared types != value types
update one value declared types != value types

====

Output of sqlite with "pedantic":

====

23 % 3|2
FP Remainder received non integer values 3.000000 :: 12.300000
12.3 % 3|0.0
FP Remainder received non integer values 2.500000 :: 12.000000
12 % 2.5|0.0
23 / 3|7
FP Division received non integer values 3.000000 :: 12.300000
12.3 / 3|4.1
FP Division received non integer values 2.500000 :: 12.000000
12 / 2.5|4.8
insert declared types == value types
insert declared types != value types text
Affinity applied on make record 1 : 1 : D
Affinity applied on make record 2 : 1 : E
insert declared types != value types
Affinity applied on make record 1 : 2 : D
update declared types == value types
update declared types != value types text
Affinity applied on make record 1 : 1 : D
Affinity applied on make record 2 : 1 : E
update declared types != value types
Affinity applied on make record 1 : 2 : D
update one value declared types != value types
Affinity applied on make record 1 : 2 : D
update one value declared types != value types
Affinity applied on make record 0 : 3 : B

====

(19) By anonymous on 2021-08-22 14:06:55 in reply to 1.1 [link] [source]

I've used Oracle, and other strictly-typed databases. Depending on the circumstances, that strict typing guarantee can be a godsend (mapping to C structs), or a roadblock (mapping from fully-general Python dicts).

The ability to choose, at an appropriate level of granularity, for the problem, is nothing to sneeze at.

It is important to remember, however, that not everyone has the same problem!

STRICT could be defined at many levels of granularity:

Column level. You might make an individual column strictly INT, REAL, etc.
Table level, as described in the Draft.
File level.
Connection level.
Library level.

As we go up the scale, each of these is usable in fewer and fewer cases.

For example, one of my use-cases embeds a file's data dictionary (metadata) in the file. For the table of COLUMNS, which type should be used for the DEFAULT_VALUE column? That column should probably not be strict. But perhaps the table's other columns should be. And perhaps my application's tables, described in the data dictionary, should be strict, too. But perhaps only some of them would benefit.

So it's important to choose a level of granularity wisely. Ideally, one that broadens SQLite's applicability, overall, rather than narrowing it.

We see, from the above list, that the scope of the guarantee is one consideration. Scope, however, has multiple dimensions. One dimension is listed directly: size.

But there are other scopes. For example: time. How valid is the guarantee if the current connection enforces it, and later ones do not?

Another: the range of users. How valid is the guarantee if some apps' connections enforce it, and others' can't (or choose not to)?

From what I can see, the Draft description is an attempt to craft a guarantee that you can reasonably count on, in a wide variety of circumstances, at a level of granularity that fits a wide variety of uses.

(20) By Richard Hipp (drh) on 2021-08-22 22:50:50 in reply to 1.1 [link] [source]

New proposed documentation for STRICT tables is now available at https://www.sqlite.org/draft/stricttables.html. The implementation does not (yet) conform to this new documentation. The new documentation is submitted here for discussion.

The proposed changes it that there is a new datatype "ANY" available to columns which allows any datatype to be stored, even in STRICT tables.

(21) By David Jones (vman59) on 2021-08-23 00:04:46 in reply to 20 [link] [source]

How about being able to specify NOT STRICT in a column definition (analogous to NOT NULL)? With NOT STRICT, a column can have a type affinity without requiring that type. NOT STRICT without an accompanying type would operate the same as ANY. This may the defeat the objective of STRICT tables in the first place, I haven't thought deeply about it.

(22) By SeverKetor on 2021-08-23 01:38:27 in reply to 20 [link] [source]

Looks like there's a few typos in the new documentation.

Second sentence: "...if a table column as has a type of..."

Header three: "The ANY data type" → "The ANY Datatype"

Last line in third section: "For a an ordinary non-strict table..."

Last line: "~~There there~~ If there are multiple options, they can be specified in any order. To keep things simple, the current parser accepts duplicate options without complaining, but that might change in ~~figure~~ future releases, so applications should not rely on it."

(23) By Simon Slavin (slavin) on 2021-08-23 02:42:26 in reply to 21 [link] [source]

There's an argument for not having STRICT TABLEs at all. Just have a LAX keyword, or whatever a better word would be.

To make the whole database STRICT use the PRAGMA. If you have some legacy table data you can't get rid of which violates STRICT ,declare the table as LAX before you use the PRAGMA. This automatically makes appropriate indexes and views LAX too.

(24) By Domingo (mingodad) on 2021-08-23 08:06:59 in reply to 20 [link] [source]

Can also an extended sqlite_master be included on this proposal ?

So several extra info could be added there to allow customization/query ?

Maybe include an extra table sqlite_[strict] to save/customize the strict application or user defined options ?

(25) By anonymous on 2021-08-23 10:38:09 in reply to 20 [link] [source]

I think I like the way this is being introduced. Just a table modifier at creation time, that only introduces strict typing for fields with types other than "Any"

Since it does not have to deal with other issues like single vs double quoted strings or globally handling strict mode for the whole connection/db as others here are requesting, it can be tested and verified faster and can be released much sooner, with the opportunity for it to be improved upon (more data types) and for a global strict mode to still be viable in the future.

Keep up the great work!

(26) By skywalk on 2021-09-02 14:13:29 in reply to 20 [link] [source]

I prefer this strict mode but chose poorly for my REAL name :(
I use "DOUBLE" instead of REAL. Is there any chance this will be accepted?
...Old notes to self...
SQLite has 5 fundamental datatypes:
SQLITE3
SQLITE_INTEGER     = 1     ; 64-bit signed integer, INTEGER, SMALLINT, etc
SQLITE_FLOAT       = 2     ; 64-bit IEEE floating point number, DECIMAL(10,5), DATETIME, DATE, BOOLEAN, etc
SQLITE_TEXT        = 3     ; String, CHAR, VARCHAR(20), etc
SQLITE_BLOB        = 4
SQLITE_NULL        = 5
STRING CONSTANTS For DDL Statements:
SQLITE_INTEGER     = "INTEGER" ; 64-bit signed integer, INTEGER, SMALLINT, etc
SQLITE_FLOAT       = "DOUBLE"  ; 64-bit IEEE floating point number, DECIMAL(10,5), DATETIME, DATE, BOOLEAN, etc
SQLITE_TEXT        = "TEXT"    ; String, CHAR, VARCHAR(20), etc
SQLITE_BLOB        = "BLOB"
SQLITE_NULL        = "NULL"
...
Will there be a performance advantage using PRAGMA strict=ON;?

(27) By Larry Brasfield (larrybr) on 2021-09-02 14:49:51 in reply to 26 [link] [source]

I use "DOUBLE" instead of REAL. Is there any chance this will be accepted?

Someday, maybe, if a good case is made for the extra code and doc complexity. Consider, though, that there is no logical end to the growth of type identifiers.

...Old notes to self...

I don't see a question there, or much relevance to a posed question.

Will there be a performance advantage using PRAGMA strict=ON;?

Compared to what? Not in most cases. Compared to additional column constraints that have the same effect, probably, but that's for measuring.

(28) By skywalk on 2021-09-02 15:28:13 in reply to 27 [link] [source]

Of course there is a finite list of REAL identifiers. 
Are you implying FLOAT8,FLOAT4,etc?
I thunk there would be SINGLE and DOUBLE, but SQLite defaults all to DOUBLE internally for REAL?

On performance, does SQLite store data internally as both TEXT and REAL for floating data in non strict mode?

(29) By Larry Brasfield (larrybr) on 2021-09-02 16:08:56 in reply to 28 [link] [source]

Of course there is a finite list of REAL identifiers.
Are you implying FLOAT8,FLOAT4,etc?

Those would be among the plethora of type identifiers accepted without STRICT. My point is that proliferating them should be done only for very good cause.

I thunk there would be SINGLE and DOUBLE, but SQLite defaults all to DOUBLE internally for REAL?

As the doc linked below says, REAL specifies a 64-bit floating point representation (except see "As an internal optimization, small floating point values ..." there.)

On performance, does SQLite store data internally as both TEXT and REAL for floating data in non strict mode?

No. See Datatypes in SQLite v3.

(30) By skywalk on 2021-09-02 17:03:31 in reply to 29 [link] [source]

On performance, it is exactly the affinity issue in that link which confuses me. In strict mode, why all these checks required and dropping them would not improve insertion time? "A column with NUMERIC affinity may contain values using all five storage classes. When text data is inserted into a NUMERIC column, the storage class of the text is converted to INTEGER or REAL (in order of preference) if the text is a well-formed integer or real literal, respectively. If the TEXT value is a well-formed integer literal that is too large to fit in a 64-bit signed integer, it is converted to REAL."

(31) By Larry Brasfield (larrybr) on 2021-09-02 17:35:18 in reply to 30 [link] [source]

I don't understand what, if anything, you are disputing or questioning here. (It would help if you quoted the point(s) to which you are responding.)

Only a single representation of an INSERT value is stored. You can think of it as what is known as a tagged union except the stored size varies along with the type and value.

As for "all these checks required", an insertion under STRICT is more restrictive than a traditional, plain SQLite insertion. Some extra branching will be involved. This has a performance impact, albeit not a big one. Yet, compared to effecting the same restriction with column constraints, the additional STRICT restriction will be relatively faster.

Now, unless your questions, disputations or puzzlements become more concrete and specific, I am going to beg off a continuation of this nebulosity.

(32) By skywalk on 2021-09-04 13:13:10 in reply to 31 [link] [source]

Arrg, I was quoting from your datatypes link?
..."the storage class of the text is converted to INTEGER or REAL"
If insertions have decisions to make in non-STRICT mode, my natural assumption is those are bypassed in STRICT mode. 
Why dance around my question with so many words?
Fewer decisions = faster. Yes|No?
You do mention STRICT insertions have more decisions, but may be faster overall?
I guess I have to do tests.

(33) By Keith Medcalf (kmedcalf) on 2021-09-04 17:32:23 in reply to 32 [link] [source]

The STRICT keyword prevents the storage in a cell of a value that does not conform to the declared type affinity for the column in which the cell value is stored.

THe STRICT keyword prevents the use of gibberish type declarations in which heuristics are used to guess the column affinity. Rather, the type declaration of a column must be a specific datatype.

The STRICT keyword prohibits NULL in components of a PRIMARY KEY and requires that all components of a PRIMARY KEY be declared NOT NULL.

Without the STRICT keyword, any value of any type can be stored in any cell (intersection of column and row) notwithstanding the declared type/affinity.

Without the STRICT keyword, any gibberish may be used as a declared type name and heuristic rules are used to interpolate the datatype affinity meant. You can declare a column to be of type "Elongated Giraffe's with Big Boners" if you want, as compared to STRICT mode where you must use the proper datatype name NUMERIC for that case.

Without the STRICT keyword, a PRIMARY KEY declaration is merely syntactic sugar for a UNIQUE contraint.

It is not about performance -- it is about being STRICT.

Mayhaps the use of the STRICT keyword is more efficient (when storing rows) than simply declaring appropriate constraints which achieve the same ends without using the STRICT keyword, but the STRICT keyword is less typing and additional constraints do not need to be run at insert time.

(34) By anonymous on 2021-09-10 23:04:28 in reply to 9 [source]

With regard to point 3), while a strict database could be readable by an older version, schema changes must be forbidden or it will break strictness. If 'strict' also enforces field type on data then writes must be forbidden entirely.

Making strict mode a database creation option would allow code to skip non-strict checks when running on a strict database while still allowing use of old non-strict databases.

If a clean compatibility break is made with 'strict' databases then a bunch of other legacy cruft could be eliminated at the same time. An 'open' option should be provided to enforce which type of database is being used so code written for one type isn't mistakenly used with the other.

Converting an existing database isn't going to be trivial unless it is already effectively strict. Having to write custom conversion scripts and running something like "sqlite3 old.db '.dump' | python3 convertdb.py | sqlite3 --strict new.db" is going to be the norm for many users. For the rest that becomes "sqlite3 old.db '.dump' | sqlite3 --strict new.db". Costly, but it only needs to be done once and only in cases where switching makes sense.

(35) By Marco Bubke (marcob) on 2021-09-14 09:11:51 in reply to 20 [link] [source]

What about adding a column type ROWID?

rowid is used by many internal api's and I use it as primary key very often. But it is not semantically my primary key. I use then a unique index for that. But then I run into problems because the session extension need that primary key. So for I would find it handy to define a ROWID column which is then created as a special integer index. I use this index very often to implement a handle for fetching the row. Having integer handles is a really powerful pattern.

(36) By Keith Medcalf (kmedcalf) on 2021-09-15 01:19:56 in reply to 35 [link] [source]

There already is a column type ROWID. You spell it INTEGER PRIMARY KEY in a rowid table (that is, a table that is not WITHOUT ROWID).

(37) By Marco Bubke (marcob) on 2021-09-15 11:47:13 in reply to 36 [link] [source]

I think I was not clear enough. If I want to use an other key as primary key I cannot use the rowid anymore as primary key. Why I want to do that? Because first it is more clear what the primary key is and second the session extension is not working very well with row ids. ;-)

(38) By Ryan Smith (cuz) on 2021-09-15 12:17:26 in reply to 37 [link] [source]

May I enquire as to how exactly the session extension is having difficulty with ROWIDs?

I'm not looking to argue, I plan to use the session extension soon in a project, and am now concerned of usage caveats. Being as specific as possible will be much appreciated.

Thanks!

(39) By anonymous on 2021-09-15 12:18:17 in reply to 37 [link] [source]

From https://www.sqlite.org/lang_createtable.html :

ROWIDs and the INTEGER PRIMARY KEY

Except for WITHOUT ROWID tables, all rows within SQLite tables have a 64-bit signed integer key that uniquely identifies the row within its table. This integer is usually called the "rowid". The rowid value can be accessed using one of the special case-independent names "rowid", "oid", or "_rowid_" in place of a column name. If a table contains a user defined column named "rowid", "oid" or "_rowid_", then that name always refers the explicitly declared column and cannot be used to retrieve the integer rowid value.

In short, you've already got a row id in the table. And, unless the table also has 3 columns named rowid, oid, and _rowid_, that row id is accessible to your code.

(40) By Marco Bubke (marcob) on 2021-09-15 12:54:04 in reply to 38 [link] [source]

RowIds are not stable if the are generated by the DB. I used sessions for an staging area. So I was reverting them, then applied the external changes and applied the reverted once again. If the external changes were semantically the same as my changes it could not work because the primary key was a rowid. The primary key had to be a key which is the same for the external changes and my changes.

(41) By Marco Bubke (marcob) on 2021-09-15 12:56:25 in reply to 39 [link] [source]

But implicit rowids behave differently to explicit rowids. For example they don't survive a vacuum. ;-)

(42) By Gerry Snyder (GSnyder) on 2021-09-15 14:30:13 in reply to 41 [link] [source]

Yes, you can handle it either way you want.

Both are easy to do, and you can choose which depending on your needs.

(43) By Simon Slavin (slavin) on 2021-09-16 12:46:16 in reply to 41 [link] [source]

And if you want rowids to survive a vacuum you can just declare them in the table definition. SQLite already does what you want. It doesn't need a special non-standard column type.

(44) By anonymous on 2021-09-16 13:05:58 in reply to 43 [link] [source]

How would you do that in a STRICT table which already has a non-int PRIMARY KEY?

(45) By Kees Nuyt (knu) on 2021-09-16 14:30:35 in reply to 44 [link] [source]

A table can have multiple uniquely identifying columns, we can call those candidate keys. One of the candidate keys is chosen as the primary key.

It is just a matter of taste which of the candidate keys is chosen as the primary key, typically it is the key that foreign keys refer to.

The other candidate keys should have a UNIQUE constraint. Without that constraint, you can't call it a key.

If you add an extra INTEGER candidate key for performance reasons, so it can serve as the key that is referenced by foreign keys, it should become the primary key of the table.

This is called a surrogate key, because it does not represent the attribute of an entity in the real world.

In SQLite I would implement that as follows:

Before:

CREATE TABLE T1 (
  text_pk TEXT PRIMARY KEY NOT NULL
, ...
) WITHOUT ROWID STRICT;

After:

CREATE TABLE T1 (
  surrogate_pk INTEGER PRIMARY KEY NOT NULL
, text_pk TEXT UNIQUE NOT NULL
, ...
) STRICT;

(46) By Marco Bubke (marcob) on 2021-09-17 06:32:11 in reply to 45 [link] [source]

Please read the documentation of the session extension. Primary keys are not arbitrary. Your second example would simply not worked like intended because if you merge with external data the rowid is not known. Sory for the short answer, I explained it already in a different post in this thread. There is also a whole thread about it.

(47) By Keith Medcalf (kmedcalf) on 2021-09-17 18:09:35 in reply to 40 [link] [source]

RowIds are not stable if the are generated by the DB.

What? Of course they are. It sounds to me like you are talking about a user error in allocation of the pseudo-key range amongst non-contiguous databases. (Meaning that you have the same database being created/updated in two (or more) places and you have not adequately managed the keyspace.)

This is a long solved problem (since at least 1940).

Although other methods have existed since the 1940's, you could use the "Microsoft Method" and simply assign a subrange to each instance.

(48) By Richard Damon (RichardDamon) on 2021-09-17 19:49:16 in reply to 47 [link] [source]

No, implicitly generated rowids are NOT stable. In particular, it has been mentioned that when you vacuum a table, IMPLICIT rowids might change, while explicitly generated columns of type PRIMARY INTEGER will be stable.

Not sure if there is an explicit promise of exactly how long an implicit key is promised to remain, and changing its value does require updating all the indexes for the table, so won't be don't lightly (so vacuuming is a good time to reset them to squeeze out spaces in the numbers since indexes are being regenerated anyway).

(49) By Keith Medcalf (kmedcalf) on 2021-09-18 00:28:14 in reply to 48 [link] [source]

No, implicitly generated rowids are NOT stable.

You are incorrect. Implicitly generated rowids are stable.

create table x
(
  rowid integer primary key,
  y
);
insert into x (y) values (1);

The rowid is implicitly generated but is stable.

You are confusing the rowid itself (which may be specified either explicitly or computed implicitly) with whether or not the rowid is being stored in an explicitly designated column and is part of the row data.

create table x
(
 y
);
insert into x (rowid, y) values (1,1);

Although the rowid is explicitly generated, it is stored in an implicitly defined column are therefor does not constitute part of the row data.

(50) By Richard Damon (RichardDamon) on 2021-09-18 00:47:47 in reply to 49 [link] [source]

IF you declare the column, then the column is explicit and stable, yes, even if you don't provide a value in the insert, there are explicit rules to auto-generate the value.

The implicit rowid that people are talking about is the rowid column that ISN'T defined in the schema, but are able to be read back using several implicitly defined names.

something like

create table x ( name text primary key; );

insert into x ('Foo');

select rowid, name from x;

THAT rowid which was never defined in the schema is not stable and can change if you vacuum the table.

Note your case was a given value was 'implicitly generated', and that is stable (the database doesn't kee track of which values were auto-generated, and which were explicitly provided..

In this case, the whole COLUMN was 'implicitly generated', and is not stable.

(51) By Scott Robison (casaderobison) on 2021-09-18 01:24:23 in reply to 49 [link] [source]

From https://www.sqlite.org/lang_vacuum.html:

"The VACUUM command may change the ROWIDs of entries in any tables that do not have an explicit INTEGER PRIMARY KEY."

Where one might define stable as "firmly established and unchangeable".

(52) By Keith Medcalf (kmedcalf) on 2021-09-18 03:11:59 in reply to 50 [link] [source]

even if you don't provide a value in the insert, there are explicit rules to auto-generate the value

No. If you do not specify a value for the rowid then IMPLICIT rules are used to calculate an IMPLICIT value. You cannot specify how the rowid is computed if you do not specify it explicitly.

You may either explicitly specify a value, or if one is not specified then one will be implicitly computed for you.

Notwithstanding whether the value of the rowid is given explicitly or calculated implicitly, the resulting value is only part of the "stable row data" if the column which is to contain the rowid is declared explicitly. If the column which is to hold the rowid value is not explicitly declared, then the value is ephemeral and is only guaranteed to identify a particular table row during the execution of the query returning the rowid. It is not a stable attribute of the row and may change (as viewed by a connection) any time that connection does not hold a lock on the database.

(53.1) By Scott Robison (casaderobison) on 2021-09-18 05:56:22 edited from 53.0 in reply to 52 [link] [source]

Keith, I don't know if you do this because you like to argue or because you aren't reading all the provided information. The original statement might have been imprecise, but pedantry doesn't help solve problems. The genesis of this thread branch would cause any reasonable person to infer that a table lacking an INTEGER PRIMARY KEY column lacks stable ROWID values across a VACUUM. This is a true statement: they aren't stable, whether the row was inserted with an explicit value or a SQLite generated value:

sqlite> create table a(b);
sqlite> insert into a (rowid,b) values (42,'foo');
sqlite> insert into a (b) values ('bar');
sqlite> select rowid,* from a;
42|foo
43|bar
sqlite> vacuum;
sqlite> select rowid,* from a;
1|foo
2|bar

Both ROWID values changed because there is no INTEGER PRIMARY KEY in the table, but stable ROWID values are required for some use cases.

I know you've said as much, but the way you "debate" the issue is confusing as you appear to contradict yourself. If the session extension didn't require a stable ROWID (as I understand an earlier reply) then this would be moot. But someone wants to use it with a different PRIMARY KEY and cannot. All the arguing in the world about how people should do things or insulting their lack of aptitude with how they use the tools provided doesn't solve the problem in this case. Sure, they could write their own equivalent to the session extension and circumvent this problem. But then they could also write their own SQL engine, and their own C standard library, text editor, operating system, and everything else in the software stack.

Presumably they are here to leverage the excellent resources available so they don't have to reinvent the wheel, and what (to them, anyway) is a minor tweak to improve utility is not an unreasonable suggestion. That doesn't mean it must be implemented, but everything else presented since then is a distraction from the original suggestion and is increasing the noise and depressing the signal.