SQLite Forum

Timeline
Login

15 forum posts by user lifepillar

2021-08-25
09:41 Edit reply: When is "IS" used in SQLite? (artifact: de354e4fd1 user: lifepillar)

My point has very little, if anything, to do with how SQLite internally treats certain SQL constructs. My point is about the logical meaning of certain expressions.

It was claimed that the value of 5=TRUE (0) being different from the value of 5 IS TRUE (1) contradicts my assertion that = and IS evaluate differently only when the expressions contain null. My answer was that 5=TRUE and 5 IS TRUE are two totally unrelated expressions, because (as per the documentation I have cited):

  1. TRUE is an alias for 1, so that 5=TRUE is actually 5=1;
  2. as an exception to (1), is followed by true becomes the unary operator is true, which (logically) takes a Boolean as an argument. So, 5 is true is (logically) equivalent to (true) is true. The internal representation in SQLite source or inside CPU registers is irrelevant to my argument.

To elaborate further:

  • is [not] behaves like is [not] distinct from is PostgreSQL;
  • is [not] distinct from in PostgreSQL adheres to the rules of ISO SQL;
  • the rules of ISO SQL can be summarized as follows: given the expression V1 is distinct from V2:

  • if both V1 and V2 evaluate to null, the result is False;

  • if V1 is null and V2 is not null, or vice versa, the result is True;

  • otherwise, V1 and V2 are both definite values, which can be compared with standard equality, so the result is True or False accordingly.

Hence, = and is can evaluate differently only when nulls are involved, because in cases (1) and (2), the result of = may be null, and in case (3), the result of = is the same as the result of is by definition.

I hope my explanatio is clear. I am not addressing the rest of your post (please attack the arguments, not the individual formulating them), except for this:

As a conditional, a NULL value evaluates as False.

Most of the time, but now always:

create table (x int check (value > 0));
insert into T values (null); -- OK

Clearly, here NULL evaluates to True. Nulls are tricky!

08:04 Reply: When is "IS" used in SQLite? (artifact: 68bd46871c user: lifepillar)

My point has very little, if anything, to do with how SQLite internally treats certain SQL constructs. My point is about the logical meaning of certain expressions.

It was claimed that the value of 5=TRUE (0) being different from the value of 5 IS TRUE (1) contradicts my assertion that = and IS evaluate differently only when the expressions contain null. My answer was that 5=TRUE and 5 IS TRUE are two totally unrelated expressions, because (as per the documentation I have cited):

  1. TRUE is an alias for 1, so that 5=TRUE is actually 5=1;
  2. as an exception to (1), is followed by true becomes the unary operator is true, which (logically) takes a Boolean as an argument. So, 5 is true is (logically) equivalent to (true) is true. The internal representation in SQLite source or inside CPU registers is irrelevant to my argument.

To elaborate further:

  • is [not] behaves like is [not] distinct from is PostgreSQL;
  • is [not] distinct from in PostgreSQL adheres to the rules of ISO SQL;
  • the rules of ISO SQL can be summarized as follows: given the expression V1 is distinct from V2:

  • if both V1 and V2 evaluate to null, the result is False;

  • if V1 is null and V2 is not null, or vice versa, the result is True;

  • otherwise, V1 and V2 are both definite values, which can be compared with standard equality, so the result is True or False accordingly.

Hence, = and is can evaluate differently only when nulls are involved, because in cases (1) and (2), the result of = would be null, and in case (3), the result of = is the same as the result of is by definition.

I hope my explanatio is clear. I am not addressing the rest of your post (please attack the arguments, not the individual formulating them), except for this:

As a conditional, a NULL value evaluates as False.

Most of the time, but now always:

create table (x int check (value > 0));
insert into T values (null); -- OK

Clearly, here NULL evaluates to True. Nulls are tricky!

2021-08-24
20:51 Reply: When is "IS" used in SQLite? (artifact: 4c463f635c user: lifepillar)

To be fair, I have not been entirely accurate when I wrote:

Any expression that contains a null IS (the same) as any other expression that contains null etc.

That should read “Any expression that evaluates (or reduces) to null IS (the same) as any other expression that evaluates (or reduces) to null etc.

For instance, true or null is obviously not the same as null.

Ok, enough. Nulls are tricky, avoid them if you can!

19:26 Edit reply: When is "IS" used in SQLite? (artifact: a8f06f91bb user: lifepillar)

you make false claims

No. You are not taking implicit type casting into account. Let's make your query more explicit:

sqlite>select 5 = cast(true as int) as "==", cast(5 as boolean) is true as " is ";

==           is       
----------  ----------
0           1

From SQLite's manual:

The boolean identifiers TRUE and FALSE are usually just aliases for the integer values 1 and 0, respectively. However, if TRUE or FALSE occur on the right-hand side of an IS operator, then they form new unary postfix operators "IS TRUE" and "IS FALSE" which test the boolean value of the operand on the left.

So, 5 = true is the same as 5 = 1, and 5 is true is the same as true is true.

19:25 Reply: When is "IS" used in SQLite? (artifact: aacb336ccc user: lifepillar)

Not disagreeing with the theory btw - big fan of Codd myself, but reality rarely conforms to ideals.

On the contrary, you should disagree with Codd's “theory” (about the specific topic of nulls—not about the Relational Model, of course!), which is untenable, hence far from ideal (on the other hand, the problem he was trying to address is inherently intractable, as it entails detecting tautologies).

The reality is that SQL conforms really really well to what Codd proposed, despite his proposal being incoherent and, to his own admission, “preliminary and in need of further research”.

For the curious ones: read Codd's paper, §2.3, and compare the definitions therein with the behaviour of y̵o̵u̵r̵ ̵f̵a̵v̵o̵r̵i̵t̵e̵ ̵s̵y̵s̵t̵e̵m̵ SQLite :)

18:51 Reply: When is "IS" used in SQLite? (artifact: 44ed0c2ee8 user: lifepillar)

you make false claims

No. You are not taking implicit type casting into account. Let's make your query more explicit:

sqlite>select 5 = cast(true as int) as "==", cast(5 as boolean) is true as " is ";

==          cast(5 as boolean) is true
----------  --------------------------
0           1

From SQLite's manual:

The boolean identifiers TRUE and FALSE are usually just aliases for the integer values 1 and 0, respectively. However, if TRUE or FALSE occur on the right-hand side of an IS operator, then they form new unary postfix operators "IS TRUE" and "IS FALSE" which test the boolean value of the operand on the left.

So, 5 = true is the same as 5 = 1, and 5 is true is the same as true is true.

11:50 Reply: When is "IS" used in SQLite? (artifact: de3c250da1 user: lifepillar)

Sorry for nitpicking, but I can't resist noticing that null next to “semantics” forms an oxymoron :)

To expand on the above: is differs from == only when the considered expression contains a null. The difference is that == obeys Codd's Substitution Principle, while is does not.

According to Codd's Substitution Principle, an expression has the truth value “unknown”, if and only if:

  1. each occurrence of null may be replaced by a (distinct or already used) non-null value so that the expression’s truth value becomes “true”, and

  2. each occurrence of null may be replaced by a (distinct or already used) non-null value so that the expression’s truth value becomes “false”.

This is a pretty ad-hoc principle, which (1) was formulated by Codd strictly under the “existing, but unknown value” interpretation of nulls (but it is logically incompatible with such an intepretation), and (2) leads to many unresolvable inconsistencies in the treatment of nulls in SQL. Unfortunately, that is cast in stone in the standard, so one has to live with it. The best way to live with it is to avoid using nulls altogether.

Re (1), I'll just note that null cannot be interpreted as “value at present unknown” in general, because, if that were the case, then select x=x from T should always return true, even when x is null.

IS has instead a clean semantics. Any expression that contains a null IS (the same) as any other expression that contains null and IS NOT (the same a s) any other expression not containing a null (so, it's either true or false). This rule is used, among the rest, for the elimination of duplicates:

create table T (x int);
insert into T values (1), (1), (null), (null);
select distinct x from T;

and for set operations:

select x from T intersect select x from T;
select x from T union select x from T;
select x from T except select x from T;

again, in accordance to what Codd himself warned being a "preliminary and in need of further research" proposal.

2021-08-18
08:16 Post: Page-level locking? (artifact: 2119b4457c user: lifepillar)

Is it correct that SQLite's locks still operate only at database level?

I recall reading about a branch to implement page-level locking. Is that a thing? Being developed?

2021-07-29
08:05 Reply: documenting SQLite's "IS" behavior (artifact: 7f1973474f user: lifepillar)

That's implicit casting at work.

sqlite> select 2 is true;

is like

psql=# select 2::boolean is not distinct from true;

In the second example, true must be cast to integer first, so:

sqlite> select 2 is +true;

is equivalent to PostgreSQL's SQL:

psql=#  select 2 is not distinct from +true::int;

And so on. So I stand by my claim above.

2021-07-28
20:17 Reply: documenting SQLite's "IS" behavior (artifact: 06774a9ac5 user: lifepillar)

I think that the simplest way to put it is:

  1. SQLite's is is the same as ISO SQL's is not distinct from;
  2. SQLite's is not is the same as ISO SQL's is distinct from.

I have not checked the formal semantics of is and is not in SQLite, so it is possible that there are subtle differences, but from your examples and my few tests (1) and (2) above appear to be true.

2021-05-17
19:30 Reply: Nearest-match join (artifact: 0e593d562e user: lifepillar)

I assume that you want to find, for each record in a table T1, the nearest record(s) in table T2 (but not vice versa). As already noted by others, there may be more than one record in T2 matching a record in T1.

Sample data:

create table T1 (
  id integer primary key,
  ts real    not null,
   a char    not null
);

create table T2 (
  id integer primary key,
  ts real    not null,
   b integer not null
);

insert into T1(id, ts, a)
values (1, 1.0, 'A'), (2, 2.0, 'B'), (3, 3.5, 'C');

insert into T2(id, ts, b)
values (10, 2.0, 50), (20, 2.5, 100), (30, 9.5, 200), (40, 4.5, 300);

Solution 1:

select X.id, X.ts, X.a, Y.id, Y.ts, Y.b, abs(Y.ts - X.ts) as diff
  from T1 X, T2 Y
 where not exists (select 1
                     from T2 Z
                    where abs(Z.ts - X.ts) < abs(Y.ts - X.ts)
                  );

Solution 2:

with ranking_table as (
  select rank() over (partition by T1.id order by abs(T2.ts - T1.ts)) as ranking,
         T1.id as t1id, 
         T1.ts as t1ts,
         T1.a as t1a,
         T2.id as t2id,
         T2.ts as t2ts,
         T2.b as t2b,
         abs(T2.ts - T1.ts) as diff
    from T1, T2
)
select t1id, t1ts, t1a, t2id, t2ts, t2b, diff
  from ranking_table
 where ranking = 1;

The latter should perform better, but you have to test it. Solution 2 has also the advantage that you may easily find the N timestamps in T2 nearest to each timestamp in T1, for N>1, simply by using where ranking <= N.

In both cases, the queries can be made to perform much better if you can set an upper bound to the difference between the timestamps.

This is an instance of the k-nearest neighbour problem, if you want to explore it further.

2020-11-07
18:26 Reply: General question concerning database stucture and number of databases to use. (artifact: 0a9b868c18 user: lifepillar)

It appears that my thinking was rather backward, for I thought a larger number of tables containing fewer records would be more efficient in some manner.

If I may, your thinking is backward in another sense: you are caring about physical details before having resolved all the logical issues. What's most important, that your user's data is correct at all times or that they get garbage, but fast, responses? “Premature optimization is the source of all evils“ they say, and that applies to databases more than to anything else. My recommendation is: think about your data logically, designing as many “tables“ (relation schemas) as needed, no more, no less; then, trust SQLite to do its job egregiously. If at some point it fails you, only then investigate why and take corrective actions. You might be surprised at the end to discover how many of your assumptions turned out to be wrong. You might not need to take any corrective action, after all.

The concept of a relation schema as opposed to a table being a generic empty container to hold records is very helpful.

The concept is not only helpful, it is essential to the Relational model (albeit way too many times ignored). Tables (more formally, relation schemas) are far from “generic containers” holding a bunch of “records”. They carry meaning: a precise semantics decided by the database designer. A relation schema corresponds to a predicate, which you must define and document; and each tuple corresponds to a fact, whose meaning is given by the predicate associated to the relation schema. I have seen many databases whose data nobody could understand or had different opinions about (!), such as event tables whose timestamps were meaningless because three different applications had three different notions of what an "event" was. That happens precisely because a database is treated as a closet (the engineering term for that being “persistence layer”). Again, treat your database schema as a language instead, and design a proper one for your application domain. Then, your queries will be simple, easy to understand, and well performing.

In that respect, I don't need four tables to hold the data across all portfolios.

Well, that depends. You may need less, you may need more. From what I may infer about your application (you haven't provided many details), your database schema might be considerably more complex that one or two or four tables.

If the JSON strings were parsed into separate columns, then there would be different facts and relationships,

No. Again, you are thinking in physical terms (the JSON data format). Think in logical terms: what is a portfolio? How do you identify a portfolio and how do you distinguish it from another portfolio? How do you describe it (what are the relevant attributes)? Etc. The database descends from this kind of considerations, not from the byte format of your data source.

but, in this simple case, it just makes more work for the application to stringify in Javascript to parse in C in order to write to SQLite database (and almost never use the data in C) and then retrieve and stringify it in C to send back to JavaScript to parse again.

You may certainly store some data as JSON blobs (SQLite has some support for JSON after all), but consider such a choice carefully. Is the content of such blobs something you won't ever need to query in more or less complex ways? You may corner yourself into a situation where you have to write ad hoc code (in the worst case, at the application level) to extract information from JSON fields, when a simple SQL query would have done the job, had the data been properly organized.

2020-11-06
20:09 Reply: General question concerning database stucture and number of databases to use. (artifact: f524dc1b41 user: lifepillar)

I'd appreciate any guidance you could provide.

Review the staples of Relational database technology. In particular, read about data integration and the distinction between schema and instances.

Data integration

Just consider this: usually, DBMSs do not allow you to perform queries across databases. So, if you have one database per portfolio, you cannot (easily) perform queries across all portfolios. Besides, with different databases you will most likely end up storing duplicate and/or inconsistent data. There are other reasons to prefer integrated data, but for “a rather simple tool” these two should suffice.

Schema vs instance

A database schema is not just a bunch of “tables”: it describes a language (in a very formal sense) with which you describe facts (by means of tuples in relations, or, informally, as “records” in “tables”). As a rule (with few exceptions), adding a new relation schema (i.e., a “table”, in common parlance) is justified only if you need to describe a new, distinct, fact type. Concretely, when you need to add data, say, about a new portfolio, you shouldn't have to create a new table, but add data to one or more existing, properly designed, tables that accurately capture all the relevant fact types about “portfolios”. Of course, that requires determining such fact types.

So, your problem essentially amounts to how to properly model your requirements as a (one) well-designed database. There is currently (only) one (ISO/IEC/IEEE) standard for Relational database design, which is ISO 31320-2 (note: only the “key-style” modeling described in the standard is relevant to Relational database design). Unfortunately, the ISO document is not freely available, but the relevant parts are nearly identical to the former US government's Federal Information Processing Standard (FIPS) PUB 184, which can be found here. The latter also contains a detailed description of a sound methodology for database design. Even if you decide not to adopt IDEF1X as your modelling methodology, I'd recommend that you read at least Annex A of the FIPS document, which clearly describes how to approach database design.

Lastly, I'd like it to be easy for users to share the portfolios they've built with others, and to incorprate those portfolios into the user's database

I assume, from your description, that each user has its own SQLite-backed app, and that you want users to share data between their respective databases. If that is the case, when your database is well-designed—in particular, if you have devised good relational keys—that becomes (almost) trivial.

2020-08-25
14:18 Post: About the newly added decimal extension (artifact: 1166699aab user: lifepillar)

I am glad that a decimal extension has landed into SQLite!

From a cursory view of your code, it seems to me (correct me if I am wrong) that your extension operates directly on strings (i.e., char arrays) representing decimal numbers.

You are certainly aware of my sqlite3decimal project. Just as a quick summary, it uses a binary format for storage (using blobs), which can accommodate any number (with a finite number of digits, plus NaNs/Infs). Such format is called "decimalInfinite" (or decInfinite for short). It has the nice property that it preserves the total ordering among numbers (including special numbers such as +/-Inf), so that comparisons can be performed with memcmp() without any decoding. For arithmetic operations, numbers are decoded into the internal struct of an IEEE754-compliant library, which is currently decNumber (but could be anything else).

I am wondering whether you have considered to reuse (part of) my project. IMO, the binary encoding for storage is a very good choice for databases. DecimalInfinite is relatively simple, compact, order-preserving, and it can be encoded/decoded reasonably fast. Since the representation does not impose any limits, it is future-proof (you could implement decimals with up to 10 significant digits today, and extend it to 20 digits tomorrow, with no required change at the storage level). In principle, you could define arithmetic operations directly on the binary format, but no such algorithms have been published to date, to my best of my knowledge (that's why I use decNumber).

The implementation in sqlite3decimal is my own (see decInfinite.h and decInfinite.c), and it can be easily adapted for something other than decNumber. The implementation could be simplified if more reasonable limits were imposed to the size of the decimals (especially for exponents).

Of course I cannot release decNumber with a different license, but I would have no problem to put my code in the public domain.

Thoughts?

2020-08-06
06:55 Post: Bug in search field of SQLite's web site (artifact: fa93984aad user: lifepillar)

Hi, the following query:

https://sqlite.org/search?s=d&q=testing%5C

results in:

Wapp Application Error

missing close-brace
    invoked from within
"subst -novar -noback $s"
    (procedure "wappInt-enc-qp" line 3)
    invoked from within
"wappInt-enc-qp "$q""
    invoked from within
"subst -novariables {<a style="[wappInt-enc-html "$style"]"
href="search?q=[wappInt-enc-qp "$q"]&i=[wappInt-enc-qp "$i"]">[wappInt-enc-html "[expr $i+1..."
    ("uplevel" body line 1)
    invoked from within
"uplevel 1 [list subst -novariables $txt]"
    (procedure "wapp-trim" line 5)
    invoked from within
"wapp-trim {
        <a style="%html($style)"
           href="search?q=%qp($q)&i=%qp($i)">%html([expr $i+1])</a>
      }"
    (procedure "searchresults" line 134)
    invoked from within
"$cmd"
    ("uplevel" body line 1)
    invoked from within
"uplevel {$cmd}"
    invoked from within
"time [list uplevel $script]"
    (procedure "ttime" line 2)
    invoked from within
"ttime {$cmd}"
    invoked from within
"db transaction {
    set t [ttime {$cmd}]
  }"
    (procedure "wapp-default" line 46)
    invoked from within
"wapp-default"

You may want to fix that!

Life.