SQLite3 Extension

(1) By anonymous on 2020-12-14 07:54:09 [link] [source]

I would like to compile the extension extension-functions.c.

I downloaded this file & tried to compile with Visual Studio: sqlite3ext.h was missing. I downloaded this file & added it to the project. I am getting this error:

Severity Code Description

Error LNK1561 entry point must be defined

How do I overcome this?
How can I include the source code into the 3.34.0 amalgamation so that I have the extension incorporated?

(2) By ddevienne on 2020-12-14 09:18:51 in reply to 1 [link] [source]

One easy way to build additional non-amalgamated extensions is to append
the source at the end of the amalgamation. As far as I recall at least.
Keith or someone else will chime-in soon when the US wakes up.

Otherwise (re)read https://www.sqlite.org/loadext.html perhaps?

There's also https://stackoverflow.com/questions/30898113/how-to-compile-an-extension-into-sqlite

(3) By Keith Medcalf (kmedcalf) on 2020-12-14 10:07:23 in reply to 1 [link] [source]

How do I overcome this?

You want to be producing a DLL, not an executable. Tell Visual Studio to make it thus. I have no idea how you do this in the clickety-pokey.

How can I include the source code into the 3.34.0 amalgamation so that I have the extension incorporated?

Simply append it to the end of the amalgamation file. The real problem is how you get the init function called. The easiest way (to me at any rate) is to append a function (after the extensions you want to add are all appended) that adds the init function for each extension to the auto extension list for new connections, and set the pre-processor symbol SQLITE_EXTRA_INIT to the name of this function. I have no idea how you do this in the clickety-pokey.

An example SQLITE_EXTRA_INIT function looks like this:

int core_init(const char* dummy)
{
    int nErr = 0;

    nErr += sqlite3_auto_extension((void*)sqlite3_autobusy_init);
    nErr += sqlite3_auto_extension((void*)sqlite3_ipaddress_init);

    return nErr ? SQLITE_ERROR : SQLITE_OK;
}

so you would then define SQLITE_EXTRA_INIT=core_init when compiling the amalgamation code and the extensions would thereafter be automatically initialized on each connection.

The sqlite3_initialize function defined in main.c will call SQLITE_EXTRA_INIT function if the define is defined as the last step of the environment initialization process.

If you do something in the EXTRA_INIT function that needs to be shutdown then you can also provide a function to clean that up by defining SQLITE_EXTRA_SHUTDOWN to be the name of that function, which will be called first when the sqlite3_shutdown API is called. (You do not need to reset the auto extension list -- it is already done for you.)

(4) By anonymous on 2020-12-14 11:24:38 in reply to 3 [link] [source]

You want to be producing a DLL, not an executable.

I did that. I receive this message on loading the extension:

Error: The specified procedure could not be found.

Note procedure and not module as the error usually is!

I compiled with these files without any modifications:

extension-functions.c
sqlite3.h
sqlite3ext.h

I am still missing something - what?

(5) By Keith Medcalf (kmedcalf) on 2020-12-14 12:16:08 in reply to 4 [link] [source]

You have provided insufficient information for meaningful diagnosis.

You say you created a DLL. What is the DLL called?

What do you mean by "loading the module"? What, exactly, did you do? What command did you issue or instruction did you issue to make you think that you did this?

You need to provide detailed information regarding what you are doing such that people can tell what you are talking about. Example:

On Windows 10.0.19042.685 x64 I compiled the extension-functions.c and produced a dll called goobers.dll. Then, using an SQLite3 executable version 3.74.23.1 I issued the following command:

.load goobers

I got the message "Error: The specified procedure was not found"

sqlite3.exe was executed from the command line in the same directory that contains the file goobers.dll.

Please run dumpbin /exports goobers.dll and post what entry point name is being exported from the DLL.

(6) By Keith Medcalf (kmedcalf) on 2020-12-14 12:38:45 in reply to 4 [link] [source]

Assuming that you are using the code as linked in your first post, that code is defective. It uses the "old fashioned name" for the init function. While this will work, you should fix it if you ever plan to use more than one extension internally. It also does not export any symbols.

The sqlite3_extension_init name should be changed to sqlite3_<name>_init where <name> is the name of the dll without idiocies (no spaces, no symbols, no tomfoolery, just the name, only the name, and nothing but the name). If you are wise you will stick to lower case only.

For example, if you are compiling to a DLL called goobers.dll then that function should be renamed to sqlite3_goobers_init. If you make the name too long or have foolish characters in it, then it might not work properly.

Secondly you need to designate this function as an export. Linux shared libraries by default export all "visible" symbols. Windows does not.

You have to specify what you want exported by tagging the function with __declspec(dllexport) in order to have it exported, or use a .def file as input to the linker to tell it what to export.

So, find the line that starts int sqlite3_extension_init( and change it to __declspec(dllexport) int sqlite3_goobers_init( where the part between the underscores is the name of the dll without the .dll (ie, this example is for an extension DLL called goobers.dll).

(7) By Richard Hipp (drh) on 2020-12-14 12:43:46 in reply to 1 [link] [source]

The scalar math functions (ex: sin(), pi(), atanh(), etc) that are in extension-functions.c are now in core on trunk. Which of the functions in extension-functions.c were you wanting to use? Will the new functions in the core SQLite meet your needs instead?

(8) By Larry Brasfield (LarryBrasfield) on 2020-12-14 12:48:29 in reply to 4 [link] [source]

As Keith asks, dumpbin will show what your extension's init function was really called. Then you can compare that to the name that is generated when you do not specify it in the load extension operation, as documented in section 4, here.

You say you "compiled with these files ...", without leaving any clue as to what that means. The build guidance in section 3, here shows one possible meaning, known to produce a usable result. If you are doing something else, a careful study of how that differs from the guidance will likely be instructive. If you are using the Visual Studio IDE, you can see in one of its plethora of window panes how the command-line tools are actually being invoked. That likely differs in some critical manner from the guidance.

(9) By anonymous on 2020-12-14 13:48:03 in reply to 7 [link] [source]

The following lists the functions that are NOT in SQLite3 (unless they are differently named) I'd like to use; the Comment column explains showing * are the function in extension-functions.c that I'd like available.

ExtnFunction	Comment
atanh	*
atn2	exists as atan2,
charindex
cot	*
coth	*
difference
leftstr
lower_quartile
ltrim
median
mode
padc
padl
padr
proper	* assuming that this capitalises first letter of word - initial or following space and lowercase all others
replace
reverse∣	assuming that this rotates a string
rightstr
rtrim
sign	*
square
strfilter
trim	* assuming this removes leading/trailing & embedded blanks
upper_quartile
variance

(10) By anonymous on 2020-12-14 14:02:27 in reply to 5 [link] [source]

The DLL is called SQliteExtension.DLL. It exists where SQLite3.exe is ... see .shell dir command in the session output

Here's my session:

SQLite version 3.34.0 2020-12-01 16:14:00
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .shell dir sqliteextension.dll /b
SQLiteExtension.dll
sqlite> select load_extension('sqliteextension.dll');
Error: The specified procedure could not be found.

sqlite>

The size of sqliteextension.dll is 82,944 bytes (86,016 bytes on disk).

dumpbin is very odd - I cannot see any exported functions - it looks like this:

File Type: DLL

  Summary

        1000 .00cfg
        1000 .data
        2000 .idata
        1000 .msvcjmc
        3000 .pdata
        4000 .rdata
        1000 .reloc
        1000 .rsrc
       10000 .text
       10000 .textbss

Something wrong with my compilation ... is it because of something missing in extension-functions.c ... or else what?

(11) By Keith Medcalf (kmedcalf) on 2020-12-14 14:09:24 in reply to 10 [link] [source]

Yes. You need to export the sqlite3_extension_init function either by including a .def file or by putting __declspec(dllexport) before the function declaration in the source code.

(12) By anonymous on 2020-12-14 14:11:19 in reply to 10 [link] [source]

I think you've explained the dumpbin output already.

(13) By anonymous on 2020-12-14 14:51:58 in reply to 11 [link] [source]

There is much to be said for pre-compiled binaries not least that they provide a known point of (coding standard) reference to which the published documentation applies.

I am giving up on this extension, choosing to implement the functionality, as necessary, in calling code.

(14) By Keith Medcalf (kmedcalf) on 2020-12-14 14:59:48 in reply to 12 [link] [source]

>dumpbin /exports decimal.dll
Microsoft (R) COFF/PE Dumper Version 14.28.29334.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file decimal.dll

File Type: DLL

  Section contains the following exports for decimal.dll

    00000000 characteristics
    5EC064A9 time date stamp Sat May 16 16:09:45 2020
        0.00 version
           1 ordinal base
           1 number of functions
           1 number of names

    ordinal hint RVA      name

          1    0 0000FA40 sqlite3_decimal_init

  Summary

        1000 .CRT
        1000 .bss
        1000 .data
        1000 .edata
        1000 .idata
        1000 .pdata
        3000 .rdata
        1000 .reloc
       1A000 .text
        1000 .tls
        1000 .xdata

for an extension called decimal.dll

(15) By Larry Brasfield (LarryBrasfield) on 2020-12-14 15:41:34 in reply to 13 [link] [source]

I got that very old, contributed extension to compile/link to a DLL which could be loaded as a SQLite extension without explicitly stating the entry point. To do that, I had to correct for two deviations in extension-functions.c from the guidance published as Programming Loadable Extensions. These deviations are:

(1) There is nothing to mark the sqlite3_extension_init() function as one to be exported as an entry point in the DLL, for which purpose the guidance recommends the preface:#ifdef _WIN32 __declspec(dllexport) #endif That alteration suffices to get the would-be entry point to appear in the output of dumpbin /exports .

(2) The sqlite3_extension_init() function's name does not adhere to the guidance suggestion, "You will do well to customize the name of your entry point to correspond to the name of the shared library you will be generating, rather than using the generic "sqlite3_extension_init" name." Unless the extension DLL happens to be named "extension.dll", this deviation will make it necessary to supply the actual entry point name as a second argument to the load extension call, (whether that be use of the .load command or the load_extension(X,Y) form of built-in SQL function.)

(16) By anonymous on 2020-12-14 15:54:51 in reply to 7 [link] [source]

Are the math functions intended for a future release?

I do not see DSQLITE_ENABLE_MATH_FUNCTIONS mentioned at Compile-time Options.

It appears to be unavailable in the v3.34.0 pre-compiled binary for Windows:

SQLite version 3.34.0 2020-12-01 16:14:00
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> select sin(9);
Error: no such function: sin
sqlite>

(17) By anonymous on 2020-12-14 16:03:19 in reply to 15 [link] [source]

Perhaps the source code for supplied extensions should have chronological versions in line with SQLite3 releases especially when the source code is not in line with current guidance.

If that were the case, you could publish the code as you have modified for v3.34.0.

(18) By tom (younique) on 2020-12-14 20:15:32 in reply to 16 [link] [source]

It's not in 3.34, it's in upcoming 3.35.

(19) By anonymous on 2020-12-14 21:22:31 in reply to 18 [link] [source]

What is the expected release date for 3.35 compiled binaries?

If not known, indicate guesses would be welcome.

(20) By Richard Hipp (drh) on 2020-12-14 21:34:52 in reply to 19 [link] [source]

https://www.sqlite.org/draft/releaselog/3_35_0.html

(21) By skywalk on 2020-12-14 22:56:21 in reply to 20 [link] [source]

Nice! I see a sqrt(). :)

(22) By Ryan Smith (cuz) on 2020-12-15 11:06:10 in reply to 20 [source]

Great stuff! Especially thanks for adding atan2() - this function is often overlooked and vital for us in calculating rotational offsets in 2D coordinate systems (or 3D planes, it has other uses too, but this is what we care about).

Two questions:

1 - Will this be enabled in the pre-compiled binaries? (I believe it is warranted).
2 - Any appetite for adding these?:

FRAC(x) - The counter part to TRUNC(x) returning only the fraction part of x.
EPS() or EPSILON() - The smallest unit of difference in the underlying FP storage.
MAXINT/MININT - Syntactic sugar for the integer range bounds.

I know FRAC is simply defined as: FRAC(x) = x - TRUNC(x) so very easy to get to in SQL, but the named function is more coherent and makes for more legible SQL.

EPSILON() allows us to compare IEEE754 numbers distance or similarity. Perhaps something like:
WHERE abs(X - Y) > epsilon()
to compare numbers (which are within two base units from each other), etc.

Lastly MaxInt and MinInt would be simply syntactic sugar for +9223372036854775807 (aka 0x7FFFFFFFFFFFFFFF) and also -9223372036854775808.
I believe the utility of it is clear, for instance being able to say: X = 5 * ( RANDOM() / MaxInt ) or CASE WHEN x * y <= MaxINT THEN TRUNC(x*y) ELSE 'Int-Overflow' END

PS: I know these examples can be achieved in other ways are not very complex, but they serve only as quick examples here.

(23) By Keith Medcalf (kmedcalf) on 2020-12-15 16:02:20 in reply to 22 [link] [source]

The "machine epsilon" is different from the value epsilon. The machine epsilon is the difference between 1 and the very next representible value.

The useful value is the "epsilon" of a value. This is the absolute difference associated with "toggling" the last bit of the significand on and off, or the value of the Unit in the Last Place (ULP).

The expression abs(X - Y) > epsilon() tells you nothing (and is quite stupid, unless X happens to be 1).

What you need to know is the number of ULPs between the X and Y which would require calculating the ULP of X and using that as the divisor of the subtraction. This will tell you the number of ULP of X by which Y differs from X, and this is a useful number. If it is 5 then X and Y are pretty damn close to equal. If it is 5 million, then X and Y are not very close to each other at all. And the sign of the result tells you which one is bigger.

IEEE-754 compliant floating point arithmetic requires that all operations be carried out to within 1 ULP (the actual requirement is that they be computed exactly and then rounded to the nearest representible value, which is slightly different). This means that the difference between X and Y should be measured in the ULP distance between X and Y in based on the ULP of X. While machine epsilon might be helpful here, it would be perspicacious to simply compute the ULP of X directly (since epsilon is merely the ULP of 1 -- why go to all the extra complication).

Assuming that one has a floating point value X in IEEE-754 base 2 representation, one would compute the "ULP" value of that represesentation by merely subtracting from the exponent the "number of bits" in the significand, and setting the significand to 1. Similar methods can be used for other bases.

That is:

static __inline double epsilon(double value, int significand)
{
    int exponent;
    double mantissa = frexp(value, &exponent);
    return ldexp(1.0, exponent - significand);
}

where for "double precision IEEE-754" the value of significand is 53. Note that you can compute this entirely without the math library, but since you already have it, why not use it.

Note also that there is a function for returning the correct fractional part and integer part of a floating point number and that function is modf. While your "simple" method of computation may produce results that are accurate to within a boulder of salt, the modf function is guaranteed to produce results accurate to 1 ULP on IEEE-754 compliant systems.

(24) By anonymous on 2020-12-15 17:14:17 in reply to 23 [link] [source]

and that function is modf

I don't have it in v3.34.0. Is it in 3.35.0 - not mention here?

(25) By Keith Medcalf (kmedcalf) on 2020-12-15 17:22:15 in reply to 24 [link] [source]

modf is a libm (math library function) that returns the fractional and integer parts of a floating-point number.

(26) By Ryan Smith (cuz) on 2020-12-15 20:28:39 in reply to 25 [link] [source]

Thanks for the notes Keith, I'd be equally happy if modf() then makes it into the new internal sqlite math functions.

(27) By tom (younique) on 2020-12-16 13:50:52 in reply to 26 [link] [source]

Me too, but I'd prefer having it called the standard way: frac(x)

(28) By stonebig on 2020-12-19 15:47:10 in reply to 7 [link] [source]

Hello,

Would it be possible to include STDDEV also to this already awesome list ?

(29) By stonebig on 2020-12-19 16:01:12 in reply to 7 [link] [source]

I'm not sure what the ISO / ANSI standard for STDDEV is exactly, only finding references here

https://www.oreilly.com/library/view/sql-in-a/9780596155322/ch04s02.html

If Sqlite follows Postgresql,(https://www.postgresql.org/docs/9.1/functions-aggregate.html), it means my demand STDDEV demand shall be:

VARIANCE_SAMP VARIANCE_POP STDDEV_SAMP STDDEV_POP VARIANCE = alias for VARIANCE_SAMP STDDEV = alias STDDEV_SAMP

(30) By anonymous on 2020-12-19 23:01:50 in reply to 29 [link] [source]

Your list plus Median & Mode functions will certainly make SQLite3 more viable for R (another application sharing the SQLite3 philosoply) people.

(31) By Warren Young (wyoung) on 2020-12-20 06:16:34 in reply to 29 [link] [source]

I'm not sure what the ISO / ANSI standard for STDDEV is exactly,

I assume you mean an API standard here, rather than definition of the statistical functions themselves?

As your links into the PostgreSQL documentation indicate, there's a difference between sample standard deviation and population standard deviation. The latter then gets us into probability density functions and such.

Why would we put such complexity into a "lite" library for storing and retrieving data? Isn't that why we have a host programming language for SQLite? Load the numbers as a list/array, etc., and pass them to the statistics library for the host language.

(32) By anonymous on 2020-12-20 06:48:48 in reply to 31 [link] [source]

Why would we put such complexity into a "lite" library ...

Perhaps ...

for ease of use?
those functions are simple to implement not complex at all
to entice statisticians (large/influencial community of R users!) to use SQLite?

(33) By Warren Young (wyoung) on 2020-12-20 07:01:29 in reply to 32 [link] [source]

for ease of use

That justifies anything.

not complex at all

Only if you ignore the full definition involving probability density functions and such.

R users

R has sd() built in, including extensive distribution function support. You couldn't have picked a language less in need of an external implementation of this sort of function.

(34) By Keith Medcalf (kmedcalf) on 2020-12-20 07:52:44 in reply to 32 [link] [source]

Actually they are quite complex to calculate properly, though there are known methods of computing such things by successive approximation which are generally free of pathological behaviour, such methods are not implemented as built-ins for SQLite -- even the builtin AVG() function is implemented in its most pathologically susceptible form.

And you should not use the method from that web page, except when calculating by pencil and paper, as it also suffers from pathological defects when implemented on a computer.

(35) By anonymous on 2020-12-20 10:06:40 in reply to 34 [link] [source]

Not sure what pathological actually means in the SQLite context; at a guess, I expect you are hinting at the issues introduced by null, zero, and non-numeric values in a numeric column that might be used to calculate these statistical metrics.

If that is the case, then Microsoft Excel columns (or rows) are just as susceptible to pathology as the (in essence type less) SQLite columns. Excel has those statistical methods built in. No need for a debate on Excel!

I mentioned R because like SQLite, it is free, extensible, and has an avid and loyal^influential user base and has packages (aka extension or library) build around SQLite.

No harm in making suggestions for 3.35. In the end, it is SQLite development team that decides - that much is well understood.

^influential = R conferences have an international attendance by people representing household name organisations; most of have more letters after their name than in their names. COVID-19 no doubt put an end to that.

(36) By Stephan Beal (stephan) on 2020-12-20 10:16:18 in reply to 35 [link] [source]

Not sure what pathological actually means in the SQLite context;

"Pathological" cases are those which require "excessive" efforts or computing resources. Often times a solution to a given problem works fine on "normal" inputs but will slow down drastically, or cost far more memory, or have similar penalties for certain "legal, but probably rare or unexpected" inputs, and such cases are described as "pathological" cases.

More info: https://en.wikipedia.org/wiki/Pathological_(mathematics)#Computer_science

(37) By Warren Young (wyoung) on 2020-12-20 11:19:01 in reply to 36 [link] [source]

In this case, I'd Keith is talking about loss of significance combined with a wide range of FP exponents.

This is another good reason for leaving such calculations to specialist libraries and languages. If SQLite and R give me different results for a given statistical calculation, I'm inclined to assume R's got it right, because that's R's domain.

(38) By stonebig on 2020-12-23 16:04:19 in reply to 31 [link] [source]

If you want SQlite small, you can always modify the compilation flags accordingly.

R is not the cup of tea of everybody, or usable in any cicrcumstances.

For a PC usage, in Python base version for example, the more it can have common standard SQL instruction features (CTE, Window Functions, Statistic Functions), the better.

Other solutions are sometimes free, but with a high cost/complexity of installation.

(39) By Warren Young (wyoung) on 2020-12-24 09:09:40 in reply to 38 [link] [source]

R is not the cup of tea of everybody

R was offered up-thread as the answer for why SQLite needs statistics functions. My response was that no programming language needs SQLite to have them less than R.

Now we get this opposite argument.

If it it's the non-R users who need these functions, then which programming language are we talking about? Every common programming language you can name either has such things built in as well, or it has a code repository where you can add it; either way, you can then export them as application-defined functions.

Uncommon languages where such things haven't been written yet are hardly a good argument for adding this to SQLite, but even then, you could write the implementations in that language yourself and export them to SQLite.

If someone wants to wrap such statistics functions up into an extension for use by others, great! I just don't think such a thing can be justified as a built-in for a "lite" library designed for storing and retrieving data, particularly given that SQLite's original purpose — and still most common use case — is to be linked into a larger program written in a language which either does have statistics functions built-in or which offers some mechanism for adding them from a third-party library repository.

Have I overlooked a common programming language for which that is not the case, which thus needs the functions to be part of SQLite, there being no other way to get them?

The only possible argument I can see for doing this is for the convenience of those using the sqlite3 shell, who also cannot use SQLite extensions for some reason. That just brings us back to the "lite" argument.

Have I overlooked some common use case that doesn't fall under one of the objections above?

(40) By anonymous on 2020-12-24 09:57:19 in reply to 39 [link] [source]

Have I overlooked some common use case that doesn't fall under one of the objections above?

In what quantifiable way will the incorporation of those functions make SQLite less lite?

I fully subscribe to the recommendation to move beyond simple SELECT and INSERT statements mixed with a pile of procedural code and instead write complex queries that give the answer directly and out of the box.

The divide here boils down to a matter of opinion; therefore pointless, probably.

(41.1) By Warren Young (wyoung) on 2020-12-24 10:18:47 edited from 41.0 in reply to 40 [link] [source]

In what quantifiable way will the incorporation of those functions make SQLite less lite?

Code isn't free to write, nor to maintain, nor to compile, nor to store, nor to load from disk, nor to keep in RAM unused, nor to swap out to disk once the kernel realizes it is in fact unused.

If stddev() were part of ANSI SQL or trivial to write correctly, I wouldn't be arguing against it, but it's neither.

(42.1) By Warren Young (wyoung) on 2020-12-24 10:19:31 edited from 42.0 in reply to 40 [link] [source]

…move beyond simple SELECT and INSERT statements mixed with a pile of procedural code and instead write complex queries that give the answer directly…

drh wrote of the talks he's been giving about this topic, the most recent of which — that I'm aware of — is his "2020 SQLite Status Update" from 3 weeks ago. That link takes you into the middle of the talk where he begins his discussion of a sequence of slides that eventually takes you to a very complicated query which makes use of an application-defined SQLite function: Fossil's files_of_checkin() table-valued function, called right up at the top.

drh is not advocating that all possible functions be built into SQLite.

(43) By anonymous on 2020-12-24 10:24:14 in reply to 41.0 [link] [source]

Did you forget nor to debug, nor to unit test, nor to regression test?

As time goes by, more and more of the legacy functions will be unused yet they remain part of SQLite.

(44) By Warren Young (wyoung) on 2020-12-24 10:38:50 in reply to 43 [link] [source]

Such lessons are a motivating factor behind wishes to not increase the ongoing legacy function debt.

Say drh writes this stddev() function, and in his typical "lite" style doesn't provide support for probability density functions, or implements it as sample standard deviation and not population standard deviation, or both. This will doubtless satisfy some subset of the users wanting this, but then someone will come along and want the improvements. Do we then have two or three stddev() type functions, one or more of which is deprecated?

If you think that's an unreasonable worry, click that PostgreSQL link up-thread.

Or, what about Keith's worries about implementation details involving loss of precision and such due to pathological FP arithmetic behaviors? If it's written in the "lite" style implied by the links up-thread showing the theoretically-right but wrong-in-practice algorithms, does the same thing occur in a few years when that implementation is replaced by a version using clever numerical methods to avoid the problems?

But oh, you say, if the latter occurs, SQLite will just get an improved implementation; why would we need a second implementation? Wouldn't the new one just replace the old? The answer is, no, because people depend on wrong behavior all the time, and SQLite goes to uncommon lengths to preserve old behaviors for the benefit of such people. Thus the deprecated but still available functions, the ifdefs to restore legacy mechanisms, the _v2 and _v3 C API functions...

So yeah, I do expect implementing statistics functions to be an ongoing source of code churn, one best left either to an extension or to the host language's libraries, wrapped for the benefit of SQLite as application-defined functions.

(45) By stonebig on 2020-12-31 13:50:08 in reply to 42.1 [link] [source]

it will be a bit annoying and verbose to compute stddev without "stddev"

with datas(criteria, qty) as( values ('a', 1) ,('a', 2) , ('a', 3), ('a', 4) ,('a', 5), ('b',1), ('b',3) ) , intermed(criteria, squared) as (select criteria, power(qty-avg(qty) over (partition by criteria) , 2) from datas ) select criteria, sqrt(sum(squared)/count(*)) as stddev from intermed group by criteria

(46.3) By stonebig on 2021-01-02 09:59:15 edited from 46.2 in reply to 45 [link] [source]

After re-doing my basic math, it seems to be written like this

with datas(criteria, qty) as( values ('a', 1) ,('a', 2) , ('a', 3), ('a', 4) ,('a', 5), ('b',1), ('b',3) )

select criteria, sqrt((sum(qty * qty) -2* sum(qty) * avg(qty) + avg(qty) * avg(qty) *count( * ))/count( * )) as stddev from datas group by criteria

so "stddev(x)" could just be pre-transformed into

"sqrt((sum(x * x) -2* sum(x) * avg(x) + avg(x) * avg(x) * count( * ))/count( * ))"

(47) By Richard Damon (RichardDamon) on 2021-01-02 15:08:20 in reply to 46.3 [link] [source]

You can simplify this results a bit by remembering that avg(x) = sum(x) / count(x) to

sort(sum(qtyqty) - sum(qty)*sum(qty)/count()) / count(*)

(and that final count() becomes (count()-1) if you are dealing with a sample instead of a full population.)

(48) By stonebig on 2021-01-02 15:52:41 in reply to 47 [link] [source]

.. much better indeed. thanks for the remark.

with datas(criteria, qty) as( values ('a', 1) ,('a', 2) , ('a', 3), ('a', 4) ,('a', 5), ('b',1), ('b',3) )

select criteria, sqrt( (sum(qty * qty) - sum(qty) * avg(qty)) /count( * )) as stddev from datas group by criteria

(49) By Keith Medcalf (kmedcalf) on 2021-01-02 21:26:32 in reply to 48 [link] [source]

Even so you still have to presort your input by ascending absolute magnitude to avoid catastrophic precision loss.

Severity	Code	Description
Error	LNK1561	entry point must be defined