SQLite Forum

22 forum posts by user rbucker

2021-05-05
00:55 Edit reply: "Office Space"... I'm losing all those pennies (artifact: f94ecd7317 user: rbucker)

Correct. The issue was that converting it to pennies was the problem... SOLVED. Thanks!

UPDATE: I ended up going to GitHub, pulling the decimal.c file, and adding it as an extension in my golang project.
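
For anyone landing here later, the extension can be exercised straight from the sqlite3 CLI too; a minimal sketch, assuming decimal.c from ext/misc compiled to ./decimal.so (function names per that source; exact output formatting may vary by version):

sqlite> .load ./decimal
sqlite> select decimal_add('16.15', '16.15');
32.30
sqlite> select decimal_mul('16.15', '100');
1615.00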

2021-05-04
10:51 Reply: "Office Space"... I'm losing all those pennies (artifact: aafb40f67d user: rbucker)

I thought I knew how the data was stored... as text with some affinity... and at least in my actual use-case everything is text on purpose.

The closest thing to the real solution would have been the decimal math; however, the actual project is written in golang, and the decimal extensions are not immediately available there.

thanks.

2021-05-02
02:43 Reply: "Office Space"... I'm losing all those pennies (artifact: 4e8fc21cc6 user: rbucker)

Keith, your previous post about using round() makes perfect sense, even though I've never had to do that sort of gymnastics with other DBs before. Sadly I'm not able to correlate your second note. But thanks.

01:31 Post: "Office Space"... I'm losing all those pennies (artifact: b86f20c917 user: rbucker)

I'm building a financial application and for some reason I'm losing pennies. I can remember the days when I wrote a lot of C code and had to give up on floating point because of the conversion(s). But I would have expected different from a DB.

SAMPLE:

sqlite> select sum(cast('16.15' as numeric))*100;
1615.0
sqlite> select cast(sum(cast('16.15' as numeric))*100 as int);
1614
sqlite> select sum(cast(cast('16.15' as numeric)*100 as int));
1614
sqlite> select printf('%d', sum(cast('16.15' as numeric))*100);
1614
sqlite> select printf('%.0f', sum(cast('16.15' as numeric))*100);
1615
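
For posterity, my reading of the sample: SQLite stores the numeric as an 8-byte IEEE-754 double, and the closest double to 16.15 is slightly below it, so *100 lands just under 1615; cast-to-int truncates toward zero while printf('%.0f') rounds. The raw value, and a round()-before-cast variant that lands on the expected pennies:

sqlite> select printf('%.13f', cast('16.15' as numeric)*100);
1614.9999999999998
sqlite> select cast(round(sum(cast('16.15' as numeric))*100) as int);
1615
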
2021-05-01
13:57 Edit: how to collate in a union and noaccent (artifact: 82875c322b user: rbucker)

Such is the life of an ETL programmer: I'm constantly matching things that do not really match, and while I can reduce the number of "collate" expressions by declaring collations when creating the schema, they're not always known in advance. And so on...

I started with this example:

select * from (
select 'a'
union
select 'A'
)x

then I added the COLLATE

select * from (
select 'a'
union
select 'A'
collate nocase
)x

It's interesting to note that the results depend on the order of the 'A' and 'a'... I'm not sure whether that's meaningful or not.

I'm not a language expert, but in the example I think the "collate nocase" is being applied to the column expression, so if I added a FROM the COLLATE would need to be attached to the column and not the FROM... (no examples in the doc that I could see)

select * from (
select 'a' from test
union
select 'A' collate nocase from test
)x
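
A smaller pair of probes makes the dedup effect visible; a minimal sketch (my understanding: an explicit COLLATE on either arm of the compound governs the duplicate-removal comparison):

select 'a' union select 'A';                 -- two rows: BINARY treats them as distinct
select 'a' collate nocase union select 'A';  -- one row: NOCASE treats them as equal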

And so here are my questions...

  • can there be multiple collate attributes (can they be stacked)?
  • I read someplace that NOACCENT was left to the programmer? The reason given was that it was too big to implement. SHRUG. Given the use-cases for SQLite, one might think it should be baked in. (see below)

example from SQL Server(??)

SELECT * FROM Venue WHERE Name COLLATE Latin1_general_CI_AI Like '%cafe%' COLLATE Latin1_general_CI_AI
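
Absent something like that, the closest core-SQLite workaround I can think of is brute-force folding; a crude sketch (you would chain one replace() per accented character you care about, and note lower() only folds ASCII):

SELECT * FROM Venue
WHERE replace(replace(lower(Name), 'é', 'e'), 'è', 'e') LIKE '%cafe%';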

THANKS!

2021-04-06
12:31 Reply: Compiling sqlite3 with readline support (artifact: 6ee0dd92ac user: rbucker)

The subject line is specific, "compiling sqlite"; however, unless you knew to ask the question you might not ask it... But here is a workaround that seems to work nicely...

This is a problem in OpenBSD.

Install the rlwrap package if it's not already installed.

In .profile or .bashrc add this line:

alias sqlite3="rlwrap -c sqlite3"

Then log back in, or run the alias command manually.

2021-03-27
01:53 Reply: How to set the column separator for the CSV extension in the CLI? (artifact: 50e41b6c60 user: rbucker)

I'm not sure if/when I replied to this post... BUT IT ROCKS!!!

This implementation is only slightly different from the CSV version supplied with the SQLITE3 source, and frankly I use them both. What's awesome about this is that I've been able to load the extensions into my golang projects too. In fact my use-case loads a zipcode table with lat/long, and I use another extension to compute the estimated, as-the-crow-flies, distance between two zipcodes.

But one thing is missing from the write-up. If you use chroot, as I did on Linux, FreeBSD, and OpenBSD, you have to remember to copy the dependent libs from the root OS into the chroot in the same/similar path tree.

(hope that makes sense)

I compiled my vsv.c and moved it to the lib/ folder of my chroot dir.

I ran this to locate the DEPS

$ ldd lib/vsv.so 
lib/vsv.so:
        Start            End              Type  Open Ref GrpRef Name
        00000f929a569000 00000f929a572000 dlib  1    0   0      /home/rbucker/src/bppdash/lib/vsv.so
        00000f9323bc1000 00000f9323bf1000 rlib  0    1   0      /usr/lib/libm.so.10.1

I had to create 'usr/lib' and then copy '/usr/lib/libm.so.10.1' to 'usr/lib'.

Then I was able to chroot and load the extension.
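
For the record, the load from the CLI inside the chroot then looks like this (a sketch, per my lib/ layout above; if the dependent libs had not been copied in, the .load itself is what fails):

sqlite> .load ./lib/vsv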

2021-01-29
20:41 Reply: create unique index and order by not really unique or ordered properly (artifact: cf828b9537 user: rbucker)

THANKS! I did not see that coming. But makes perfect sense.

16:46 Reply: create unique index and order by not really unique or ordered properly (artifact: 206eefb9f2 user: rbucker)

Here is the info you requested

sqlite> PRAGMA table_info(Answers);
0|AnswerID||0||0
1|Answer||0||0
2|lasteditdate||0||0
3|lastedituser||0||0
4|status||0||0

typeof...

sqlite> select answerid, typeof(answerid) from answers;
10070|text
10090|text
10120|text
10140|text
10160|text
10440|text
10470|text
10540|text
11500|text
11630|text
11650|text

16:44 Reply: create unique index and order by not really unique or ordered properly (artifact: bd655c6dbd user: rbucker)

It has taken some time for this to show up again...

sqlite> SELECT quote(answerid) FROM answers;
'10070'
'10090'
'10120'
'10140'
'10160'
'10440'
'10470'
'10540'
'11500'
'11630'
'11650'

and then there was this...

sqlite> select answerid, typeof(answerid) from answers;
10070|text
10090|text
10120|text
10140|text
10160|text
10440|text
10470|text
10540|text
11500|text
11630|text
11650|text

And yet this query still fails to return data:

select * from answers where answerid in (10120,10160);

But then I tried this, and I got data:

select * from answers where cast(answerid as integer) in (10120,10160);
10120|http://bobville.com|2021-01-28 15:40:53|bob|Waiting for Review
10160|123-123-1234|2021-01-28 16:28:18|bob|Waiting for Review
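
For completeness: since answerid is stored as text, quoting the literals matches as well, no cast needed (as I understand the type rules, a column with no declared affinity gets no coercion, and a text value never compares equal to an integer):

sqlite> select * from answers where answerid in ('10120','10160');
10120|http://bobville.com|2021-01-28 15:40:53|bob|Waiting for Review
10160|123-123-1234|2021-01-28 16:28:18|bob|Waiting for Review
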
2021-01-09
17:50 Post: create unique index and order by not really unique or ordered properly (artifact: 33835d68f7 user: rbucker)

My testing was in version 3.23...

Here are my tables:

CREATE TABLE Answers (AnswerID, Answer);
CREATE UNIQUE INDEX idx_answers on answers (AnswerID);

My application does insert-or-replace of rows in the DB. Then I started to notice some weird behavior, only to determine that it was a side effect of the data. Looking at the data:

The order by failed here. The '-' is there to show whether there is embedded whitespace.

sqlite> select '-'||answerid||'-' from answers order by answerid;
-1030-
-1040-
-1060-
-1250-
-1270-
-1350-
-1030-
-1070-
-1120-

In the following SQL, distinct did remove the duplicates, but the ordering still failed.

sqlite> select distinct '-'||answerid||'-' from answers order by answerid;
-1030-
-1040-
-1060-
-1250-
-1270-
-1350-
-1070-
-1120-

I'm drawing a blank on how to fix this, or, if it's an encoding issue, how to identify, repair, or prevent it.
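
UPDATE: a minimal reproduction of what I now suspect was going on (mixed text/integer values: a unique index treats them as distinct, and order by sorts all integers before all text):

create table t(id);
create unique index idx_t on t(id);
insert into t values (1030);    -- stored as integer
insert into t values ('1030');  -- stored as text; does not violate the unique index
select '-'||id||'-', typeof(id) from t order by id;
-- -1030-|integer
-- -1030-|text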

2020-09-17
00:42 Post: clang compile errors sqlite3 3.33.0 autoconf with TEA (artifact: 18e7620c99 user: rbucker)

Starting at the beginning... (latest ChromeOS Linux)

  • download sqlite autoconf with TEA (amalgamation)
  • decompress
  • go to the source
  • build the source: CC=cc ./configure
  • make
  • make install

This is where things go sideways:

  • cd tea
  • ./configure --prefix=/usr --with-tcl=/usr/local/lib
  • make

The make would not complete. There were a number of errors... something about CONST was not defined. As a guess I tried:

  • CFLAGS="-DCONST=const" ./configure --prefix=/usr --with-tcl=/usr/local/lib
  • make

and now I have this error...

./generic/tclsqlite3.c:376:3: error: use of undeclared identifier 'TCL_CHANNEL_VERSION_2'
  TCL_CHANNEL_VERSION_2,             /* version                              */
  ^
1 error generated.
make: *** [Makefile:289: tclsqlite3.o] Error 1

A search of the TCL source (9.0) suggests that this definition has been removed. I did not dig very hard, but the one reference that did pop up was back in version 8.3.

2020-08-31
15:26 Reply: detecting clumps (artifact: 1edad0a3c7 user: rbucker)

I did some interesting testing...

-- TESTING
delete from centers; insert into centers values (1,10),(2,50),(3,75),(4,100);
delete from centers; insert into centers values (1,100),(2,75),(3,50),(4,10);
delete from centers; insert into centers values (1,100),(2,50),(3,75),(4,10);
delete from centers; insert into centers values (1,10),(2,20),(3,30),(4,40);
delete from centers; insert into centers values (1,1),(2,2),(3,3),(4,4);

In the last example, where the initial centroids were 1,2,3,4, it took only 6 iterations to settle down.

14:21 Reply: detecting clumps (artifact: 97d4900883 user: rbucker)

I was looking for a SQL-only solution; I currently do not have enough extension-building experience to pick the right approach. I did find a link to a related example from MS but left that alone for now. I have a small working example, and it seems to work... with only two test cases:

drop table if exists sample;
create table sample(name,val);
insert into sample values ('ten', 10);
insert into sample values ('fifty', 50);
insert into sample values ('fifty', 50);
insert into sample values ('fifty', 50);
insert into sample values ('fifty', 50);
insert into sample values ('fifty', 50);
insert into sample values ('hundred', 100);
insert into sample values ('hundred', 101);
insert into sample values ('hundred', 102);
insert into sample values ('hundred', 103);
insert into sample values ('hundred', 150);
insert into sample values ('2hundred', 200);
insert into sample values ('2hundred', 225);
insert into sample values ('2hundred', 250);
insert into sample values ('3hundred', 300);
select * from sample;
select '';


-- k-means
-- The algorithm works as follows:
-- Step 1: Choose group centers in the feature plane randomly
-- Step 2: Minimize the distance between cluster centers and observations; each observation joins its nearest centroid, forming groups
-- Step 3: Shift each centroid to the mean of the coordinates within its group.
-- Step 4: Minimize the distance against the new centroids. New boundaries are created, so observations may move from one group to another
-- Repeat until no observation changes groups


-- Step 1: Choose group centers in the feature plane randomly
drop table if exists centers;
create table centers (k, center);
insert into centers (k,center)
select k, avg(val) as center from (select ntile(4) over r as k, val from sample window r as (order by val)) group by k ;
select * from centers order by k;


-- Step 2: Minimize the distance between cluster centers and observations; each observation joins its nearest centroid, forming groups
-- Step 3: Shift each centroid to the mean of the coordinates within its group.
-- Step 4: Minimize the distance against the new centroids. New boundaries are created, so observations may move from one group to another
update centers as ce set center=n.center 
from (
        select k, avg(val) center 
        from (
                select * 
                from (
                        select c.k, s.val, abs(s.val-c.center) d, row_number() over r as rowno 
                        from sample s, centers c 
                        window r as (partition by s.val order by abs(s.val-c.center) )
                ) as x 
                where rowno=1
        ) as y
        group by k
) as n 
where ce.k=n.k;

select * from centers order by k;

-- print the results
select k, min(val) minval, max(val) maxval 
from (  
        select c.k, s.val, abs(s.val-c.center) d, row_number() over r as rowno
        from sample s, centers c 
        window r as (partition by s.val order by abs(s.val-c.center) )
) as x
where rowno=1
group by k
order by 1;

In step 1 above I used ntile() to select the "random" centroids. I updated the centers table with:

delete from centers;
insert into centers values (1,10),(2,50),(3,75),(4,100);

And after repeating steps 2-4 I got the same set as in the first run.

[a] this could be converted to a recursive CTE

[b] k was just a guess but maybe it should be slightly more deterministic

[c] and it needs more testing

thanks again

11:41 Reply: detecting clumps (artifact: e0343ccda6 user: rbucker)

That first sentence reads exactly as I imagine it would. Awesome!

The basic definition I had was based on a single interaction 35 or 36 years ago.

A more modern example would be something like an Amazon product search by price, where the price range(s) appear in the sidebar. Something had to decide what those ranges are.

Thanks again.

2020-08-30
22:35 Post: detecting clumps (artifact: 87d9b55aab user: rbucker)

Most teachers grade on the traditional grading scale... 90-100 is an A, 80-89 is a B... and so on.

There are variations on grading which "they" call the curve... one such mechanism is called "clumping" or "clumping edge detection". Sadly I've found only the one reference, and there is nothing about SQL...

It seems to me that there should be something... In my case I'm doing grades, but it would be quite useful in processing logfiles, comparing the clumps by date.

Thanks.

(hope I'm making sense)

2020-08-21
13:54 Reply: feature request for CSV extension... (artifact: 2e34dc0030 user: rbucker)

I think I found the problem... I needed to add the "columns" count to the virtual table creation. :(
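
Something like this; a sketch, assuming vsv accepts the same columns=N argument as the stock csv.c (the schema below declares 12 columns, hence columns=12):

CREATE VIRTUAL TABLE temp.zipcode
  USING vsv(filename='files/US.tsv',
  header=no,
  fsep='\t',
  columns=12,
  schema='create table x(country_code,postal_code,place_name,admin_name1,admin_code1,admin_name2,admin_code2,admin_name3,admin_code3,latitude,longitude,accuracy)');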

13:49 Reply: feature request for CSV extension... (artifact: 7f324b0cf2 user: rbucker)

Just a reply to confirm that the OOM error is coming from sqlite when I try to access the virtual table.

sqlite> CREATE VIRTUAL TABLE temp.zipcode 
  USING vsv(filename='files/US.tsv', 
  header=no, 
  fsep='\t', 
  schema='create table x(country_code,postal_code,place_name,admin_name1,admin_code1,admin_name2,admin_code2,admin_name3,admin_code3,latitude,longitude,accuracy)');

sqlite> select count(*) cnt from temp.zipcode;
Error: out of memory

sqlite> select count(*) cnt from zipcode;
Error: out of memory

13:43 Reply: feature request for CSV extension... (artifact: b9a1dd3a40 user: rbucker)

I made all those same changes to get it to compile... so thanks. I'll have to test it from the sqlite CLI to make sure it works... I am trying to link vsv into my golang project but am getting an 'out of memory' error, and I'm not sure where it's triggering yet.

Thanks again.

2020-08-20
12:48 Reply: feature request for CSV extension... (artifact: 4eb6ace8ad user: rbucker)

FYI... one of the warnings was about strnicmp. Given that one of the params is a constant string... sqlite3_stricmp() works great instead.

Thanks again

12:30 Reply: feature request for CSV extension... (artifact: 7aa05ccfc8 user: rbucker)

That's awesome... however, I tried to compile it on FreeBSD 12.1 with the following...

cc -g -fPIC -I /usr/local/include -shared sqliteext/vsv.c -o lib/vsv.so

and received all manner of C99 warnings (just 6 warnings).

Thanks again.

PS... in the header comment you list the params but omitted 'schema', which you did include in the defaults.

2020-08-19
21:34 Post: feature request for CSV extension... (artifact: eb392624da user: rbucker)

I spend a lot of time writing reports from flat datafiles. One thing about the csv.c extension is that it's FAST, even when creating a real table from the virtual table. My challenge is that I often create TSV in addition to CSV. I've done the batch sed/awk thing in the past, but it seems to me that either forking csv.c into a tsv.c or adding a sep="" param would make more sense; see the sketch below.
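
Purely hypothetical syntax for the param idea (csv.c has no sep= today; this is exactly the feature being requested):

CREATE VIRTUAL TABLE temp.t1
  USING csv(filename='data.tsv', sep='\t');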

As I'm not qualified to write SQLite internals, I was hoping for some guidance. Forking seems to add a lot of duplicated code, whereas a param requires more test cases... and I may not be prepared for that either.

All that said, some guidance and then we'll see what I can do.

regards