SQLite Forum

Can an application-defined function be used to convert one row of data to multiple rows and insert them into another table?

Can an application-defined function be used to convert one row of data to multiple rows and insert them into another table?

(1.1) By Gary (1codedebugger) on 2021-08-29 03:40:09 edited from 1.0 [link]


I'm trying to use the SQLite Tcl API to transform rows that contain binary text of a few sentences into rows of words and write the rows to a new table, after removing any prefixed/suffixed punctuation.

The code below, of course, doesn't work; one reason at least being that the procedure returns only one value, the list, instead of three values. 

    set sql {insert into table_2 (id, index, word) select transfunc(id, textdata) from table_1}
    db function transfunc -argcount 2 -deterministic -directonly -returntype text {SplitToWords}

    proc SplitToWords {id str} {
     global punc
     set listw {}
     lmap word [encoding convertfrom utf-8 $str] {
      lappend $listw {[incr i] [string map $punc $word]}
     return $listw

    db eval $sql

If the procedure is changed to write each row from within it, as below, it works. But is this the correct approach?
Thank you for any guidance you may be able to provide.

    set sql {select transfunc(id, textdata) from table_1}

    proc SplitToWords {id str} {
     global punc
     lmap word [encoding convertfrom utf-8 $str] {
      incr i
      set p [string map $punc $word]
      set sql {insert into table_2 (id, index, word) values($id, $i, $p)}"
      db eval $sql

(2) By Larry Brasfield (larrybr) on 2021-08-29 13:28:23 in reply to 1.1

> If the procedure is changed to write each row from within it, as below, it works. But is this the correct approach?

That approach offends a value known as "<u>[Separation of Concerns](https://en.wikipedia.org/wiki/Separation_of_concerns)"</u>. You would need to rename the function to SplitToWordsAndStuffIntoSomeDB(), and it would still be doing disparate things.

A cleaner approach would be to get the word list, then iterate over it using TCL with each loop doing an insert. This might induce worry about efficiency, but the SQLite/TCL library caches some statements to avoid recompilation in loops such as you would write.

BTW, I empathize with your sentiment that it would be nice to express "produces row set" with a function. But as far as I know, there is no way to do it except programmatically as I suggest above.

(3.1) By Gary (1codedebugger) on 2021-08-30 03:24:14 edited from 3.0 in reply to 2 [link]

Thank you. What about trying to make use of the JSON1 extension's table-valued functions?
Like this, for example. It worked pretty well, as far as I can tell (the data is written and accurate); but I'm not a programmer and have nothing to which to compare it. I'm sure that's obvious from the code and question alone.

Does the optimization note under section 3.1 of the With Clause apply at all when it is not recursive? I mean are the first two tables completely built in memory before anything is inserted, or does SQLite process one row at a time from the table storing the existing data, inserting all of the resultant rows from it into the empty table, and then moves to the next row?

set sql {
with \
 iters as ( \
   select version as t, book_no as b, chapter_no as c, verse_no as v \
   from Bible_translations \
   where t=0 ),\
 jeach as ( \
   select it.t as t, it.b as b, it.c as c, it.v as v, \
          json_extract( jt.value, '$.j' ) as j, \
          json_extract( jt.value, '$.w' ) as w, \
          json_extract( jt.value, '$.p' ) as p \
   from iters it, json_each( (select transfunc(version, book_no, chapter_no, verse_no, text_content) \
                           from Bible_translations bt \
                           where bt.version=it.t \
                             and bt.book_no=it.b \
                             and bt.chapter_no=it.c \
                             and bt.verse_no=it.v)) jt) \
insert into concordance (version, book_no, chapter_no, verse_no, index_no, word_orig, word_clean) \
select * from jeach;}

dbt function transfunc -argcount 5 -deterministic -directonly -returntype text {jsonbuild}

proc jsonbuild { t b c v w } {
 global punc
 set u [encoding convertfrom utf-8 $w]
 set json [join lmap word [split $u] {
  incr j
  set p [string map $punc $word]
  set word "\"p[incr a]\":{\"t\":$t,\"b\":$b,\"c\":$c,\"v\":$v,\"j\":$j,\"w\":\"$word\",\"p\":\"$p\"}"
 }] ,]
 return \{$json\}

(4.1) By Gunter Hick (gunter_hick) on 2021-08-30 06:17:47 edited from 4.0 in reply to 1.1 [link]

We are using virtual tables for similar purposes since release 3.7.14. The mechanism has been expanded to allow table valued functions by declaring hidden columns in the internal table declaration passed back to SQLite, e.g.

CREATE TABLE split (word TEXT, list TEXT hidden)

which allows one to 

SELECT word FROM split WHERE list = 'list of words, with punctuation';

and also JOIN to other tables

(5) By ddevienne on 2021-08-30 07:38:25 in reply to 4.1 [link]

And [here's the doc you need](https://www.sqlite.org/vtab.html#tabfunc2) to get started.

(6) By Gary (1codedebugger) on 2021-08-31 02:27:25 in reply to 5 [link]

Thanks for the information and document link. At a high level, is it correct that the SQL function in my little example would be a method defined in a module, and which must be written in C? And there is no equivalent in the SQLite Tcl API?

I suppose one could write it in C and invoke it from Tcl as needed, and the method in my case is pretty simple. I readily admit that much of what I read is very new to me and I'd need to study and experiment for a couple weeks, likely, just to get a basic grasp of it; and I wanted first to make sure I understand where exactly this is to be done.  Thanks again.