SQLite Forum

feature request for CSV extension...
Login
Take a look at the following modification that I made which embodies several changes from the default csv.c extension module.

<http://www.dessus.com/files/vsv.c>

```
/*
** 2020-02-08 modified by Keith Medcalf who also disclaims all copyright
** on the modifications and hereby places this code in the public domain
**
** This file contains the implementation of an SQLite virtual table for
** reading VSV (Variably Separated Values), which are like CSV files,
** but subtly different.  VSV supports a number of extensions to the
** CSV format as well as more processing options.
**
**
** Usage:
**
**  create virtual table temp.vsv using vsv(...);
**  select * from vsv;
**
** The parameters to the vsv module (the vsv(...) part) are as follows:
**
**  filename=STRING     the filename, passed to the Operating System
**  data=STRING         alternative data
**  columns=N           columns parsed from the VSV file
**  header=BOOL         whether or not a header row is present
**  skip=N              number of leading data rows to skip
**  rsep=STRING         record separator
**  fsep=STRING         field separator
**  validatetext=BOOL   validate UTF-8 encoding of text fields
**  affinity=AFFINITY   affinity to apply to each returned value
**  nulls=BOOL          empty fields are returned as NULL
**
**
** Defaults:
**
**  filename / data     nothing.  You must provide one or the other
**                      it is an error to provide both or neither
**  schema              nothing.  If not provided then one will be
**                      generated for you from the header, or if no
**                      header is available then autogenerated using
**                      field names manufactured as cX where X is the
**                      column number
**  columns             nothing.  If not specified then the number of
**                      columns is determined by counting the fields
**                      in the first record of the VSV file (which
**                      will be the header row if header is specified),
**                      the number of columns is not parsed from the
**                      schema even if one is provided
**  header=no           no header row in the VSV file
**  skip=0              do not skip any data rows in the VSV file
**  fsep=','            default field separator is a comma
**  rsep='\n'           default record separator is a newline
**  validatetext=no     do not validate text field encoding
**  affinity=none       do not apply affinity to each returned value
**  nulls=off           empty fields returned as zero-length
**
**
** Parameter types:
**
**  STRING              means a quoted string
**  N                   means a whole number not containing a sign
**  BOOL                means something that evaluates as true or false
**                          it is case insensitive
**                          yes, no, true, false, 1, 0
**  AFFINITY            means an SQLite3 type specification
**                          it is case insensitive
**                          none, blob, text, integer, real, numeric
**
** STRING means a quoted string.  The quote character may be either
** a single quote or a double quote.  Two quote characters in a row
** will be replaced with a single quote character.  STRINGS do not
** need to be quoted if it is obvious where they begin and end
** (that is, they do not contain a comma).  Leading and trailing
** spaces will be trimmed from unquoted strings.
**
**    filename =./this/filename.here, ...
**    filename =./this/filename.here , ...
**    filename = ./this/filename.here, ...
**    filename = ./this/filename.here , ...
**    filename = './this/filename.here', ...
**    filename = "./this/filename.here", ...
**
**  are all equivalent.
**
** BOOL defaults to true so the following specifications are all the
** same:
**
**  header = true
**  header = yes
**  header = 1
**  header
**
**
** Specific Parameters:
**
** The platform/compiler/OS fopen call is responsible for interpreting
** the filename.  It may contain anything recognized by the OS.
**
** The separator string containing exactly one character, or a valid
** escape sequence.  Recognized escape sequences are:
**
**  \t                  horizontal tab, ascii character 9 (0x09)
**  \n                  linefeed, ascii character 10 (0x0a)
**  \v                  vertical tab, ascii character 11 (0x0b)
**  \f                  form feed, ascii character 12 (0x0c)
**  \xhh                specific byte where hh is hexadecimal
**
** The validatetext setting will cause the validity of the field
** encoding (not its contents) to be verified.  It effects how
** fields that are supposed to contain text will be returned to
** the SQLite3 library in order to prevent invalid utf8 data from
** being stored or processed as if it were valid utf8 text.
**
** The nulls option will cause fields that do not contain anything
** to return NULL rather than an empty result.  Two separators
** side-by-each with no intervening characters at all will be
** returned as NULL if nulls is true and if nulls is false or
** the contents are explicity empty ("") then a 0 length blob
** (if affinity=blob) or 0 length text string.
**
** For the affinity setting, the following processing is applied to
** each value returned by the VSV virtual table:
**
**  none                no affinity is applied, all fields will be
**                      returned as text just like in the original
**                      csv module, embedded nulls will terminate
**                      the text.  if validatetext is in effect then
**                      an error will be thrown if the field does
**                      not contain validly encoded text or contains
**                      embedded nulls
**
**  blob                all fields will be returned as blobs
**                      validatetext has no effect
**
**  text                all fields will be returned as text just
**                      like in the original csv module, embedded
**                      nulls will terminate the text.
**                      if validatetext is in effect then a blob
**                      will be returned if the field does not
**                      contain validly encoded text or the field
**                      contains embedded nulls
**
**  integer             if the field data looks like an integer,
**                      (regex "^ *(\+|-)?\d+ *$"),
**                      then an integer will be returned as
**                      provided by the compiler and platform
**                      runtime strtoll function
**                      otherwise the field will be processed as
**                      text as defined above
**
**  real                if the field data looks like a number,
**                      (regex "^ *(\+|-)?(\d+\.?\d*|\d*\.?\d+)([eE](\+|-)?\d+)? *$")
**                      then a double will be returned as
**                      provided by the compiler and platform
**                      runtime strtold function otherwise the
**                      field will be processed as text as
**                      defined above
**
**  numeric             if the field looks like an integer
**                      (see integer above) that integer will be
**                      returned
**                      if the field looks like a number
**                      (see real above) then the number will
**                      returned as an integer if it has no
**                      fractional part and
**                      (a) your platform/compiler supports
**                      long double and the number will fit in
**                      a 64-bit integer; or,
**                      (b) your platform/compiler does not
**                      support long double (treats it as a double)
**                      then a 64-bit integer will only be returned
**                      if the value would fit in a 6-byte varint,
**                      otherwise a double will be returned
**
** The nulls option will cause fields that do not contain anything
** to return NULL rather than an empty result.  Two separators
** side-by-each with no intervening characters at all will be
** returned as NULL if nulls is true and if nulls is false or
** the contents are explicity empty ("") then a 0 length blob
** (if affinity=blob) or 0 length text string will be returned.
**
*/
```

A set of various extensions including vsv.c can be found in  
<http://www.dessus.com/files/sqlite3extensions.zip>