Confused about blobs
(1) By Scott Doctor (scottdoctor) on 2020-05-30 08:09:12 [source]
I am confused about how to deal with blobs. Reading through the documentation I find references to operations like sqlite3_blob_open, sqlite3_blob_write, ... and examples that use bind, like sqlite3_bind_blob.
My question is which technique should I use, bind or open/write/read? The documentation about blobs seems to favor bind, but open/write/read seems more intuitive.
What I am doing is writing a C program for work to replace our manual paper test reports with a sqlite database that holds the same data, and all related data, in one sqlite file. Some of the data is auto-generated by a C program that analyzes it. Most of the fields are normal integers, reals, and text, with which I have no issues.
One part of the processing program generates some graphs of the test data. The graphs are converted into a PNG image which I want to put into a blob that is part of the record for that data set. Currently we have data and graphs spread out over dozens of folders. One machine can generate more than a hundred images, and it has become a nightmarish task to find anything. Such is why I am trying to convert our manual system to a sqlite database file that contains all information for a specific machine.
I just cannot seem to get blobs working. Using a browser such as DB Browser for SQLite, I can copy/paste an image into a blob field. But I want to do this programmatically (in my C program that created the data) without the need for one of the technicians to open a program and copy/paste.
So assume a C function being passed a pointer to an array of bytes that is the image I want to put into the current record blob field. The database is already open and is an external variable. Assume the db has a field called imageblob that I want to put the bytes of the image into.
sqlite3 *db;
int myfunction_writeblob(char *image, long bytes_in_image) { ... }
What is the recommended technique for putting the bytes into the blob (and the converse reading them out)?
One problem with the documentation is that there is so much information it is hard to tell what is important and what is not. It would be nice to have a brief how-to C example for basic blob handling in the documentation, or if one already exists, to make it easier to find.
(2.1) By Stephan Beal (stephan) on 2020-05-30 09:34:05 edited from 2.0 in reply to 1 [link] [source]
My question is which technique should I use, bind or open/write/read? The documentation about blobs seems to favor bind, but open/write/read seems more intuitive.
The open/write/read/close APIs provide the ability to incrementally write/read blob data (similar to a file handle) and are most useful for large blobs which you "really don't" want to hold entirely in client-side memory at once. The vast majority of the time, the sqlite3_bind_blob() API is easier for writing, and sqlite3_column_blob() for reading.
Unless you know you have a need for the more detailed APIs, ignore them and stick with the bind/column APIs.
int myfunction_writeblob(char *image, long bytes_in_image) { ... }
Here's an example which uses the open/read/write/close API:
https://icculus.org/~chunky/stuff/sqlite3_example/sqlite3_example_blob.c
but that's the hard way to do it. The easier way is (pseudocode):
// write:
sqlite3_bind_blob(myStmt, paramIndex, blobMemPtr, lenOfBlobMemPtr, SQLITE_TRANSIENT );
// ^^^ noting that SQLITE_STATIC may be better (uses less memory),
// depending on the context, but SQLITE_TRANSIENT is safer, in
// terms of potential memory misuse, if you're uncertain.
sqlite3_step(myStmt);
// read:
void const * blob = sqlite3_column_blob(myStmt, columnIndex);
int const blobLen = sqlite3_column_bytes(myStmt, columnIndex);
That's all there is to it. If you know how to bind and read text columns, you already know how to bind and read blob columns. Don't let the different data type name fool you into thinking otherwise.
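Put into the shape of the function from the original post, that might look roughly like the sketch below. The table name report, the column names id and imageblob, and the extra id parameter are assumptions for illustration, not anything from the original post; db is the already-open handle from post 1.

#include "sqlite3.h"

extern sqlite3 *db; /* the already-open database from post 1 */

/* Sketch only: store an in-memory PNG into a (hypothetical) report.imageblob
** column for the row identified by a (hypothetical) id key. */
int myfunction_writeblob(long id, char *image, long bytes_in_image)
{
    sqlite3_stmt *stmt = 0;
    int rc;

    rc = sqlite3_prepare_v2(db,
            "UPDATE report SET imageblob = ?1 WHERE id = ?2", -1, &stmt, 0);
    if (rc != SQLITE_OK)
        return rc;
    /* SQLITE_TRANSIENT makes SQLite take its own copy of the bytes */
    sqlite3_bind_blob(stmt, 1, image, (int)bytes_in_image, SQLITE_TRANSIENT);
    sqlite3_bind_int64(stmt, 2, id);
    rc = sqlite3_step(stmt); /* SQLITE_DONE is expected on success */
    sqlite3_finalize(stmt);
    return (rc == SQLITE_DONE) ? SQLITE_OK : rc;
}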
(3) By Scott Doctor (scottdoctor) on 2020-05-30 09:50:07 in reply to 2.1 [link] [source]
What I do not understand is why bind exists at all.
Why is it not something as simple as:
INSERT INTO blobfield (pointer, bytes)
I do not understand the concept of bind. I have never used it and do not get why this separate mechanism from INSERT exists.
(4.1) Originally by anonymous with edits by Richard Hipp (drh) on 2020-05-30 10:23:40 from 4.0 in reply to 3 [link] [source]
https://cheatsheetseries.owasp.org/cheatsheets/Query_Parameterization_Cheat_Sheet.html
- Donald Shepherd
(5) By Keith Medcalf (kmedcalf) on 2020-05-30 10:16:09 in reply to 1 [link] [source]
The difference between "text" and "blob" is that "text" is a bag-of-bytes that is encoded using UTF-8 and terminated with a NULL (0) byte -- and that rather than explicitly specifying the length of the storage area, you can use "-1" which means "go look for the NULL (0) terminator and figure it out for yourself, Mr. SQLite" -- but a "blob" is just a bag-of-bytes with no null (0) terminator and you must explicitly pass the size-of-the-bag when binding a blob.
Similarly, sqlite3_column_text returns a pointer to a properly encoded NULL (0) terminated UTF-8 text string. sqlite3_column_blob returns a pointer to the "bag-o-bytes". A bag-o-bytes that is "text" will always have a NULL (0) terminator appended (it is not part of the text). A bag-o-bytes that is a "blob" will not have a NULL (0) terminator appended. In both cases sqlite3_column_bytes will tell you the size of the bag EXCLUDING the NULL (0) terminator which is appended to "text".
If "text" contains embedded nulls (0) bytes or otherwise contains invalid UTF-8 data hell might freeze over or you might cause your computer to explode since the definition of a "C UTF-8 Encoded Text String" is a sequence of valid UTF-8 encoded bytes followed by a NULL (0) terminator and a NULL (0) byte is not allowed to be embedded because that character "terminates" the string.
A "blob" is just a bag-of-bytes. SQLite3 imposes no structure on a blob, and makes no assumptions or reliance on the "properness" of the bytes -- it is merely a bag-o-bytes of a specified size.
The sqlite3_blob* interfaces are conceptually similar to the open/read/write/close calls for files because the bag-o-bytes that comprises a blob may contain more bytes than your computer contains memory. Also, the blob may be, for example, 1 TB in size but you only need to read the 47 bytes at offset 483754943 in the blob and not what comes before or after that, so there is no point in allocating a whole 1 TB memory block to read the whole thing when you only want a wee bit. Also, sqlite3_blob_write cannot "append" to a blob. It can only write to an already allocated blob.
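As a rough sketch of that incremental case (the table t, column b, and rowid 1 are placeholders), reading just those 47 bytes without loading the rest might look like:

#include "sqlite3.h"

/* Sketch only: read 47 bytes at offset 483754943 from the blob stored in
** row 1 of a hypothetical t.b column, without reading the whole blob. */
int read_blob_slice(sqlite3 *db, char *buf47)
{
    sqlite3_blob *blob = 0;
    int rc;

    rc = sqlite3_blob_open(db, "main", "t", "b", 1, 0 /* read-only */, &blob);
    if (rc != SQLITE_OK)
        return rc;
    rc = sqlite3_blob_read(blob, buf47, 47, 483754943);
    sqlite3_blob_close(blob);
    return rc;
}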
(6.2) By Keith Medcalf (kmedcalf) on 2020-05-30 10:36:53 edited from 6.1 in reply to 3 [link] [source]
Oh, you want the way to encode a blob in SQL-injection-attack format.
That is specified as X'<hexbytes>...'
INSERT INTO t (integer, blob) values (57, X'4F8E72487B');
sqlite3_bind* is used to allow a program to pass values to SQLite3 directly, rather than encoding them into the SQL text. The above might be:
int i = 57;
unsigned char b[] = {0x4F, 0x8E, 0x72, 0x48, 0x7B};
sqlite3_stmt* stmt;
sqlite3_prepare_v2(db, "insert into t (integer, blob) values (?,?)",-1,&stmt, 0);
sqlite3_bind_int(stmt, 1, i);
sqlite3_bind_blob(stmt, 2, b, 5, SQLITE_TRANSIENT);
sqlite3_step(stmt);
sqlite3_finalize(stmt);
where the ? are "positional parameters" and you are "binding" the integer i to the first parameter and the blob b (of 5 bytes in length) to the second parameter.
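Reading the blob back out is the same idea in reverse; a sketch against the same table:

sqlite3_stmt* stmt;
sqlite3_prepare_v2(db, "select blob from t where integer = 57", -1, &stmt, 0);
if (sqlite3_step(stmt) == SQLITE_ROW)
{
    const void* p = sqlite3_column_blob(stmt, 0);  /* the bag-o-bytes */
    int n = sqlite3_column_bytes(stmt, 0);         /* its size */
    /* copy p[0..n-1] somewhere before advancing the statement; the
       pointer is only valid until the statement is stepped, reset,
       or finalized */
}
sqlite3_finalize(stmt);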
(7) By Scott Doctor (scottdoctor) on 2020-05-30 20:30:57 in reply to 6.2 [link] [source]
I think I get it now.
What I have been doing is using sprintf() to assemble a SQL statement. So bind is sort of like doing a sprintf: a base string with placeholders that get replaced by some value using bind, so several calls to sqlite3 routines to build the SQL statement.
Is there an advantage to using several bind calls versus a single sprintf call, or is it just another way of doing the same thing?
(8) By anonymous on 2020-05-30 21:04:32 in reply to 1 [link] [source]
One part of the processing program generates some graphs of the test data. The graphs are converted into a PNG image.
Keep the images in the filesystem and store their paths in the db. Why bother stuffing them into a blob, then unpacking them just to reference them as files again?
(9) By Tim Streater (Clothears) on 2020-05-30 21:53:31 in reply to 7 [link] [source]
Not quite. If you build up a string yourself, using a user-supplied value abc, you might end up with, say: update mytable set mystr='abc'; which is OK. But suppose the string the user supplies, instead of abc, is: abc'; drop table mytable; Then you're in trouble. You avoid this by using a prepared statement like this: update mytable set mystr=?; Note the question mark. Then you bind the user's value to that statement, and it doesn't matter what string the user gives you, it ALL goes into the database. So it isn't just another way of doing the same thing.
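In code the difference looks something like this (a sketch; mytable and mystr as above, and user_input stands for whatever text the user typed):

char sql[256];
const char *user_input = "abc'; drop table mytable; --";
sqlite3_stmt *stmt;

/* Risky: the user's text becomes part of the SQL itself, so the
   injected DROP statement would actually run. */
sprintf(sql, "update mytable set mystr='%s';", user_input);
sqlite3_exec(db, sql, 0, 0, 0);

/* Safe: the SQL is fixed; the value is bound afterwards and is
   stored verbatim, no matter what characters it contains. */
sqlite3_prepare_v2(db, "update mytable set mystr=?;", -1, &stmt, 0);
sqlite3_bind_text(stmt, 1, user_input, -1, SQLITE_TRANSIENT);
sqlite3_step(stmt);
sqlite3_finalize(stmt);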
(10) By Scott Doctor (scottdoctor) on 2020-05-30 22:23:53 in reply to 8 [link] [source]
"Keep the images in filesystem, store their path in db. Why bother to stuff it into blob, then unpack just to reference it as a file again?" That defeats the purpose, and that is what we currently are doing. The idea is that ALL information for a specific machine is in a single file that can travel with the machine. The devices are required to have an extensive testing and documentation log in accordance with ISO 9002 and regulations from the US government and other regulators in various countries. It has become untenable gathering all the data for one machine, let alone trying to find anything, like looking for trends over many machines. By everything being in one database file per machine it becomes easy to search and extract various data, along with being able to store a history of the machine.
(11) By anonymous on 2020-05-30 22:52:00 in reply to 10 [link] [source]
Have you considered Fossil?
It fits the single-file requirement, keeps history, does the blobbing for you, and has the presentation layer already built in.
Ah, it's also multiplatform, and lets you use the filesystem so that heterogeneous artifacts can be collected and organized prior to storing them in the repository.
(12) By Scott Doctor (scottdoctor) on 2020-05-30 23:07:57 in reply to 6.2 [link] [source]
This will probably get mangled by email systems line-wrapping everything, but it looks like the forum keeps the formatting.
Below is the C code I wrote for basic access of my database. If you have any suggestions, or it looks like I missed something, feel free to comment. There is a header file, not included here, that has an enumeration of error codes, but they are easily replaced and obvious.
/* ********************** */
/* SQLite Access routines */
/* ********************** */

#include <stdio.h>
#include <ctype.h>
#include <errno.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <mem.h>
#include <io.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

#include "sqlite3.h"

#define SQLMAXSIZE BUFSIZE /* max number of bytes in a sql statement */

typedef struct
{
    char fn[MAXFILENAMESIZE + 1]; /* name of the database */
    char isopen;                  /* flag whether the database is open */
    char exists;                  /* flag set if database exists on disk */
    char sql[SQLMAXSIZE];         /* the sql string */
    int sqllen;                   /* length of the sql string */
    sqlite3 *db;                  /* the opaque sqlite database structure */
    sqlite3_stmt *stmt;           /* the prepared statement */
    int rows;                     /* number of rows returned by statement */
} TXTDB;

TXTDB txtdb;                    /* the database */
char txtdbtemp[SQLMAXSIZE + 2]; /* temp string */

/* ********************************************************** */
/* open txtdb and init variables. called at start of program  */
/* returns 0 if everything OK                                 */
/* returns 1 if file did not exist and was created            */
/*   this means the database tables need to be created        */
/* ********************************************************** */
int txtdb_open(void)
{
    int i;
    int f;
    char cflag; /* create flag; if set then file does not exist */

    if (txtdb.isopen)
    {
        write_errfile("Attempt to open txtdb which is already open");
        exit(TXTTLS_ERROR_31);
    }
    if (txtdb.db)
    {
        write_errfile("txtdb.db not null in txtdb_open");
        exit(TXTTLS_ERROR_31);
    }

    /* first try opening the file to see if it exists */
    cflag = 0;
    f = open(txtdb.fn, O_RDWR | O_BINARY, S_IREAD | S_IWRITE);
    if (f == -1)
    {
        /* if an error occurs then cannot proceed, so write error message */
        switch (errno)
        {
        case EACCES:
            write_errfile("txtdb Open Error: Permission Denied");
            exit(TXTTLS_ERROR_33);
        case EINVACC:
            write_errfile("txtdb Open Error: Invalid Access Code");
            exit(TXTTLS_ERROR_33);
        case EMFILE:
            write_errfile("txtdb Open Error: Too Many Files Open");
            exit(TXTTLS_ERROR_33);
        case ENOENT:
            write_errfile("txtdb file does not exist. Creating txtdb");
            cflag = 1; /* set flag indicating that file is being created */
            break;
        default:
            write_errfile("txtdb Open Error: Unknown Error");
            exit(TXTTLS_ERROR_33);
        }
    }
    else
    {
        close(f); /* close the handle only if the open succeeded */
    }

    i = sqlite3_open(txtdb.fn, &txtdb.db); /* open the database */
    if (i != SQLITE_OK)
    {
        sprintf(txtdbtemp, "txtdb Open Error: error during database open %d", i);
        write_errfile(txtdbtemp);
        exit(TXTTLS_ERROR_33);
    }
    if (!txtdb.db)
    {
        write_errfile("txtdb Open Error: txtdb is null after open");
        exit(TXTTLS_ERROR_33);
    }
    txtdb.isopen = 1;     /* set open flag */
    txtdb.exists = cflag; /* save the create flag */
    return cflag;
}

/* ************************************ */
/* Close the previously opened database */
/* ************************************ */
void txtdb_close(void)
{
    int i;

    if (!txtdb.isopen)
    {
        if (txtdb.db)
        {
            write_errfile("txtdb Close Error: txtdb.db not null and not flagged as open");
            exit(TXTTLS_ERROR_34);
        }
        write_errfile("txtdb Close Error: txtdb.db is null and not flagged as open");
        exit(TXTTLS_ERROR_34);
    }
    do
    {
        i = sqlite3_close(txtdb.db);
        if ((i != SQLITE_BUSY) && (i != SQLITE_OK))
        {
            sprintf(txtdbtemp, "txtdb Close Error: error during database Close %d", i);
            write_errfile(txtdbtemp);
            exit(TXTTLS_ERROR_34);
        }
    }
    while (i == SQLITE_BUSY);
    txtdb.isopen = 0;
    txtdb.db = 0;
    txtdb.exists = 0;
}

/* ***************************************************** */
/* does the sqlite3_prepare_v2() with error checking     */
/* returns SQLITE_OK if ready for a step.                */
/* writes to errfile, but does not exit if sql too long  */
/* if sql is too long then returns SQLITE_ERROR          */
/* any other error, the error code is returned           */
/* ***************************************************** */
int txtdb_prepare(char *sql)
{
    int i;

    txtdb.sqllen = strlen(sql); /* get and check length of sql */
    if (txtdb.sqllen > SQLMAXSIZE - 1)
    {
        write_errfile("SQL Too Long:");
        write_errfile(sql);
        return SQLITE_ERROR;
    }
    strcpy(txtdb.sql, sql); /* save the sql into the structure */
    if (!txtdb.isopen)
    {
        write_errfile("txtdb prepare without database open");
        exit(TXTTLS_ERROR_35);
    }
    if (txtdb.db == 0)
    {
        write_errfile("txtdb prepare with null database handle");
        exit(TXTTLS_ERROR_36);
    }
    i = sqlite3_prepare_v2(txtdb.db, sql, (txtdb.sqllen + 1), &txtdb.stmt, 0);
    return i;
}
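One suggestion that ties this back to the blob question: a small helper that reuses txtdb_prepare and then binds the image bytes into txtdb.stmt, along the lines of post 2.1. This is only a sketch; the testdata table, the imageblob column, and the id key are placeholders, not part of the code above.

/* ***************************************************** */
/* sketch only: store an image blob using the helpers    */
/* above (table/column names here are placeholders)      */
/* ***************************************************** */
int txtdb_write_blob(long id, char *image, long bytes_in_image)
{
    int i;

    i = txtdb_prepare("UPDATE testdata SET imageblob = ? WHERE id = ?");
    if (i != SQLITE_OK)
        return i;
    sqlite3_bind_blob(txtdb.stmt, 1, image, (int)bytes_in_image, SQLITE_TRANSIENT);
    sqlite3_bind_int64(txtdb.stmt, 2, id);
    i = sqlite3_step(txtdb.stmt);
    sqlite3_finalize(txtdb.stmt);
    txtdb.stmt = 0;
    return (i == SQLITE_DONE) ? SQLITE_OK : i;
}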
(13) By Warren Young (wyoung) on 2020-05-30 23:08:59 in reply to 11 [link] [source]
Beware the repo size bloat problem, though. In this case, it might be better to store the raw data that produces the PNGs along with scripts to generate them from the raw data than to store the output PNGs.
(14) By Scott Doctor (scottdoctor) on 2020-05-30 23:17:29 in reply to 11 [link] [source]
Actually I advocated for months to use fossil. I even set up an example repository. But the powers-that-be decided no, mostly because they are not computer people. But they did like my example demonstration of using a database file instead. Such as why I am going this route. Realistically a modifiable and searchable database is actually what we need while generating the data.
The main issue in this discussion is that I have sets of data that get plotted to graphs. The graphs must go along with the data as part of the documentation package that follows along with the machine. A typical completed set of data and documents comprises about 100 megabytes of information.
(15) By anonymous on 2020-05-30 23:28:21 in reply to 13 [link] [source]
The way this is described by OP is really a nice consulting project to pitch, as there are quite a few ways to make this portable, extensible, compact, user-friendly and low-maintenace. Going C way, in my opinion, would be the least efficient choice.