SQLite Forum

Integer becomes blob

(1.1) By extreme4all on 2020-04-23 16:06:15 edited from 1.0 [link] [source]

Hello,
recently I moved all my data to an SQLite database (about 17 million records). At some seemingly random record, the integer columns stop being integers and become binary blobs. How can I fix this?
https://imgur.com/VJaJxsq

(2) By Larry Brasfield (LarryBrasfield) on 2020-04-23 16:25:17 in reply to 1.1 [link] [source]

The image you link shows why you believe that some values in a column were rendered from a BLOB result. It provides no hint whatsoever as to what might have led to this. Nor does it establish that SQLite actually returned BLOB values in the corresponding rows.

I can assure you that SQLite does not randomly return BLOBs when integers have been stored in a column with INTEGER affinity, nor does it start doing so after some number of rows have been returned as stored.

So, to give anybody a clue as to what is happening before the end result you dislike, you are going to have to show the SQLite operations that led to whatever application you screenshotted displaying what you (and now we) can see.

(5.1) By extreme4all on 2020-04-24 18:25:00 edited from 5.0 in reply to 2 [link] [source]

all python code: https://colab.research.google.com/drive/1ZlNpPKK0ivnxaOclFe1RPEwelnCFrBDx

smaller dataset required: https://drive.google.com/file/d/15yceflA_DjBn8LmB-uQLIPeNcy-s842m/view?usp=sharing

(15) By Larry Brasfield (LarryBrasfield) on 2020-04-25 16:30:26 in reply to 5.1 [link] [source]

It is quite an ask to suggest others should go paw through that code to see where you messed up. You need to reduce the problem yourself, until you have isolated a small subset of the operations invoked by all that code, where the subset's input-to-output transformation escapes your understanding. This is basic troubleshooting or debugging. If you intend to write much code (and it appears you do), you need to learn the divide-and-conquer strategy. Running the code under a debugger, where you can examine variable values going into an operation and the values coming out of it, is one excellent way to do this.

(3) By Kees Nuyt (knu) on 2020-04-23 16:30:23 in reply to 1.1 [link] [source]

It is hard to tell at first glance what could have gone wrong.

Please show us the schema for the problem table and the insert statement you used to fill it.

Is there just one row with that problem? You can verify with something like:

SELECT .... WHERE typeof(<column>) == 'blob' ....

Can you spot any relation between the problem row and its source input line?

You can verify the integrity of the physical database file with:

sqlite3 /path/to/database.sqlite "PRAGMA integrity_check"
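
For example, a quick way to run both checks from Python (a sketch only; it assumes the rsbuddy table and item_id column that appear later in this thread):

  import sqlite3

  db = sqlite3.connect("/path/to/database.sqlite")

  # physical file check
  print(db.execute("PRAGMA integrity_check").fetchone())

  # how many rows carry a blob where an integer was expected?
  print(db.execute(
      "SELECT count(*) FROM rsbuddy WHERE typeof(item_id) = 'blob'"
  ).fetchone())
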
-- 
Regards,
Kees Nuyt

(4.3) By extreme4all on 2020-04-24 18:27:59 edited from 4.2 in reply to 3 [link] [source]

all python code:

https://colab.research.google.com/drive/1ZlNpPKK0ivnxaOclFe1RPEwelnCFrBDx

additional file used:

https://drive.google.com/file/d/15yceflA_DjBn8LmB-uQLIPeNcy-s842m/view?usp=sharing

Database creation:

CREATE TABLE IF NOT EXISTS rsbuddy (
    id integer PRIMARY KEY,
    item_name text,
    item_id int,
    ts int,
    overallPrice int,
    overallQuantity int,
    buyingPrice int,
    buyingQuantity int,
    sellingPrice int,
    sellingQuantity int
);
Create unique index rsbuddy_ts_itemid on rsbuddy (ts,item_id);

to insert rows:

python:

cols = df.columns.to_list()
values = ','.join('?' for _ in cols)
cols = '","'.join(str(v) for v in cols)

with conn:
    records = df.to_records(index=False)
    sql = f'INSERT OR IGNORE INTO {table_name}("{cols}") values ({values})'
    print(sql)
    conn.executemany(sql, records)

it generates: INSERT OR IGNORE INTO rsbuddy("item_id","ts","buyingPrice","buyingQuantity","sellingPrice","sellingQuantity","overallQuantity","overallPrice","item_name") values (?,?,?,?,?,?,?,?,?)

pragma check returned ok

(6) By Kees Nuyt (knu) on 2020-04-24 20:01:14 in reply to 4.3 [link] [source]

My python knowledge is limited and this is the first time I have seen Colab, so I will not pretend I am able to spot any errors or misuse.

The SQL looks reasonable, but your schema and the data file contain "item_name" while the insert statement doesn't.

Somewhere an input value is bound to the statement as a blob. The code that tries to convert it back to an integer is in "reading + cleaning".

-- 
Regards,
Kees Nuyt

(8) By extreme4all on 2020-04-25 13:15:42 in reply to 6 [link] [source]

I don't see where item_name is missing in my statements; the order is different but that shouldn't matter, right?

I made a function to convert back, but the problem is that if I want just one item I can't select * from rsbuddy where item_id=2;

because it is not stored as 2 but as a binary array...

(10) By TripeHound on 2020-04-25 13:46:34 in reply to 8 [link] [source]

The "missing" item_name is probably because "item_name text, item_id int" (where I've used backticks to enclose "code") got rendered as " item_name text, item_id int" (plain-text where underscore means "italic") in your post.

(11) By Kees Nuyt (knu) on 2020-04-25 14:26:17 in reply to 8 [link] [source]

I don't see where item_name is missing in my statements; the order is different but that shouldn't matter, right?

Sorry, I overlooked that.

-- 
Regards,
Kees Nuyt

(12) By Kees Nuyt (knu) on 2020-04-25 14:53:36 in reply to 4.3 [link] [source]

Your CREATE TABLE statement uses column type int instead of the prescribed INTEGER keyword.

Python might not understand what you mean.

Try to change:

CREATE TABLE IF NOT EXISTS rsbuddy (
                                        id integer PRIMARY KEY,
                                        item_name text,
                                        item_id int,
                                        ts int,
etcetera, into:

CREATE TABLE IF NOT EXISTS rsbuddy (
                                        id INTEGER PRIMARY KEY,
                                        item_name TEXT,
                                        item_id INTEGER,
                                        ts INTEGER,

etcetera.

I hope this helps

-- 
Regards,
Kees Nuyt

(13) By Larry Brasfield (LarryBrasfield) on 2020-04-25 15:53:41 in reply to 12 [link] [source]

As can be seen in the doc on data-types, section 3.1, INT and INTEGER in the putative [a] type position of a create table statement mean the same thing. Also, SQLite is agnostic as to the case of keywords, so the above two code fragments are equivalent as far as the parser is concerned.

[a. I say "putative" because dark-clumps intended to be interpreted as a typename can become something else to the parser. That is because the typename is optional for a column definition in SQLite.]

(17) By Kees Nuyt (knu) on 2020-04-25 21:50:26 in reply to 13 [link] [source]

As can be seen in the doc on data-types, section 3.1, INT and INTEGER in the putative [a] type position of a create table statement mean the same thing. Also, SQLite is agnostic as to the case of keywords, so the above two code fragments are equivalent as far as the parser is concerned.

For SQLite, that's all true. My hunch was that Python might want to interpret the column type to determine which binding to use. Python might be less forgiving than SQLite.

-- 
Regards,
Kees Nuyt

(19) By Larry Brasfield (LarryBrasfield) on 2020-04-25 22:35:56 in reply to 17 [link] [source]

That's a fair point. Stepping back a bit, the problem was the vast amount of unseen code at work with no observation of intermediate results. Just about anything can happen in that great unknown, so your conjecture was not unreasonable given what we could see.

I got frustrated enough watching this thread that I decided another approach was needed. Instead of poking at the huge black box whose output the OP disliked, it needed to be broken apart to allow him (or us) to see what flows between the pieces. Then figure out which (black) piece is disappointing and break that down, recursing as needed. I felt that poking at the whole thing was too easy and likely too often done by the OP, who needed some better ways of tackling this problem and others like it.

My semi-blunt response to your suggestion sprang, in part, from a desire to head off the approach of guessing and trying things, in favor of digging in to figure out what is going wrong, and where, in a more deterministic manner. Please take it as meaning nothing more than that (and stating some SQLite facts, of course.)

Cheers,

Larry

(20) By Kees Nuyt (knu) on 2020-04-25 23:34:49 in reply to 19 [link] [source]

Please take it as meaning nothing more than that (and stating some SQLite facts, of course.)

I got your point, no offense taken!

Thanks for motivating the OP to investigate some more.

-- 
Regards,
Kees Nuyt

(21.1) By Keith Medcalf (kmedcalf) on 2020-04-26 03:04:19 edited from 21.0 in reply to 17 [link] [source]

The binding is based on the type of the python object that is passed into the wrapper (both for sqlite3 and apsw).

If that object is a Python int or long that will fit into a 32-bit signed integer, then it is converted to a C 32-bit signed integer and that integer is used in the sqlite3_bind_int call,

Elseif that object is a Python int or long that is bigger than will fit in a 32-bit signed integer but will fit into a 64-bit signed integer, then it is converted to a C 64-bit signed integer and that integer is used in the sqlite3_bind_int64 call,

Elseif that object is a Python float, then it is converted to a double precision IEEE-754 floating point number and used in the sqlite3_bind_double call,

Elseif that object is str or unicode, then it is converted to a utf-8 C style string and used in the sqlite3_bind_text call,

Elseif that object is None, then the sqlite3_bind_null call is used,

Elseif that object supports the buffer protocol or is a byte string or bytes array, then the sqlite3_bind_blob call is used,

Else an error is thrown.

The "column affinity" is irrelevant.

For retrieving columns a similar strategy is used.

If the type is SQLITE_NULL, then the Python None object is returned,
Elseif the type is SQLITE_TEXT, then a Python unicode string is returned,
Elseif the type is SQLITE_INTEGER, then a Python integer (or long) is returned,
Elseif the type is SQLITE_FLOAT, then a Python float is returned,
Elseif the type is SQLITE_BLOB, then a Python buffer/byte string is returned.

The declared "column affinity" is irrelevant.

The only case wherein the "column type" is of any significance is that you may tell the sqlite3 wrapper to "preprocess" inputs and "postprocess" outputs of the above process based on "character sequences" in the "column type" or the "column name". That is, you could specify a "column type" of "datetime", which could, for example, cause objects being stored into this field to be preprocessed by some function -- for example, to convert Python datetime objects into ISO time strings -- and similarly cause an attempt to construct Python datetime objects from the data retrieved in that column. In all cases the data retrieval and storage is independent of such trivialities -- they merely specify pre and post processing.
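
For illustration, a sketch of that pre/post-processing hook using register_adapter/register_converter with a declared "datetime" column type (my own example; the table and column names are hypothetical):

  import datetime
  import sqlite3

  # preprocess on the way in: datetime -> ISO text
  sqlite3.register_adapter(datetime.datetime, lambda d: d.isoformat())
  # postprocess on the way out: bytes from a column declared "datetime" -> datetime
  sqlite3.register_converter("datetime", lambda b: datetime.datetime.fromisoformat(b.decode()))

  db = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
  db.execute("CREATE TABLE e (happened datetime)")
  db.execute("INSERT INTO e VALUES (?)", (datetime.datetime(2020, 4, 23, 16, 6, 15),))
  print(db.execute("SELECT happened FROM e").fetchone())
  # (datetime.datetime(2020, 4, 23, 16, 6, 15),)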

The problem was that when some of the arguments were asked "What are you?" they replied "I am a buffer (basket of bytes)", so they were stored as BLOBs.

(7) By Keith Medcalf (kmedcalf) on 2020-04-24 22:02:03 in reply to 1.1 [link] [source]

Find a row that has a blob that you think ought to be an integer.
Find this very same row in your source data that is getting sent to the INSERT statement.
Examine the "thing" that you think should be an "integer" and see what it is.
Thus will your problem be revealed to you. (Hint -- what you are sending is not an integer)
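
One way to do that last examination in Python (a sketch only; it assumes a pandas DataFrame df shaped like the OP's data):

  import pandas as pd

  # a stand-in DataFrame shaped like the OP's data
  df = pd.DataFrame({"item_name": ["Nightmare staff"], "item_id": [24422], "ts": [1587490]})

  record = df.to_records(index=False)[0]
  for value in record:
      # numeric fields come back as numpy scalars (e.g. numpy.int64),
      # which are not Python ints as far as the sqlite3 wrapper is concerned
      print(repr(value), type(value))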

(9.1) By extreme4all on 2020-04-25 13:28:43 edited from 9.0 in reply to 7 [link] [source]

checking columns:

item_id datatype: int64

ts datatype: int64

overallPrice datatype: int64

overallQuantity datatype: int64

buyingPrice datatype: int64

buyingQuantity datatype: int64

sellingPrice datatype: int64

sellingQuantity datatype: int64

last 10 records sent:

('Twisted trousers (t1)', 24409, 1587486, 350000, 1, 350000, 1, 0, 0)

('Inquisitors mace', 24417, 1587481, 475116000, 1, 0, 0, 475116000, 1)

('Nightmare staff', 24422, 1587486, 47120001, 5, 47133000, 3, 47100502, 2)

('Nightmare staff', 24422, 1587488, 47200000, 1, 0, 0, 47200000, 1)

('Nightmare staff', 24422, 1587490, 47399332, 3, 47210009, 1, 47493994, 2)

('Nightmare staff', 24422, 1587493, 47215007, 1, 47215007, 1, 0, 0)

('Twisted blueprints', 24463, 1587488, 16034999, 1, 16034999, 1, 0, 0)

('Twisted horns', 24466, 1587484, 13740000, 1, 13740000, 1, 0, 0)

('Eldritch orb', 24517, 1587493, 116598000, 1, 116598000, 1, 0, 0)

sql command INSERT OR IGNORE INTO rsbuddy("item_name","item_id","ts","overallPrice","overallQuantity","buyingPrice","buyingQuantity","sellingPrice","sellingQuantity") values (?,?,?,?,?,?,?,?,?)

the data that I get returned:

1154998 Nightmare staff b'f_\x00\x00\x00\x00\x00\x00' b'"9\x18\x00\x00\x00\x00\x00' b'\xa4A\xd3\x02\x00\x00\x00\x00' b'\x03\x00\x00\x00\x00\x00\x00\x00' b'\x19^\xd0\x02\x00\x00\x00\x00' b'\x01\x00\x00\x00\x00\x00\x00\x00' b'j\xb3\xd4\x02\x00\x00\x00\x00' b'\x02\x00\x00\x00\x00\x00\x00\x00'

1154999 Nightmare staff b'f_\x00\x00\x00\x00\x00\x00' b'%9\x18\x00\x00\x00\x00\x00' b'\x9fq\xd0\x02\x00\x00\x00\x00' b'\x01\x00\x00\x00\x00\x00\x00\x00' b'\x9fq\xd0\x02\x00\x00\x00\x00' b'\x01\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00'

1155000 Twisted blueprints b'\x8f_\x00\x00\x00\x00\x00\x00' b' 9\x18\x00\x00\x00\x00\x00' b'\xb7\xac\xf4\x00\x00\x00\x00\x00' b'\x01\x00\x00\x00\x00\x00\x00\x00' b'\xb7\xac\xf4\x00\x00\x00\x00\x00' b'\x01\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00'

1155001 Twisted horns b'\x92_\x00\x00\x00\x00\x00\x00' b'\x1c9\x18\x00\x00\x00\x00\x00' b'\xe0\xa7\xd1\x00\x00\x00\x00\x00' b'\x01\x00\x00\x00\x00\x00\x00\x00' b'\xe0\xa7\xd1\x00\x00\x00\x00\x00' b'\x01\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00'

1155002 Eldritch orb b'\xc5_\x00\x00\x00\x00\x00\x00' b'%9\x18\x00\x00\x00\x00\x00' b'\xf0$\xf3\x06\x00\x00\x00\x00' b'\x01\x00\x00\x00\x00\x00\x00\x00' b'\xf0$\xf3\x06\x00\x00\x00\x00' b'\x01\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00' b'\x00\x00\x00\x00\x00\x00\x00\x00'

this python code converts it back to integers just fine:

lambda x: int.from_bytes(x,byteorder='little')

which returns:

1154998 Nightmare staff 24422 1587490 47399332 3 47210009 1 47493994 2

1154999 Nightmare staff 24422 1587493 47215007 1 47215007 1 0 0

1155000 Twisted blueprints 24463 1587488 16034999 1 16034999 1 0 0

1155001 Twisted horns 24466 1587484 13740000 1 13740000 1 0 0

1155002 Eldritch orb 24517 1587493 116598000 1 116598000 1 0 0

Because the data is not stored as integers, I cannot query it.

I would love to select * from rsbuddy where item_id=24422;

(14.1) By Larry Brasfield (LarryBrasfield) on 2020-04-25 16:21:12 edited from 14.0 in reply to 9.1 [link] [source]

I think Keith's recommendation is very appropriate for your problem. In the hope that you will follow it, here is an interpretation and procedure for following it:

Find a row that has a blob that you think ought to be an integer.

Examine your results, at the level you originally saw and posted, to find one which was what you wanted or expected, and to find another which led to your original post and the title of this thread. Note the item_id value for these two, which I refer to below as LikedId and WoopsId. Note one column name which had the disappointing results, referred to below as HardColumn.

Find this very same row in your source data that is getting sent to the INSERT statement.

Make a copy of your DB to fool around with. Open the copy using the SQLite shell.

At that shell's prompt, enter:

  delete from rsbuddy where not (item_id in (LikedId, WoopsId));
  select item_id, typeof(HardColumn) as stored_type from rsbuddy;
  .exit

If the query result shows that the stored type is different between the two result rows, ("integer" in one case and "blob" in the other), concentrate on your insert statements. This means showing them here (if you need more help), including the binding operations that associate values with those '?' tokens.

Examine the "thing" that you think should be an "integer" and see what it is.

(Combined with above shell operations.) The shell should be your go-to tool for seeing what is actually in the database.

Thus will your problem be revealed to you. (Hint -- what you are sending is not an integer)

Revelation may be a bit more difficult to achieve than to state. However, you have a smaller problem to deal with when you know whether or not its genesis is in the storing or the retrieving.

(16) By extreme4all on 2020-04-25 17:22:41 in reply to 14.1 [source]

Manually executing this command gives me no problem and stores the data correctly.

insert or ignore into rsbuddy ("item_name","item_id","ts","overallPrice","overallQuantity","buyingPrice","buyingQuantity","sellingPrice","sellingQuantity")VALUES('Twisted trousers (t1)', 24409, 1587490, 350000, 1, 350000, 1, 0, 0);

I think I have identified the issue: apparently providing a numpy record was the problem; it must have been the numpy record type (a record is an array with datatypes).
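
For reference, one workaround along those lines (a sketch of my own, not the OP's actual fix) is to turn the numpy scalars in each record into native Python values before binding, reusing the df/sql/conn names from the earlier code:

  # numpy scalars expose .item(), which returns the equivalent native Python value
  records = [
      tuple(value.item() if hasattr(value, "item") else value for value in row)
      for row in df.to_records(index=False)
  ]
  conn.executemany(sql, records)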

(18) By Kees Nuyt (knu) on 2020-04-25 22:01:07 in reply to 16 [link] [source]

Congratulations.

(22) By Fankers on 2021-03-07 16:53:30 in reply to 16 [link] [source]

Since the solution was not posted, I will post it, as I ran into the same problem today using Python & Pandas.

For some reason, Sqlite does not accept an INT larger than 8 bytes if it is coming from Numpy, which Pandas uses.

Solution is, to use these two lines:

sqlite3.register_adapter(np.int32, int)

sqlite3.register_adapter(np.int64, int)

Also, not as great a solution, but converting the number to REAL before inserting it would also work ;-)
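
A minimal runnable sketch of that adapter fix (my own illustration, assuming numpy is installed):

  import sqlite3
  import numpy as np

  sqlite3.register_adapter(np.int32, int)
  sqlite3.register_adapter(np.int64, int)

  db = sqlite3.connect(":memory:")
  db.execute("CREATE TABLE t (v INTEGER)")
  db.execute("INSERT INTO t VALUES (?)", (np.int64(1587490),))
  print(db.execute("SELECT v, typeof(v) FROM t").fetchone())
  # (1587490, 'integer') -- without the adapters, numpy integers were
  # being stored as blobs, as seen earlier in this thread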

p.s.

I found solution and explanation here:

https://stackoverflow.com/questions/49456158/integer-in-python-pandas-becomes-blob-binary-in-sqlite

(23.1) By Keith Medcalf (kmedcalf) on 2021-03-07 17:51:24 edited from 23.0 in reply to 22 [link] [source]

int32 is a 32-bit integer, as in 4 bytes
int64 is a 64-bit integer, as in 8 bytes

The "root cause" is that when the sqlite3 (pysqlite2) wrapper asks the python variable what "type" it is, it responds that the closest type it is to a supported type is a "byte array", "buffer", or "blob", so the byte-array is bound to the statement as a blob.

pysqlite2 (and apsw) only recognize Python base types None, int (long), float, str (unicode) or types that conform to the buffer protocol.

np.int32 and np.int64 are not Python base types and do not correspond to any supported Python type, but they do support the buffer protocol, so they are sent to the database as blobs.

Registering the adapters tells the pysqlite2 wrapper layer how to convert that type into a supported python type.

The same thing happens if you used a ctypes type. pysqlite2 does not know that, for example, ctypes.c_short can be converted to a python integer. It does however support the buffer protocol, so the bytes are stored as a blob. If you want to store them as an integer, you have to convert them to an integer first, or register an adapter that can convert the arbitrary type into a recognizable and supported base type.

>python
Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> a = ctypes.c_short(4)
>>> a
c_short(4)
>>> import sqlite3
>>> db = sqlite3.connect('')
>>> db.execute('create table x(x integer)')
<sqlite3.Cursor object at 0x00000210DBE75AB0>
>>> db.execute('insert into x values (?)', (a,))
<sqlite3.Cursor object at 0x00000210DBE759D0>
>>> for row in db.execute('select x from x'):
...  print(row)
...
(b'\x04\x00',)
>>> a.value
4
>>> sqlite3.register_adapter(ctypes.c_short, lambda x: x.value)
>>> db.execute('insert into x values (?)', (a,))
<sqlite3.Cursor object at 0x00000210DBE75AB0>
>>> for row in db.execute('select x from x'):
...  print(row)
...
(b'\x04\x00',)
(4,)
>>>

(24.1) By Keith Medcalf (kmedcalf) on 2021-03-07 18:39:31 edited from 24.0 in reply to 22 [link] [source]

Note that the sqlite3 (pysqlite2) wrapper allows you to "adapt" a non-base type for storage and "convert" it for retrieval. The "conversion" is based on either parsing the column declared type or the query column name (see the documentation).

However, the wrapper hands such values to the converter as byte strings even if they are stored as another type (INTEGER, REAL), so you have to handle that:

>python
Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> import sqlite3
>>> a = ctypes.c_short(4)
>>> db = sqlite3.connect('', detect_types=sqlite3.PARSE_DECLTYPES)
>>> sqlite3.register_adapter(ctypes.c_short, lambda x: x.value)
>>> sqlite3.register_converter('short', lambda x: ctypes.c_short(int(x)))
>>> db.execute('create table x(x short integer)')
<sqlite3.Cursor object at 0x000001BE2C015AB0>
>>> db.execute('insert into x values (?)', (a,))
<sqlite3.Cursor object at 0x000001BE2C0159D0>
>>> for row in db.execute('select x from x'):
...  print(row)
...
(c_short(4),)
>>>

(25.1) By andydoc1 on 2021-11-04 14:44:04 edited from 25.0 in reply to 1.1 [link] [source]

I had a similar problem using python 3 and sqlite via apsw:

Python 3.9.5 (tags/v3.9.5:0a7dcbd, May  3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)] on win32  
Type "help", "copyright", "credits" or "license" for more information.  
>>> import apsw, datetime, MetaTrader5 as mt5  
>>> from contextlib import closing  
>>> conn=apsw.Connection("C:\\Users\\Administrator\\AppData\\Roaming\\MetaQuotes\\Terminal\\D0E8209F77C8CF37AD8BF550E51FF075\\MQL5\\tgsignals.sqlite")  
>>> conn.setbusytimeout(10000)  
>>> mt5.initialize()  
True  
>>> utc_to = datetime.datetime.now().replace(tzinfo=datetime.timezone.utc).timestamp()  
>>> utc_from = utc_to-300  
>>> ratesm5 = mt5.copy_rates_range("USDJPY", mt5.TIMEFRAME_M5, utc_from, utc_to)  
>>> curbar = ratesm5[0][0]  
>>> curbar  
1636024200  
>>> type(curbar)  
<class 'numpy.int64'>

inserting curbar into the 'time' field of table 'equity' led to the following unwanted result:

>>> with closing(conn.cursor()) as cursor:  
...     for row in cursor.execute("SELECT time, typeof(time) FROM equity"):  
...         print(row)  
...  
(b'\xa0\xd4\x83a\x00\x00\x00\x00', 'blob')  
(b'\xcc\xd5\x83a\x00\x00\x00\x00', 'blob')  
(b'\xf8\xd6\x83a\x00\x00\x00\x00', 'blob')    

However, the simple expedient of substituting curbar = ratesm5[0][0].item() for curbar = ratesm5[0][0] gave the desired output in the new lines:

>>> with closing(conn.cursor()) as cursor:  
...     for row in cursor.execute("SELECT time, typeof(time) FROM equity"):  
...         print(row)  
...  
(b'\xa0\xd4\x83a\x00\x00\x00\x00', 'blob')  
(b'\xcc\xd5\x83a\x00\x00\x00\x00', 'blob')  
(b'\xf8\xd6\x83a\x00\x00\x00\x00', 'blob')  
(1636030200, 'integer')  
(1636030500, 'integer')  

Hope this helps someone...