SQLite User Forum

Can I take advantage of SQLite's "flexibly typed" ?

(1) By Swastik (961679) on 2022-12-27 15:31:22 [link] [source]

I need to construct SQL queries from JSON.

Sample JSON:


{
"table": "employee",
"type": "update",
"data": {
      "id": 111,
      "name": "swastik",
       "age": 23
      }

"old": {
      "age": 22,
      }
}

In general, SQL queries put double/single quotes around column values that are strings and no quotes when the value is an int/float.

So for constructing SQL queries for SQLite I have two options:

1. Take advantage of SQLite's "flexible typing" feature

Using this feature of SQLite, I can put quotes around every column value without checking whether the value is a String, Int, or Float.

Output Query

UPDATE employee SET age = "23" WHERE id = 111;

2. Use Python's built-in function isinstance()

Use Python's built-in isinstance() to determine whether the column value is a String, Int, or Float, and add quotes only if the value is a string.

Output Query

UPDATE employee SET age = 23 WHERE id = 111;
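
A rough sketch of what I mean by option 2 (the helper name sql_literal is just my example, and the replies below note that bound parameters are the safer route):

```python
def sql_literal(value):
    """Render a Python value as an SQL literal: strings get single quotes,
    numbers are emitted as-is."""
    if isinstance(value, str):
        # Double any embedded single quotes to keep the literal valid.
        return "'" + value.replace("'", "''") + "'"
    return str(value)

data = {"id": 111, "name": "swastik", "age": 23}
assignments = ", ".join(
    f"{col} = {sql_literal(val)}" for col, val in data.items() if col != "id"
)
query = f"UPDATE employee SET {assignments} WHERE id = {sql_literal(data['id'])};"
print(query)  # UPDATE employee SET name = 'swastik', age = 23 WHERE id = 111;
```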

Which one of the above approaches is best?

(2) By Keith Medcalf (kmedcalf) on 2022-12-27 17:17:13 in reply to 1 [link] [source]

UPDATE employee SET age = "23" WHERE id = 111;

This is invalid SQL. If you mean for "23" to designate the text string '23', then you need to use single-quotes. Double-quotes are for identifiers.
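
The quirk is easy to demonstrate with Python's built-in sqlite3 module (a minimal sketch; the table and column names are invented for illustration). SQLite treats a double-quoted token as a string literal only as a fallback, when it matches no column name:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (id INTEGER, age INTEGER)")
con.execute("INSERT INTO employee VALUES (111, 22)")

# "age" in double quotes names the column, so typeof() sees the stored integer.
# "no_such_column" matches no column, so SQLite silently falls back to
# treating it as the string literal 'no_such_column' instead of erroring.
row = con.execute(
    'SELECT typeof("age"), "no_such_column" FROM employee'
).fetchone()
print(row)  # ('integer', 'no_such_column')
```

That silent fallback is exactly why the quirks page calls double-quoted string literals a misfeature: a typo in a column name becomes a string instead of an error.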

I should think your JSON translates to:

update employee
   set age = 23
 where id = 111
   and name = 'swastik'
   and age = 22
;

otherwise the JSON contains useless information.

Why are you assuming that the creator of the JSON was a moron that included useless information?

(3) By Keith Medcalf (kmedcalf) on 2022-12-27 17:21:21 in reply to 2 [link] [source]

Of course, a logical implementation would convert that JSON to the following statement:

update employee
   set id = 111,
       name = 'swastik',
       age = 23
 where age = 22
;

but clearly the JSON was not designed by a logical thinking person.

(4) By anonymous on 2022-12-27 19:03:41 in reply to 1 [link] [source]

Please accept my apologies for the other member's response.

Having the db schema for the target query is useful in writing queries. Column definitions, foreign keys, indexes, etc. all provide critical terrain for the query to navigate.

For example, if the table has a unique index on id, then one could use just the id to identify an individual record (stating the obvious).

The JSON you provide may have redundant/unnecessary info for many reasons, including giving you all available info so as not to limit your implementation options, or because the info is provided to more consumers than just you from a single canonical source.

One of the things SQLite gets right is that it does not assume that use cases are limited to what its developers know about (as opposed to Chrome, which treats collected user metrics as the scripture of use cases).

As to your sample, Keith is quite right about the double-quoted string misfeature (see https://www.sqlite.org/quirks.html). While you may be able to get away with it, it is best to stick to standard SQL whenever possible.

Applying that same principle to your core question: my opinion is that, to the extent you do not make your solution rely on SQLite's unique dynamic typing, your application will be more easily adapted to another SQL engine. The less subtle mental gymnastics one has to do, the more one can focus on the essentials, and that will (IMO) lead to fewer problems.

Lastly, relying on quirks (even when well supported), premature optimization, and tricks at the start is a poor omen. See Knuth, The Art of Computer Programming, pages 1-1000 :) (this is meant as humor, but he is worth a read).

What is considered best is up to you, and only you. You are the one practicing your craft. You will know common practice, but also its limitations. You will test alternatives to explore the difference between theory and practice. As Yogi Berra once said, "In theory, theory and practice are the same; in practice, they ain't."

So, my advice: dump the double-quoted literals (you should bind those parameters anyway; see SQL injection exploits), and write the query so it works well against the target db, both storing and retrieving. That is where your work is.

Again, my apologies for what appears to me to be unnecessary cruelty in another, highly talented SQLite forum member's response.

(5) By anonymous on 2022-12-27 19:24:52 in reply to 4 [link] [source]

For your option 2, in Python you can use parameterized queries. One reference for this sort of thing is here: https://stackoverflow.com/questions/45343175/python-3-sqlite-parameterized-sql-query

Note they also mention SQL injection as a reason to parameterize your queries.

With very few exceptions, it is almost always better to parameterize your queries (and there is no indication you are anywhere near those exceptions).
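
A minimal sketch of that advice, using only the standard-library sqlite3 module (the employee table here is invented to mirror the OP's example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
con.execute("INSERT INTO employee VALUES (111, 'swastik', 22)")

# '?' placeholders: sqlite3 binds each Python value with its native type,
# so no quoting or isinstance() checks are needed, and injection is avoided.
con.execute("UPDATE employee SET age = ? WHERE id = ?", (23, 111))
row = con.execute("SELECT name, age FROM employee WHERE id = ?", (111,)).fetchone()
print(row)  # ('swastik', 23)
```

Note that the SQL text never contains a user-supplied value at all, which sidesteps both the quoting question and the injection risk in one move.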

(6) By Keith Medcalf (kmedcalf) on 2022-12-27 19:48:12 in reply to 1 [link] [source]

Fixing your broken JSON, the example comes down to this very straightforward code:

import json

a = json.loads("""
{
"table": "employee",
"type": "update",
"data": {
      "id": 111,
      "name": "swastik",
       "age": 23
      },
"old": {
      "age": 22
      }
}
""")

sql = (f"{a['type']} {a['table']} set "
       + ', '.join(f'{k} = ?' for k in a['data'].keys())
       + ' where '
       + ' and '.join(f'{k} = ?' for k in a['old'].keys()))
param = list(a['data'].values()) + list(a['old'].values())

print(sql)
print(param)

# db.execute(sql, param)

where you get the following for sql and param:

>>> print(sql)
update employee set id = ?, name = ?, age = ? where age = ?
>>> print(param)
[111, 'swastik', 23, 22]
>>>

All of which requires nothing special whatsoever. Note that it is perspicacious (no matter what the bleeding hearts will tell you) to design your data inputs appropriately. If you do not, then you will have massive difficulties and introduce massive complication (and failure modes) where they need not exist.

(7) By anonymous on 2022-12-27 23:45:56 in reply to 6 [link] [source]

That certainly handles their original SQL injection pitfall and takes them through their stated 'problem' (which presents as an XY-problem type).

While it's reasonable to presume that "update" translates to the sql UPDATE, it gets a bit dicey if the JSON "Type" has values such as "PUT" as well as "UPDATE". And of course the OP does not specify one way or the other.

Agree 100% the inputs should have more attention. Indeed, IMO they NEED more attention by the OP otherwise they most likely will have quite a few surprises.

And technically the SQL does "update" all employees that have age = 22 to the 'data'; it seems unlikely that the JSON is intended to do that. It is of course possible that they want everyone aged 22 to become 23 at the same time (though changing all their names and ids to one value is like writing not in Python, but Monty Python - 'Bruce' :). Thus some canonical clarity on the JSON's meaning would serve the OP well before implementation.

Of course the JSON 'age' itself is problematic, as it does not, pardon the pun, age well. A year from now that JSON will make the same employee stand still in time (which, for those seeking immortality, would be a feature, not a bug). The OP would be well served by obtaining JSON that contains a birthdate or birth year (if necessary) so that proper calculations can be made in the future. This is what I was referring to when advising consideration of both storing AND future retrieval in earlier posts.
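
To illustrate with a sketch (the birthdate and helper name here are purely hypothetical): deriving age from a stored birthdate is a small calculation, whereas a stored age silently goes stale.

```python
from datetime import date

def age_on(birthdate: date, today: date) -> int:
    """Age in whole years on a given date; subtract one year if the
    birthday has not yet occurred in the current year."""
    return today.year - birthdate.year - (
        (today.month, today.day) < (birthdate.month, birthdate.day)
    )

print(age_on(date(1999, 6, 15), date(2022, 12, 27)))  # 23
print(age_on(date(1999, 6, 15), date(2023, 6, 14)))   # 23 (birthday not yet reached)
print(age_on(date(1999, 6, 15), date(2023, 6, 15)))   # 24
```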

It's plausible they want to update a specific employee, and the 'old' data may be an artifact from some user entry form (pure speculation obviously). Given that, I advise them to better examine their JSON source to understand what they are processing. This is the essence of that part of my post above. The OP may or may not control that JSON, but they can at a minimum understand its limitations and act accordingly.

I sure do hope they test appropriately so that they can learn before doing too much damage (Although as Brooks notes in MMM, "It is a very humbling experience to make a multi-million-dollar mistake, but it is also very memorable.").

Your eloquent Python-to-SQL, and your conclusion, seem sound enough, yet I do think the 'special' is the work (as noted in my post above). For folks to put in the effort is 'special', and hopefully they (the OP) will do so.

As to the non technical aspects, no offense taken. My apologies if I came across as a bleeding heart, I am actually a sociopath who thinks that the world being a better place benefits me personally. :) All humor aside, the SQLite license text ( https://sqlite.org/src/file?name=LICENSE.md&ci=trunk ) wisely summarises:

The author disclaims copyright to this source code. In place of a legal notice, here is a blessing:

May you do good and not evil. May you find forgiveness for yourself and forgive others. May you share freely, never taking more than you give.

regards

(8.1) By Keith Medcalf (kmedcalf) on 2022-12-29 17:25:26 edited from 8.0 in reply to 7 [link] [source]

While it's reasonable to presume that "update" translates to the sql UPDATE, it gets a bit dicey if the JSON "Type" has values such as "PUT" as well as "UPDATE"

You are making an assumption of facts not in evidence. There was no mention of any other value of "Type" and what to do if it were not "update". A prudent programmer would "validate their inputs" and ensure that the invariants were met, and if not signal a massive explosion the scale of which could not be missed even by a moribund observer.
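
A minimal sketch of such validation (the names ALLOWED_TYPES and validate are my own invention, not anything from the thread):

```python
# Fail loudly when the JSON violates the invariants the generator relies on.
ALLOWED_TYPES = {"update"}  # the only statement type the thread's JSON uses

def validate(payload: dict) -> dict:
    """Check the expected shape of the JSON payload, raising on any violation."""
    if payload.get("type") not in ALLOWED_TYPES:
        raise ValueError(f"unsupported statement type: {payload.get('type')!r}")
    for key in ("table", "data", "old"):
        if key not in payload:
            raise ValueError(f"missing required key: {key!r}")
    return payload

# Passes validation and returns the payload unchanged:
validate({"table": "employee", "type": "update",
          "data": {"age": 23}, "old": {"age": 22}})
```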

And technically the SQL does "update" all employees that have age = 22 to the 'data'; it seems unlikely that the JSON is intended to do that.

That is merely your opinion. There is no data nor clue one way or the other what is meant by that particular JSON. It should be noted, however, that any other interpretation requires "twists of logic" and "addition of complication" which requires the assumption that the JSON was conceived by an illogical person (one not, as Bones would say, firing on all thrusters).

It's plausible they want to update a specific employee, and the 'old' data may be an artifact from some user entry form (pure speculation obviously). Given that, I advise them to better examine their JSON source to understand what they are processing. This is the essence of that part of my post above. The OP may or may not control that JSON, but they can at a minimum understand its limitations and act accordingly.

This is not the question which was asked. The question which was asked was how to translate such a JSON snippet into an SQL statement taking advantage of the Python-to-SQLite3 adaptation layer and the underlying SQLite3 database "ducky typing".

Possible use of "de-ducking" the data via isinstance was suggested as a possible way to do this.

The answer presented an alternative which used the inherent "duckiness" of all the bits to perform the manipulation with "nothing special" needing to be done at all to diddle the poor duck. This is precisely what was requested.

There was no mention (nor request) to operate on different "type" of duck, or even that different "type" of ducks may exist.

(10) By anonymous on 2022-12-30 04:24:48 in reply to 8.1 [source]

(to admins: My apologies for having to discuss this. Ignoring the increase in personal aspersions is not working.  Nor was politely appealing to our better nature.  To spare the noise on the forum, perhaps it's possible to forward directly to recipient rather than post?
)


Given the thread title, and the last line of the OP, we will have to remain in disagreement on what the question was.  Obviously, our reading of it is not the same.

There are numerous errors in your latest post.  While I have no particular interest in the errors one way or the other, since the thread posts had abrasive, callous aspersions directed towards an absent party, I will briefly note the errors before getting to the more serious issue.

1) The 'facts in evidence' are the OP; I direct your attention to the thread title and the closing question of the OP.  Between them the OP lays out two alternatives to compare.  It gives an input, which was a given (the JSON available), and the output (two SQL update statements).  The question was (though now long lost) whether they should make use of the dynamic typing of SQLite OR handle the typing in their app (Python).

Further, the 'facts in evidence' are an input and two options for handling output.  Noteworthy in the 'facts in evidence' is that not all of the JSON data is consumed by the SQL, and consistently so in both options.  These facts define the processing of the data.

2) Considerations of taking one approach or the other would be relevant to any analysis.  Characteristics of each approach (i.e. the repercussions, future porting of code, fragility/robustness, etc ) would be well within the scope of the question.

3) Given that a) it is considered good form to use parameters in queries, b) Python allows parameters (see my earlier post), and c) not relying on SQLite's 'flexible typing' would remove a dependency on an SQLite-unique feature from the solution (thus a more portable solution), using option 2 (handle typing in Python, via bindings) was recommended.

4) It is valid to point out that data is going unconsumed; however, considering it an error is in itself an error.  While the OP could have reduced their example input JSON by removing the unused data and tightened their translation example, they clearly show both the input and the output.  They assert that the JSON input will translate to option 1 or option 2.  That assertion IS part of the 'facts in evidence', and we will have to remain at odds on that issue as well.

5) While your suggested third option is creative, and again shows your incredible talent, it answers the question of how one could take the presented JSON, and no other input, and create an SQL statement via Python.  It answers the question "how to translate [a] JSON snippet into SQL", and does it well.  What it does not do is answer the question which was asked, which is perfectly OK if we acknowledge that.  And it certainly should not open the flood gates to aspersions about the JSON writer's competencies.  One could argue that if that JSON was designed for this transaction, then it could be substantially better, but we cannot fairly criticise them when a) they are not present, and b) we are ignorant of where the JSON even came from, let alone its other purposes.

Our old friend Billy from the village of Occam would advise that "all things being equal" is violated when we start from the basis that the JSON's sole purpose was as input (with that basis, by reduction, it would follow that the JSON is not necessary at all).  And Billy would point out that we must account for ALL the presented evidence (namely, we can't ignore that both input and output are given, nor that there are precisely two sentences in question form in the OP).  This, as you know, is what is meant by ALL things being equal.  Mr Occam has no issue with allowing commonly known facts (e.g. the earth orbiting the sun), as these would be known externals.  In our case, we presume the rules of set theory apply, yet they are not stated.

Where we differ is that you take the JSON as the canonical truth and treat anything that does not fit with it as an error (including the JSON), while I take the input, output 1, and output 2 as canonical truth, make no assumptions about the intelligence of the author, and allow for unconsumed JSON data.  So I think you have the correct principle but, through what I suspect is bias, misapply the advice of our mutual friend (this is my speculation as to 'how' you got there).  I suspect the mistake would not have been made if the personal aspersions were left out.  The addition of the personal aspersions in this case creates a circular logic path between 'must be wrong, because the author was illogical' and 'since the author was illogical, we correct the illogical parts'.  If you work through it, you may detect where that happens (try starting from "the author had their reasons for doing it that way", which means the JSON translation IS valid, and you reach a different result).  The personal-aspersion addition, however, violates Mr Occam's advice on 'all things being equal'.  And this is educational, because when we allow that addition we create false counter-evidence which leads us astray.  It's not the first time folks got cut shaving with Mr Occam's razor.


The larger issue is the negative aspersions on people who are not involved in the question, had no role in presenting the JSON, and had no opportunity to define, explain, or otherwise defend themselves.  It's not clear what the point of disparaging them is, as there is no discernible benefit to be had by you, the OP, me, the SQLite folks, or anyone.  Unfortunately, even if viewed purely absent of moral and ethical considerations, it diminishes otherwise outstanding contributions, tarnishing the stellar reputation of the writer rather than the recipient.  Make no mistake here, the issue has been around a very long time, and it's one I have spent considerable effort on (unfortunately) over the years from multiple perspectives.  What the behaviour does is corrosive to any group, organisation, business, and even personal relationships.

But this isn't 'just' my opinion, and it's not simply recent 'wokeness'.  From Harvard Business Review (1978), the article below discusses how it can be toxic to an organisation, and some ways to address it (of note is how some of the hierarchical structures are passé).  My professional experience of roughly half a century is that folks will encounter this sooner or later, and can, believe it or not, be on both sides at the same time (my contention is that when someone is both contributor and recipient it's actually easier to address, although this aspect is not developed in the article).  Setting aside bleeding hearts or sociopaths, if folks are interested in getting the best results, then individual awareness of the behaviour may go a long way.  I've personally had developers who juggled, practiced pitching (yes, fastballs in the office), and did any number of things that other departments would frown upon.  And I would counsel that while I myself didn't care one way or the other, if they wanted to advance in the company, or other companies, they should at least be self-aware that the reputation they are creating may hold them back.  One of those characteristics is how they treat others.  The articles below illustrate this in more detail.

https://hbr.org/1978/05/the-abrasive-personality

and

https://hbr.org/2021/04/managing-a-top-performer-who-alienates-their-colleagues

That said, one hypothesis I've entertained over the years to explain why bright (in fact, often the brightest) people engage in this is that it's seen as a path to a solution.  Indeed, I believe the behaviour is more common under stress (difficult situations, problems, fatigue, etc.).  As I detailed above, the injection of the personal can create a pathway to a false solution.  Whether blaming an ex-employee for all the company's woes (Twitter, anyone?), or the criticism of the creator of the OP's JSON, it simplifies the apparent solution.  But it violates the first part of Mr Occam's advice on that matter.

Curbing the personal aspersions helps avoid the logical mistakes I cite above, and can keep Billy from the village of Occam our friend.

So, my answer to the actual thread question remains: take option 2, and use parameters.

(11) By Larry Brasfield (larrybr) on 2022-12-30 08:12:22 in reply to 10 [link] [source]

(to admins: My apologies for having to discuss this. Ignoring the increase in personal aspersions is not working. Nor was politely appealing to our better nature. To spare the noise on the forum, perhaps it's possible to forward directly to recipient rather than post?)

To avoid hijacking this thread, which is more or less topical(^), I am starting a new thread to deal with the subject you raise here and touch upon further later in your post.

For reasons that will become clear in that thread, I nearly rejected your post #10. Unfortunately, it appears the interpersonal issues need some attention.

(9) By Keith Medcalf (kmedcalf) on 2022-12-29 17:41:29 in reply to 6 [link] [source]

You could use this to generate more "pleasing" SQL:

sql = 'update {}'.format(a['table']) + '\n' + \
      '   set ' + ',\n       '.join('{} = ?'.format(k) for k in a['data'].keys()) + '\n' + \
      ' where ' + '   and '.join('{} = ?'.format(k) for k in a['old'].keys())
param = tuple(list(a['data'].values()) + list(a['old'].values()))

which would give this pretty (but otherwise functionally equivalent) sql and param:

update employee
   set id = ?,
       name = ?,
       age = ?
 where age = ?
>>> print(param)
(111, 'swastik', 23, 22)