How to expose SQL API query endpoint safely?

(1) By Eric (herdingcat) on 2022-08-29 04:36:30 [link] [source]

Hi,

I'm not sure if my question is related to SQLite specifically but I would like to hear some ideas about this.

I currently implemented a series of API endpoints to access some data from an SQLite table. However, I need to add another endpoint to allow users to query via sending SQL queries directly. This means users can construct any SQL queries to the endpoint then the API will return the results.

My question is how can I make sure the users cannot pass some dangerous queries(for example, DROP)? The endpoint should only be used for read-only purposes. My initial idea is to only allow the query that starts with SELECT and not allow multiple queries in one SQL query line.

I never implemented anything like this before so is there any framework or library I can use to achieve this goal? I don't want to overdesign my API implementations to make it too complicated.

The current implementation language is Python and the WSGI framework is Flask.

Thanks.

(2) By Ryan Smith (cuz) on 2022-08-29 09:07:09 in reply to 1 [link] [source]

What you need is the Authorizer API:
sqlite.org/c3ref/set_authorizer.html

(13) By Eric (herdingcat) on 2022-09-02 04:49:49 in reply to 2 [source]

Thanks! But I have some questions about the authorizer callback function. I wrote a small demo code here: https://pastebin.com/mhe0pxP3

My question is:

1. What's SQLITE_READ? Why I have to allow SQLITE_READ if I want to do SELECT to get all data?

2. The output is
21 None None None None
20 snoopy name main None
20 snoopy color main None
20 snoopy address main None
[('mydog', 'brown', 'CA')]

So what's "main" in this case?

Thanks.

(16) By Ryan Smith (cuz) on 2022-09-03 22:39:00 in reply to 13 [link] [source]

The Authorizer has many more uses than just preventing database changes. It can also be use to limit viewing of sensitive database content, or perhaps specific tables or sensitive columns within tables - it can get very specific - as can be seen in the data for your second question.

As to answering the second question - that's the database. In your case it is "main". It's always main for the file you open at the start of the connection, but can be different for the TEMP database or any attached database schema.

See sqlite.org/lang_attach.html which also nicely explains, among other things, the use of "main" and "temp".

(14) By Eric (herdingcat) on 2022-09-02 04:56:49 in reply to 2 [link] [source]

Another follow-up question:

It looks like my code does not stop dropping table operation. If I change the execute query to "DROP TABLE snoopy", the output is:

9 sqlite_master None main None
11 snoopy None main None
9 snoopy None main None
9 sqlite_master None main None
20 sqlite_master tbl_name main None
20 sqlite_master type main None
23 sqlite_master rootpage main None
20 sqlite_master rootpage main None

And the table is deleted. So why this is not working?

Thanks.

(15) By Eric (herdingcat) on 2022-09-02 05:11:03 in reply to 14 [link] [source]

sorry there was a typo in the paste so it was not working, after I fixed the typo it's working.

(3) By anonymous on 2022-08-29 09:11:21 in reply to 1 [link] [source]

A couple thoughts:

Check if your python sqlite driver supports setting an authorizer and use that to allow only select queries.

You might also want to check the interrupt API so you can stop long running queries from bringing your API to a halt.

It is always dangerous to accept arbitrary input directly from the user, keep that in mind as you think of different ways they can break your backend

(4) By anonymous on 2022-08-29 09:22:13 in reply to 1 [link] [source]

Whatever other clever solutions you and others come up with, you can open the database connection as read-only (eg with the file: API) if the endpoint shouldn't ever be able to alter the database.

What counts as "dangerous" radically depends on who you are allowing to access the API and what the purpose of the database is, but allowing raw SQL to be entered by a user is a terrible idea if you require any sort of privacy or controlled alteration of data.

Read-only should be fairly bombproof though, if that is sufficient and required.

(5) By Simon Slavin (slavin) on 2022-08-29 13:20:16 in reply to 1 [link] [source]

Other people have covered features built into SQLite well. My knowledge is more general-purpose.

First, you can include in your server-side API code a check to see that the command starts with SELECT . This is highly effective !

Second, if your are implementing a remote access API which you want to protect, there are some techniques you may want to use.

Pass along a checksum as a parameter of the query. Your endpoint takes the SQL command it needs and uses an algorithm to calculate a checksum, using a well known checksum technique. The server checks to see that the SQL command fits the checksum.
Encrypt the SQL command. Or encrypt the checksum. Or both.
Incorporate the SQL command and checksum into the same encrypted parameter. This is more difficult for an attacker to understand.

How effective these are depends on how easy/likely it is for an enemy to launch an interception and man-in-the-middle attacks.

Other protection techniques include checking the originating IP addresses of any requests. You may have an idea about where your 'trusted' endpoints are.

(7) By Donal Fellows (dkfellows) on 2022-08-29 15:24:35 in reply to 5 [link] [source]

Second, if your are implementing a remote access API which you want to protect, there are some techniques you may want to use.

Pass along a checksum as a parameter of the query. Your endpoint takes the SQL command it needs and uses an algorithm to calculate a checksum, using a well known checksum technique. The server checks to see that the SQL command fits the checksum.

Encrypt the SQL command. Or encrypt the checksum. Or both.

Incorporate the SQL command and checksum into the same encrypted parameter. This is more difficult for an attacker to understand.

Or run inside HTTPS with standard session-binding and a CSRF token. This is a problem that's had extensive effort put into solving it; reinventing it all yourself is a lot of work. (Proper encrypted channels are better than checksumming of requests.)

(12) By KIT.james (kjames3411) on 2022-08-30 13:27:53 in reply to 5 [link] [source]

First, you can include in your server-side API code a check to see that the command starts with SELECT . This is highly effective !

If and only if the script/program checks that it is a single command (not multiple commands concatenated)

(6) By beetlejuice (coleifer) on 2022-08-29 14:02:05 in reply to 1 [link] [source]

You may also want to have a look at a progress callback, e.g.

https://www.sqlite.org/c3ref/progress_handler.html

I imagine it would be possible for a malicious user to send you a select that enters an infinite loop if you are attempting to consume the entire result set in the response. So you would probably want to implement a progress handler to abort very long queries.

But if it were me, I would never expose sqlite to untrusted input.

(8) By Keith Medcalf (kmedcalf) on 2022-08-29 15:41:23 in reply to 1 [link] [source]

Simple open the connection to the file with the FILE being readonly. That is, use a URL filename of the form file:<filename>?mode=ro.

Then simply send the statement to be executed and return the result.

Nothing complicated at all.

Should take less than 15 seconds to write.

(10) By anonymous on 2022-08-29 18:14:22 in reply to 8 [link] [source]

That's a good idea. Someone could have posted that earlier.

(9) By Chris Locke (chrisjlocke1) on 2022-08-29 16:09:38 in reply to 1 [link] [source]

As soon as something is public-facing, you have to imagine every way it can be broken.

Are you going to limit the API requests? As soon as someone finds out your API works, they'll script it. So at work we use someones API to retrieve details of hotel x. x is numeric, so its easy to create a loop which retrieves all hotels from 0 to 9999999. Great for us, not for them.

Never assume you can limit on their IP address. This can be spoofed, VPN'ed, etc. Much better to let them have to come in on a predefined key (usually a UUID). This also gives you easy way to knobble individual users from using your API.

Log everything. Twice. OK, overkill, but log so that you can spot the users passing wacky APIs, or 8000 requests a minute. And don't just log and archive the logs. Go through them.

If you're giving people 'raw' access to the database, are there any tables in that you don't want them to see? Any secure information? Don't assume they can't see those table names.

(11) By Keith Medcalf (kmedcalf) on 2022-08-29 19:47:44 in reply to 9 [link] [source]

Great for us, not for them.

Now why on earth would you think that? The purpose for which "them" exists at all is to provide information to "you" via their API (thus, without "you" making the requests that you do, "they" would not exist any longer -- "they" would be bankrupt).

If your behaviour were annoying to "them", then "them" would put "a stop" to it.

The implementation of "a stop" may range from mere rate limiting to dispatching an assassin, as may be appropriate in the circumstances.