Rethinking Requirements...

(1.1) By Bill Goehrig (bill.goehrig) on 2021-11-16 18:41:46 edited from 1.0

Hi Dan,

We're finally widely rolling out our initial (mostly read-only) blockcache-based workflow to production this week, and as we look forward to future projects that could leverage this, we thought it best to re-evaluate some of our fundamental assumptions here.

On reflection, we think our original requirements were a bit excessive, which made the current design more complicated than it needs to be for our actual upcoming write use cases.

We'd like to propose some significant changes to blockcache VFS that will hopefully simplify everything and make this a lot easier for us to deploy and use for several other applications.

  • Move the daemon in-process (usually): The daemon as a separate process is actually working quite well for us in our initial read-only case, but for every other use case we have, there's no real need to manage storage at a system level, and in fact we expect isolation will be a requirement. Also, I suspect that using sockets for IPC will lead to some headaches (firewalls, ports in use, etc.) when we want to ship this in a desktop product.

    So if possible, we'd prefer to require that all read-write connections run in-process, and make the use of a separate daemon process optional when opening a database read-only. Of course, anytime we run in-process we can require exclusive access to our own cachefile, blocksdb, etc. And it's fine with us if some of the "daemon" functionality still lives in its own thread when running in-process.

  • Make all uploads explicit and synchronous: In hindsight, we think a "dumber" approach to uploading would actually be much more straightforward. Basically, we think it would be sufficient to have a single (blocking) API call that will checkpoint (if necessary), upload all blocks, and delete any garbage from the container. Then we would not have any need for automatically uploading on checkpoints, automatically uploading on close, or even any garbage collection interval.

    Even though the actual requests can still happen in parallel on many threads, we think having an imperative, synchronous API like this will help avoid a lot of tricky race conditions and hopefully bring blockcache more in line with our existing (synchronous) SQLite mental model. And, to be clear, the application would still be on the hook to manage access to the containers and guarantee that there are never any concurrent writers. (There's a rough sketch of what we mean just after this list.)

  • Make checking for manifest changes explicit and synchronous: Along the same lines as uploads, I think we can just leave this up to the application and remove the polling interval.
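
To make those last two points concrete, here's roughly the shape of API we have in mind. To be clear, these names and signatures are just ours for illustration; nothing like this exists in blockcachevfs today:

    /* Illustrative sketch only: invented names, not existing API. */
    typedef struct sqlite3_bcvfs sqlite3_bcvfs;

    /* Checkpoint the database if necessary, upload all dirty blocks and
    ** a new manifest, then delete any garbage blocks from the container.
    ** Blocks until the whole operation has finished (the individual HTTP
    ** requests may still run in parallel on worker threads) and returns
    ** SQLITE_OK or an error code directly to the caller. */
    int bcvfs_upload(sqlite3_bcvfs *pFs, const char *zCont, char **pzErr);

    /* Check cloud storage for a new manifest and adopt it if one is
    ** found. Also fully blocking, and only ever runs when called. */
    int bcvfs_poll(sqlite3_bcvfs *pFs, const char *zCont, char **pzErr);

We'd call bcvfs_upload() at well-defined points (say, after committing a batch of transactions), and any failure would be reported straight back from that call instead of surfacing from a background timer.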

Obviously, these are significant breaking changes, and we're fully prepared to manage that transition and roll out breaking code changes to our deployed services. But since we are using the current implementation successfully in those services, we'd strongly prefer to avoid making any breaking changes to either the manifest or block formats if possible.

Overall, what are your thoughts on these proposed changes? I'm very sorry that this is a lot of work that probably could have been avoided had we gotten the requirements right in the first place, but I really think a change in direction like this will make BCVFS more generally useful, and let us (and others) get a whole lot more use out of it.

Thanks,
-Bill

(2) By Dan Kennedy (dan) on 2021-11-16 20:55:59 in reply to 1.1

Hi Bill,

We're still pondering this here. Should have some questions tomorrow I think.

Dan.

(3) By Dan Kennedy (dan) on 2021-11-17 15:48:14 in reply to 1.1

Hi,

To get started, does the following seem correct to you?

We're now talking about 2 deployment modes:

  1. Single-process mode.

    • All clients must be part of the same multi-threaded process.
    • Clients may be read/write.
    • There is no daemon process.
    • There is no need to manage SAS tokens or attempt to keep data private - any client within the process may access any databases for which an SAS has been provided.
    • The application will take care of ensuring that the cachefile/blocksdb are sufficiently private.
  2. Multi-process mode

    • Read-only clients from multiple processes. The advantage of this mode over each client using single-process mode is that clients from multiple processes can share downloaded blocks and a single maximum-disk-space limit.
    • Question: These are the only advantages, correct? And we do still need this, correct?
    • Will probably still use a daemon in a separate process. Assuming so, using sockets for IPC as in the current blockcachevfs will work.
    • The daemon only polls for manifest changes when requested to do so by a client. Never writes to cloud storage for any reason.
    • Question: Given that such a daemon can't write cloud storage, do we still need "secure" containers (where clients must provide a valid SAS before they are allowed to connect to the daemon)? Or can we drop that feature?
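
And just so we're picturing the same thing, selecting between the two modes might look vaguely like the following. All names, signatures, and the address format here are invented for the example:

    /* Hypothetical sketch only: invented names and signatures. */
    typedef struct sqlite3_bcvfs sqlite3_bcvfs;
    int bcvfs_create(
      const char *zCacheDir,          /* Private cache directory, or NULL */
      const char *zDaemon,            /* Daemon address, or NULL */
      sqlite3_bcvfs **ppFs, char **pzErr
    );

    /* Mode 1 (single-process): no daemon. The VFS takes exclusive use
    ** of its own cachefile/blocksdb, and connections may be read/write.
    **
    **   rc = bcvfs_create("/path/to/private/cache", 0, &pFs, &zErr);
    **
    ** Mode 2 (multi-process): read-only connections attach to a daemon
    ** over a socket, sharing its downloaded blocks and disk-space limit.
    **
    **   rc = bcvfs_create(0, "localhost:22002", &pFs, &zErr);
    */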

Making uploads, polls, etc. happen only on demand is a good simplification, I think, if only because this way there is always a client to return an error to when something goes wrong.

Dan.

(4) By Bill Goehrig (bill.goehrig) on 2021-11-17 18:08:04 in reply to 3

Yeah, that all looks right to me!

As for your questions:

Question: These are the only advantages, correct? And we do still need this, correct?

Yes and yes. We think being able to share blocks and have a single disk-space-limit will still be very helpful with maximizing VM utilization on our end.

Question: Given that such a daemon can't write cloud storage, do we still need "secure" containers (where clients must provide a valid SAS before they are allowed to connect to the daemon)? Or can we drop that feature?

Yeah, I think we still want that feature so we can mitigate the risk of a rogue application stealing (plaintext) blocks it doesn't even have read access to. One change I would request there though would be to just force the application to implement the actual encryption whenever it wants to enable secure containers (i.e. let us pass a callback to use instead of the existing bcvEncrypt function). Hopefully that simplifies things a bit on your end, since you don't need the whole AES128 implementation, and it'll let us easily configure other encryption modes, hardware acceleration, etc. But you're right that none of that will be needed in mode 1 (single-process mode).
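
For that callback, I'm picturing something vaguely like the following; the names and signatures are completely made up, just to show the shape:

    /* Hypothetical sketch only; not an existing blockcachevfs API. */
    typedef struct sqlite3_bcvfs sqlite3_bcvfs;

    typedef struct BcvEncryptionMethods BcvEncryptionMethods;
    struct BcvEncryptionMethods {
      void *pCtx;                      /* First argument for the callbacks */
      /* Encrypt or decrypt nData bytes of aData in place. aNonce is a
      ** per-block nonce supplied by blockcachevfs. Return SQLITE_OK or
      ** an error code. */
      int (*xEncrypt)(void *pCtx, const unsigned char *aNonce,
                      unsigned char *aData, int nData);
      int (*xDecrypt)(void *pCtx, const unsigned char *aNonce,
                      unsigned char *aData, int nData);
    };

    /* Register callbacks to use in place of the built-in bcvEncrypt(). */
    int bcvfs_encryption_hook(sqlite3_bcvfs *pFs,
                              BcvEncryptionMethods *pMethods);

That would let us plug in hardware-accelerated AES, a different cipher, or our own key management without blockcachevfs carrying any crypto code itself.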

Thanks,
-Bill

(5) By Dan Kennedy (dan) on 2021-11-17 19:22:45 in reply to 4

(i.e. let us pass a callback to use instead of the existing bcvEncrypt function). Hopefully that simplifies things a bit on your end, since you don't need the whole AES128 implementation, and it'll let us easily configure other encryption modes, hardware acceleration, etc.

That would imply that only the clients have access to the encryption routines, which means any daemon process could not upload or download blocks. That's not necessarily a bad thing, I suppose.

(6) By Bill Goehrig (bill.goehrig) on 2021-11-22 19:44:20 in reply to 5

Actually, after thinking about this a bit more, I think it'd also be fine if this were something defined at build time instead (similar to how we can already extend things by writing our own cloud "modules"). That way we wouldn't be limited to just the client process...

(7) By Bill Goehrig (bill.goehrig) on 2021-12-08 14:52:51 in reply to 3

Hi Dan,

Does this still sound worth pursuing to you? If so, do you have any rough estimate of the effort required/how long it might take?

Please let us know if you have any other questions for us. We're contemplating some other feature requests and would like to hear your thoughts on this before moving forward with any of those...

Thanks!
-Bill

(8) By Dan Kennedy (dan) on 2021-12-08 15:05:52 in reply to 7

do you have any rough estimate of the effort required/how long it might take?

I started working on changes last week. Should have something soon.

Please do let us know what other features you need from this.

Dan.

(9) By keith on 2021-12-08 15:51:00 in reply to 8

Hello Dan,

A related request would be to provide APIs for the blockcachevfsd commands:

  • create
  • delete
  • copy
  • list

so they can be performed without needing a daemon.

Thanks!

Keith

(10) By Dan Kennedy (dan) on 2021-12-08 16:08:41 in reply to 9

Hi,

We can already do create, delete, and copy (but not "list") using this API:

https://sqlite.org/blockcachevfs/file?name=src/bcvutil.h&ci=tip

These APIs talk to cloud storage directly, not via any daemon. This API will still be there following the current round of changes. We should be able to add "list" and any other operations to it as required.

Dan.

(12) By keith on 2021-12-08 16:38:41 in reply to 10

Oops, I don't know how I missed that create/delete/copy already exist (in fact, we're already using them).

"list" would be helpful though, although may be a more challenging api (maybe a callback?)

Keith

(11) By keith on 2021-12-08 16:26:11 in reply to 9

Also, not that you need more things to work on, but this may be helpful for other users of "cloud SQLite" (as I call it). Please consider the following:

We currently have an external "gatekeeper" service to ensure there can only be one writer to a container. But that creates an external dependency on a service, which complicates things and adds another point of failure.

I think a semaphore-based solution could be simpler, more efficient, and more reliable.

Suppose there was an additional blob in the container called locked-by (or something).

An "obtain_lock" function would take a string identifier and an expiration time.

  • If locked-by doesn't exist, attempt to create it with the id/expiration, and return the status of the create.

  • If it does exist:

    • If (now < expiration), fail with the id of the current holder.
    • Else, overwrite the existing blob with the new id/expiration (using the if-not-changed flag?).

An "extend_expiration" function updates locked-by with a new expiration.

a "release_Lock" function deletes locked-by

I realize that this only works if all clients play by the rules of obtain_lock, and that there is a clock synchronization dependency. I think we could make the minimum expiration period an hour (say), so some clock skew could be tolerated. If there's a race condition here, I don't see it, assuming create fails when the blob already exists.

Obviously, since we can create blobs in the same container, we could write this ourselves. But it seems to me generally worthwhile for anyone else using cloud-SQLite, since single-writer enforcement is paramount to the design.
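
In rough pseudo-C, obtain_lock would go something like the following. The blob primitives (blob_create_exclusive, blob_read, blob_overwrite_if_unchanged) are stand-ins for whatever the cloud module actually exposes, so everything here is illustrative:

    #include <string.h>
    #include <time.h>

    /* Stand-ins for real cloud-storage calls (hypothetical). Each returns
    ** 0 on success. blob_create_exclusive() fails if the blob already
    ** exists. blob_overwrite_if_unchanged() fails if the blob's ETag has
    ** changed since it was read - i.e. the "if-not-changed" flag. */
    typedef struct LockBlob {
      char zId[64];                 /* Identifier of the lock holder */
      time_t tExpire;               /* Expiration time of the lock */
      char zEtag[64];               /* ETag, filled in by blob_read() */
    } LockBlob;
    int blob_create_exclusive(const char *zCont, const LockBlob *p);
    int blob_read(const char *zCont, LockBlob *p);
    int blob_overwrite_if_unchanged(const char *zCont, const LockBlob *p);

    /* Try to take the container write-lock. Return 0 on success. If the
    ** lock is held by someone else, copy their id into zHolder. */
    int obtain_lock(const char *zCont, const char *zId, time_t tExpire,
                    char *zHolder, int nHolder){
      LockBlob lock, cur;
      memset(&lock, 0, sizeof(lock));
      strncpy(lock.zId, zId, sizeof(lock.zId)-1);
      lock.tExpire = tExpire;

      /* Case 1: locked-by doesn't exist yet. Creation is atomic, so if
      ** two writers race here, exactly one create succeeds. */
      if( blob_create_exclusive(zCont, &lock)==0 ) return 0;

      /* Case 2: it exists. Read it, capturing its current ETag. */
      if( blob_read(zCont, &cur) ) return 1;
      if( time(0)<cur.tExpire ){
        strncpy(zHolder, cur.zId, nHolder-1);   /* Still held: report id */
        zHolder[nHolder-1] = '\0';
        return 1;
      }

      /* Case 3: the lock has expired. Take it over, but only if the blob
      ** is unchanged since we read it, so two writers can't both steal
      ** the same expired lock. */
      memcpy(lock.zEtag, cur.zEtag, sizeof(lock.zEtag));
      return blob_overwrite_if_unchanged(zCont, &lock);
    }

extend_expiration would be the same read-then-conditional-overwrite with a new expiration, and release_lock a simple (possibly conditional) delete of the blob.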

Keith

(13) By Dan Kennedy (dan) on 2021-12-08 18:00:21 in reply to 11

Else, overwrite the existing blob with the new id/expiration (using the if-not-changed flag?).

That does sound possible. We already have a requirement that the cloud storage support such a primitive.

Dan.