Open vs Attach Database

(1) By spaceman on 2022-10-24 07:19:22 [link] [source]

Hi all

I always use the Open command, on a Database, in order to work on it.

I see the Attach command, and I am not sure what is the difference between them,
and when should I use Attach vs Open..

If anyone can clarify this, it would be great.

Thank you very much

(2) By Tim Streater (Clothears) on 2022-10-24 09:06:58 in reply to 1 [link] [source]

I regularly move rows between databases (they have to have the same schema, obviously). To effect this, I OPEN one database and ATTACH the second. That makes it easy.

(3) By Ryan Smith (cuz) on 2022-10-24 10:01:46 in reply to 1 [link] [source]

when should I use Attach vs Open

When querying only a single schema at any one time, there is no difference in function between an attached file and an opened file, it's the same thing with different names. However, Attach enables multiple-file queries.

Mainly the differences are:

Open opens a file for working on it into the static main schema reference (or perhaps it is better to use catalog here?).
Attach opens a file for working on it into a named (aliased) schema reference which you may name how you like.
You can only open a file once, and you can only have one open file at any one time
You may attach a file many times and have many attached files at any one time.
The only possible way to do queries (such as moving data) in-between schemata or join result sets across different files, is to have them attached as separate named schema references.

I could be missing something, but to my best knowledge those are the only differences.

(4) By Chris Locke (chrisjlocke1) on 2022-10-24 10:57:30 in reply to 1 [link] [source]

Just to add to the replies.

'Open' is generally used to open one particular database file. 'Attach' is used to add another database file to the existing database. This is handy if you want to query data across two different databases. Or, if a database gets large, you can split it into smaller chunks, with each chunk containing one or more tables.

It is far easier to work on a database file rather than have to rely on attaching other databases though, as it's not always clear what 'other databases' you may need.

Ryan mentioned schemas. Just to expand on this.

When you normally open a database file, it gets opened in a schema called 'main'. Lets assume it has a table called 'table1' in it. If you attached another database called 'fred' which also had table1 in it, you'd have to tell sqlite which table1 you wanted - the original one, or the one in fred. This is where the schema name comes in. The original one is identified as main.table1, and the one in fred is fred.table1. When you attach another database, you can assign what you want this schema name to be.

(5.1) By Keith Medcalf (kmedcalf) on 2022-10-24 15:26:14 edited from 5.0 in reply to 1 [link] [source]

sqlite3_open* API's open a database connection. This connection is required to have (at least) a single database file attached to it, called "main" (or other name specified by your local copy/application sqlite3 library). This database may be a "file" or it may be ":memory:", or it may be something else.

Nonetheless, *sqlite3_open* creates an sqlite3 connection and, as a side effect, opens a database file (of some description) for the "other end" of the connection -- a connection from here to there would be completely useless if it had only a here and no there -- would not it be?

The SQL ATTACH command, however, does exactly what it says on the tin. It ATTACHes a new database file to an existing connection, thus now allowing the connection to get from here (you and your application) to there (the database which was opened when the connection was opened) and to the new database which you attached with a specified name.

What you think of as "the database" being "opened" is really a "connection" being created and the "main" database being attached to that connection.

(6) By anonymous on 2022-10-24 15:50:51 in reply to 5.1 [link] [source]

It ATTACHes a new database file to an existing connection,

So, in the CLI you can ATTACH a database without explicitly having OPENed one but programmatically you cannot, i.e. programmatically you can only ATTACH to an open connection.

(7) By Keith Medcalf (kmedcalf) on 2022-10-24 16:59:35 in reply to 6 [link] [source]

So, in the CLI you can ATTACH a database without explicitly having OPENed one

Simply transitioning the CLI executable from the "dormant" to the "running" state opens a connection to the database you specified on the command line. If you did not specify a database file to be used for this connection, then ":memory:" is assumed. Mutatis mutandis the CLI .open command, which performs and sqlite3_close on the current connection and opens another.

The CLI has no capability to not have a connection open except if you exit the application.

(8.1) By spaceman on 2022-10-27 14:23:52 edited from 8.0 in reply to 5.1 [link] [source]

Thank you all for the interesting information and differences between the two.

I would like to verify something:
Most of you mentioned that ATTACH attaches a new database file to an existing connection.

Can SQLite open 2 different databases, each with the OPEN command?
(thus having 2 different connections - one for each DB)

(9.1) By punkish on 2022-10-27 14:28:35 edited from 9.0 in reply to 8.0 [link] [source]

Can SQLite open 2 different databases, each with the OPEN command? (thus having 2 different connections - one for each DB)

what do you mean when you say SQLite? are you referring to the sqlite command line program? If yes, then no. You can open one, and attach others. On the other hand, if you are using some programming language or development env, yes, you can open as many connections as you want. For example, I use nodejs, and I can have any number of open connections, and attach other dbs to each or all of them.

update: See Keith's post that explains the limits of the CLI very clearly.

(10) By Keith Medcalf (kmedcalf) on 2022-10-27 14:35:21 in reply to 8.1 [link] [source]

You application may open as many connections as your little heart desires. Each connection will be from here (your application) to there (the database you specified to open on the other end of the connection).

On each connection you may also attach a bunch of other databases. This means that your connection can not only go from here (your application) to there (the database that was opened along with the connection), but also to other there's which you have attached to the same connection.

A connection may only get from here (your application) to the databases attached to that connection. A connection may not access databases that are not attached to the connection on which your application is issuing commands.

(11.1) By Keith Medcalf (kmedcalf) on 2022-10-27 18:29:29 edited from 11.0 in reply to 8.1 [link] [source]

Think of it as a hiway system.

Your application can open a connection to Toronto. You now have a connection (hiway) between you and Toronto. You may issue requests (SQL) which obtain information / data from Toronto. You may not issue requests over this connection to Toronto addressing Montreal because Montreal was not "attached" to your hiway connection system. You can attach Montreal. Now you have a hiway connection between you and Toronto and Montreal. You can now issue requests via this hiway connection that accesses either Toronto or Montreal. But not Vancouver because Vancouver is not attached to the same hiway system (connection).

You can create a completely separate connection to Vancouver. You can now issue requests via the connection to Vancouver regarding Vancouver. However, the connection which is connected to Toronto and Montreal cannot access Vancouver. Not can the connection which is connected to Vancouver access Toronto or Montreal.

So now you have two connections, one to two places, and one to a single, third place.

You can open as many connections as your little heart desires (and you can afford). Each connection must connect to at least one but up to as many destinations as your little heart desires (and can afford).

SQLite is like a travel agent. You use it to book "connections" between your application and the destinations your application wants to talk/visit to. Each "trip" is isolated from every other "trip", and each trip may encompass as many "stops" as you wish (and can afford). However your "trip" to Naples and London cannot have breakfast in Marakesh unless you book a stop (attach) Marakesh to your trip (connection). And booking a separate trip (connection) to Marakesh does not make it part of the trip to Naples and London.

(12) By Chris Locke (chrisjlocke1) on 2022-10-27 15:18:38 in reply to 8.1 [link] [source]

Depends on what you want to achieve.

When you open a database and attach another, you can perform queries across both databases, as if they were one, by joining tables, etc.

If you open two databases via two different connections, they can't share data. The connections are independent.

(13.1) By Keith Medcalf (kmedcalf) on 2022-10-27 18:16:53 edited from 13.0 in reply to 9.1 [link] [source]

what do you mean when you say SQLite? are you referring to the sqlite command line program? If yes, then no

This is incorrect and has been for several versions of the CLI. In the olden days the CLI could only .open one connection at a time -- when you opened a new one, it closed (replaced) the previously open one, and by default the connection opened was to the end-point specified on the command-line. The only way to have the one and only single connection open was to have transitioned the CLI from dormant (disk) to active (running) -- and the only way to close the connection was to transition the CLI from active (running) to dormant (disk) -- most likely by exiting the CLI.

The current versions (and for some time going back) the CLI has been able to open multiple connections. At least one connection is required to be open at all times (the current connection), and that connection behaviour and requirement is unchanged. You may, however, switch to a specific connection number and .open a database connection in that slot.

I do not know what the connection limit of the shell is (definitely more than 1, probably as many as you can afford).

See the .connection command.

(14.1) By Keith Medcalf (kmedcalf) on 2022-10-27 18:29:42 edited from 14.0 in reply to 13.1 [link] [source]

The limit is 10 connections.

That is, .connection 0 (the default), through .connection 9.

(15.4) By spaceman on 2022-10-27 19:27:38 edited from 15.3 in reply to 12 [source]

Thank you very much Keith and Chris.

From these 2 points that you two wrote:

Keith Medcalf:
> You can create a completely separate connection to Vancouver.
> You can now issue requests via the connection to Vancouver regarding Vancouver.
> However, the connection which is connected to Toronto and Montreal cannot access Vancouver.

Chris Locke:
> When you open a database and attach another, you can perform queries across both databases, as if they were one, by joining tables, etc.
> If you open two databases via two different connections, they can't share data.
> The connections are independent.

I understand that the term "Connection" hides a bit more under it, than just a connection.

A connection is actually like a "realm" or "environment" inside the SQLite process,
and all databases that are in a realm, can "communicate" with each other, but not with databases in different realms.

It seems (to me) that the term "Connection" is not enough in explaining the real meaning of what it does..

But of course, I understand that the term "Connection" was not invented by SQLite,
and is used in all other databases, with the same meaning.

(16) By Ryan Smith (cuz) on 2022-10-27 20:17:44 in reply to 15.2 [link] [source]

It seems (to me) that the term "Connection" is not enough...

That may be so, but words, or terms are fickle things, and my guess would be that your expectation in what meaning that term conveys has more to do with your experience with other DB engines than any deep belief in the absolute meaning of the word itself.

That is not to say that you are wrong, just that your understanding has been molded by your experience. This is perfectly fine btw, but mentioned to illustrate that it is perhaps time to take on more or different meanings (maybe "wider" meanings?) in your encounter with SQLite, rather than expect SQLite to alter its terminology to better suit your existing experience.

An analogy might be the word "Bathroom", it can be used to describe a small dilapidated room with a shower only, to a public room with vast amounts of basins and loos, to very fancy hotel en suite's with large bath, bidet, all kinds of comfort gadgetry and more. Some of those clearly do a bit more than the others, but in principle, they're all very adequately named "Bathroom" and nobody would suggest you call one of them different for lack of, or excess of functionality.

In SQLite, a connection is a memory structure (or object if you prefer) with data, internal state and methods all working together to read/write and maintain data in an SQLite DB file and allow the owner process to exchange data with the file via the said methods.

This is pretty close to what a "connection" to most other DB engines entail, with the small difference that while the brunt of the working bits in other connections are for "talking" to the host DB engine, in SQLite's case, the connection is in fact both the "talking" interface (API) and the "DB engine" in and of itself, all-in-one, lock, stock and barrel, the full Monty, etc.

I understand that the term "Connection" hides a bit more under it, than just a connection.

As you have discovered in the previous posts, an SQLite connection itself does more than what would typically be expected of a DB connection, but that's just a notion of degree, it is still very much just a "connection".

(17.1) By spaceman on 2022-10-29 06:20:54 edited from 17.0 in reply to 16 [link] [source]

Hi Ryan

I understand, thank you for this.

BTW,
what happens in terms of # of Processes vs Open and Attach?

For example in these 2 scenarios:

Scenario 1:
Open DB1
Attach DB2

Scenario 2:
Open DB1
Open DB2

In Scenario 1 we have 1 Connection.
In Scenario 2 we have 2 Connections..

(18) By MBL (UserMBL) on 2022-10-29 06:31:22 in reply to 17.1 [link] [source]

Scenario 2 can be seen and studied also in my recent post about VACUUM INTO. There I use for a single process single thread a 2nd connection even when then same process/thread before used one or more ATTACH commands (Scenario 1). SQLite3 is very flexible and allows a lot of flexibility.

Of cause Scenario 2 could be done also be two threads or two processes (e.g. starting the CLI in parallel), but Scenario 1 can be done only by a single process.

(19) By spaceman on 2022-10-29 12:43:07 in reply to 18 [link] [source]

Thank you MBL

(20) By Ryan Smith (cuz) on 2022-10-29 13:00:15 in reply to 17.1 [link] [source]

what happens in terms of # of Processes vs Open and Attach?

Not quite sure that is a valid way to think of it.

A connection requires a process (or thread even) to do its business, obviously, like any other bit of code, but the connection object exists independent of the process or thread.

In fact, it is quite common to create connections and shelve them in memory, then if needed, assign one to a thread to go off and do some business reading/writing a DB, and afterwards, shelve it again to wait to be used at a later stage by possibly a different thread.

A connection will "do" its business when the controlling process assigns it a thread (or employs the main thread) and executes/calls any of the APIs that the connection object exposes.

An attached DB is also a connection, but by contrast it belongs to the connection that created it and exposes its APIs only to the connection within which it was created. The programmer or owner process has no direct access to the API of the attached DB connections and can only control the main connection which it itself created.

This main connection can host many attached connections, but it needs still only 1 thread to execute its business and it would in turn decide how to manage the hosted connections inside itself, and either serialize actions or create more threads as it deems necessary to deal with the tasks which the programmer required from those hosted (attached) DB connections.

By the way, this idea of "creating more threads to do its business" is not only when it has attached DB connections, any connection can do that at any time when parallelism makes sense, like adding more threads to help sorting of a large result-set.

So to answer the question: The number of processes or threads needed to do the business that the programmer requires from SQLite connections, depend very much on the required actions and less so on the amount of connections or attached connections within them.

PS: Some of the above is somewhat oversimplified, but hope that made some sense!

(21) By Simon Slavin (slavin) on 2022-10-29 13:19:57 in reply to 17.1 [link] [source]

Connections are not related to processes. SQLite does not start its own processes. SQLite does not start or maintain a central 'server process' which does any coordination. SQLite does not do anything in the background, or between the calls you make to it.

Inside a process of your program, you call a routine in the SQLite API. Everything that call does is done inside the processes that called it. Then control is returned to your software.

You can open one connection to SQLite, and use it from many different processes your software has started. Or you could open a hundred connections to SQLite, and call them all from the same process of your software. Any process involves was started by your own software.

(22) By spaceman on 2022-10-29 15:08:30 in reply to 20 [link] [source]

Yes, completely understood now. Thank you

(23) By spaceman on 2022-10-29 15:09:51 in reply to 21 [link] [source]

Thank you very much Simon

(24) By Keith Medcalf (kmedcalf) on 2022-10-29 17:17:36 in reply to 23 [link] [source]

A connection is like a hallway. You "build" hallways with the sqlite3_open* APIs. When you build a hallway, it must go somewhere. This is why sqlite3_open* APIs, as well as building a hallway (connection) also put the database you specified to open at "the other end" of the hallway. Think of that database as a filing cabinet.

Now you can traipse down the hallway to the filing cabinet and retrieve/store files.

You can use the sqlite3_open* API to build another hallway that goes to a different filing cabinet. Unfortunately, you cannot access both filing cabinets in "a walk down the same hallway" because they are not in the same hallway.

Alternatively, you may "attach" an additional database (filing cabinet) to an existing hallway. You can now walk down this hallway and access two filing cabinets. You can even use "readings" from the files in one to find files in the other.

(25) By Gunter Hick (gunter_hick) on 2022-10-30 09:06:10 in reply to 21 [link] [source]

Cave: Carrying open connections across fork() calls is a documented way of corrupting your database.

See https://www.sqlite.org/howtocorrupt.html section 2.6