Best way to XOR two tables?
(1) By ddevienne on 2023-01-12 14:26:34 [link] [source]
I'm using the longuish query below. Is there a better way? There's lot of repetition in that query, which I find unappealing.
I guess two CTEs for the LHS and RHS tables might make it smaller syntactically, but would that yield the same plan?
I must use 3.36, so no access to newer right and full-outer joins, although a better query using the latest would also be interesting.
PS: Of course, replace {0}
below with the name of an attached DB.
SELECT type, name, tbl_name
FROM sqlite_master
WHERE name NOT LIKE 'sqlite%'
EXCEPT
SELECT type, name, tbl_name
FROM {0}.sqlite_master
WHERE name NOT LIKE 'sqlite%'
UNION ALL
SELECT type, name, tbl_name
FROM {0}.sqlite_master
WHERE name NOT LIKE 'sqlite%'
EXCEPT
SELECT type, name, tbl_name
FROM sqlite_master
WHERE name NOT LIKE 'sqlite%'
(2) By David Raymond (dvdraymond) on 2023-01-12 14:40:28 in reply to 1 [link] [source]
Maybe something like this? (not tested) select type, name, tbl_name from ( select type, name, tbl_name from sqlite_master where name not like 'sqlite%' union all select type, name, tbl_name from {0}.sqlite_master where name not like 'sqlite%' ) group by type, name, tbl_name having count(*) = 1 ;
(3) By JayKreibich (jkreibich) on 2023-01-12 17:01:57 in reply to 1 [link] [source]
If you can't do a FULL JOIN
, just union together two LEFT JOIN
s.
For example:
CREATE TABLE master1
(
type TEXT,
name TEXT,
tbl_name TEXT
);
CREATE TABLE master2
(
type TEXT,
name TEXT,
tbl_name TEXT
);
INSERT INTO master1 VALUES ( 'table', 't1', 't1' );
INSERT INTO master1 VALUES ( 'table', 'tx', 'tx' );
INSERT INTO master2 VALUES ( 'table', 't2', 't2' );
INSERT INTO master2 VALUES ( 'table', 'tx', 'tx' );
SELECT *, 'master1' AS instance
FROM
master1
LEFT JOIN
master2
USING (type, name, tbl_name)
WHERE master2.type IS NULL
UNION ALL
SELECT *, 'master2' AS instance
FROM
master2
LEFT JOIN
master1
USING (type, name, tbl_name)
WHERE master1.type IS NULL
;
The output of this would look something like this:
type name tbl_name instance
----- ---- -------- --------
table t1 t1 master1
table t2 t2 master2
Having access to FULL JOIN
doesn't make the query significantly simpler if you want the output to not be split across multiple column sets. To get the same output using FULL JOIN
you need to do something like this:
WITH
xor AS (
SELECT
m1.type AS m1type, m1.name AS m1name, m1.tbl_name AS m1tbl,
m2.type AS m2type, m2.name AS m2name, m2.tbl_name AS m2tbl
FROM
master1 AS m1
FULL JOIN
master2 AS m2
USING (type, name, tbl_name)
WHERE m1.type IS NULL OR m2.type IS NULL
)
SELECT
coalesce(m1type, m2type) AS type,
coalesce(m1name, m2name) AS name,
coalesce(m1tbl, m2tbl ) AS tbl_name,
iif(m1type IS NOT NULL, 'master1', 'master2') AS instance
FROM
xor
;
(4) By ddevienne on 2023-01-13 15:59:16 in reply to 1 [source]
Thanks for you input David and Jay. I ended up staying with EXCEPT+UNION,
because it works well with arbitrary queries, since I don't need to know the select-clause.
So instead of the previous query with 4x repeated chunks, I use a helper
const bool has_rows = queryXorHasRows(twin_db, R"sql(
select type, name, tbl_name
from {0}.sqlite_master
where name not like 'sqlite%'
)sql"
);
And that helper uses the elemental query to build a larger query. E.g.:
bool queryXorHasRows(lt::Connection& twin_db, const std::string& sql_format) {
BOOST_REQUIRE(sql_format.find("{0}") != std::string::npos);
auto lhs_sql = fmt::format(sql_format, "main");
auto rhs_sql = fmt::format(sql_format, "upgr");
auto stmt = twin_db.prep(R"sql(
with
lhs as (
{}
),
rhs as (
{}
)
select * from lhs
except
select * from rhs
union all
select * from rhs
except
select * from lhs
)sql",
lhs_sql, rhs_sql
);
const bool has_rows = stmt.step();
return has_rows;
}
The Xor helper is also used for other queries to check table structures, indexes, foreign-keys, etc...
And each time I provide only the elemental query (typically a join between sqlite_master
and a pragma_function()
)
with the placeholder (named {0}
) for the name of the DB, since my use-case is comparing two DBs.
My unit tests expects no selected rows from these Xor queries,
since that would imply differences between brand-new and upgraded DBs.
The trick is to select only the subset of columns that is order-independent,
as the order of columns might not be the same because of an ALTER TABLE ADD COLUMN
always being at the end, while the column might be in the middle in a brand-new DB.