Best way to XOR two tables?

(1) By ddevienne on 2023-01-12 14:26:34 [link] [source]

I'm using the longuish query below. Is there a better way? There's lot of repetition in that query, which I find unappealing.

I guess two CTEs for the LHS and RHS tables might make it smaller syntactically, but would that yield the same plan?

I must use 3.36, so no access to newer right and full-outer joins, although a better query using the latest would also be interesting.

PS: Of course, replace {0} below with the name of an attached DB.

    SELECT type, name, tbl_name
      FROM sqlite_master
     WHERE name NOT LIKE 'sqlite%'
    EXCEPT
    SELECT type, name, tbl_name
      FROM {0}.sqlite_master
     WHERE name NOT LIKE 'sqlite%'

    UNION ALL

    SELECT type, name, tbl_name
      FROM {0}.sqlite_master
     WHERE name NOT LIKE 'sqlite%'
    EXCEPT
    SELECT type, name, tbl_name
      FROM sqlite_master
     WHERE name NOT LIKE 'sqlite%'

(2) By David Raymond (dvdraymond) on 2023-01-12 14:40:28 in reply to 1 [link] [source]

Maybe something like this? (not tested)

select type, name, tbl_name from
(
    select type, name, tbl_name
    from sqlite_master
    where name not like 'sqlite%'
    
    union all
    
    select type, name, tbl_name
    from {0}.sqlite_master
    where name not like 'sqlite%'
)
group by type, name, tbl_name
having count(*) = 1
;

(3) By JayKreibich (jkreibich) on 2023-01-12 17:01:57 in reply to 1 [link] [source]

If you can't do a FULL JOIN, just union together two LEFT JOINs.

For example:

CREATE TABLE master1
(
  type TEXT,
  name TEXT,
  tbl_name TEXT
);

CREATE TABLE master2
(
  type TEXT,
  name TEXT,
  tbl_name TEXT
);

INSERT INTO master1 VALUES ( 'table', 't1', 't1' );
INSERT INTO master1 VALUES ( 'table', 'tx', 'tx' );

INSERT INTO master2 VALUES ( 'table', 't2', 't2' );
INSERT INTO master2 VALUES ( 'table', 'tx', 'tx' );

SELECT *, 'master1' AS instance
FROM
  master1
LEFT JOIN
  master2
  USING (type, name, tbl_name)
WHERE master2.type IS NULL

UNION ALL

SELECT *, 'master2' AS instance
FROM
  master2
LEFT JOIN
  master1
  USING (type, name, tbl_name)
WHERE master1.type IS NULL
;

The output of this would look something like this:

type   name  tbl_name  instance
-----  ----  --------  --------
table  t1    t1        master1 
table  t2    t2        master2

Having access to FULL JOIN doesn't make the query significantly simpler if you want the output to not be split across multiple column sets. To get the same output using FULL JOIN you need to do something like this:

WITH
  xor AS (
    SELECT
      m1.type AS m1type, m1.name AS m1name, m1.tbl_name AS m1tbl,
      m2.type AS m2type, m2.name AS m2name, m2.tbl_name AS m2tbl
    FROM
      master1 AS m1
    FULL JOIN
      master2 AS m2
      USING (type, name, tbl_name)
    WHERE m1.type IS NULL OR m2.type IS NULL
  )
SELECT
  coalesce(m1type, m2type) AS type,
  coalesce(m1name, m2name) AS name,
  coalesce(m1tbl,  m2tbl ) AS tbl_name,
  iif(m1type IS NOT NULL, 'master1', 'master2') AS instance
FROM
  xor
;

(4) By ddevienne on 2023-01-13 15:59:16 in reply to 1 [source]

Thanks for you input David and Jay. I ended up staying with EXCEPT+UNION,
because it works well with arbitrary queries, since I don't need to know the select-clause.

So instead of the previous query with 4x repeated chunks, I use a helper

    const bool has_rows = queryXorHasRows(twin_db, R"sql(

  select type, name, tbl_name
    from {0}.sqlite_master
   where name not like 'sqlite%'

)sql"
    );

And that helper uses the elemental query to build a larger query. E.g.:

bool queryXorHasRows(lt::Connection& twin_db, const std::string& sql_format) {
    BOOST_REQUIRE(sql_format.find("{0}") != std::string::npos);

    auto lhs_sql = fmt::format(sql_format, "main");
    auto rhs_sql = fmt::format(sql_format, "upgr");

    auto stmt = twin_db.prep(R"sql(

with
lhs as (
{}
),
rhs as (
{}
)

select * from lhs
except
select * from rhs

union all

select * from rhs
except
select * from lhs

)sql",
        lhs_sql, rhs_sql
    );
    const bool has_rows = stmt.step();
    return has_rows;
}

The Xor helper is also used for other queries to check table structures, indexes, foreign-keys, etc...
And each time I provide only the elemental query (typically a join between sqlite_master and a pragma_function())
with the placeholder (named {0}) for the name of the DB, since my use-case is comparing two DBs.

My unit tests expects no selected rows from these Xor queries,
since that would imply differences between brand-new and upgraded DBs.
The trick is to select only the subset of columns that is order-independent,
as the order of columns might not be the same because of an ALTER TABLE ADD COLUMN
always being at the end, while the column might be in the middle in a brand-new DB.