SQLite Forum

Should tables be populated based upon a known/expected pattern of retrieval?
Login
I'm trying to compare two sources of data that are supposed to each contain at least one column of identical data (the same original source from which each claims to have been constructed) and be very close otherwise. 

I unpacked the ugly data blocks from each source into two separate tables, in which there are about 500,000 rows of 10 columns, where no column is over 50 bytes.

If everything were perfect, there'd be an equal number of rows and one identical column, such that the two tables could be joined on that column to add the other columns from table 2 into table 1.

Since that isn't the case and I cannot make it so without adding my "opinion" to one or both of the data sources, I have a question of whether the two tables should remain separate or combined vertically since the layouts are identical.

I read here before that, if the table structures are the same, there's no good reason to have two tables instead of one with an added identifier.

If that is true, I've another question.  Even though the tables don't match exactly in a one-to-one correspondence, each has the exact same number of divisions in it, where a division is generally between 2 and 50 rows of data.  Since almost every case of examining this data will be to compare the same division from each table, should the two tables be combined in that manner?

If table 2 is simply appended to table 1, when two divisions are compared, n number of rows will be retrieved from the top of the table and then n or close to n rows will be retrieved about 500,000 rows lower down in the table.

I'm writing way beyond my depth of understanding here but should the final combined table be built ordered by table 1 block 1, table 2 block 1, table 1 block 2, table 2 block 2, ..., such that both blocks may more likely be on the same page making the retrieval query more efficient?

Also, if this data is to be made accessible through a GUI that permits the user to edit the data, which is done by adding a new row to the table with an incremented version number, such that the original is always preserved, does that remove any efficiency gained by ordering the table by division?

Thank you for considering my very novice question.