Remove duplicate rows?
(1.1) By Gilles on 2022-08-20 14:48:59 edited from 1.0 [link] [source]
Hello,
Out of curiosity, is there a simpler way to remove duplicate rows?
CREATE TABLE tmp_books AS SELECT DISTINCT * FROM books;
DROP TABLE books;
ALTER TABLE tmp_books RENAME TO books;
CREATE INDEX title_idx ON books(title);
Thank you.
(2) By anonymous on 2022-08-20 13:19:31 in reply to 1.0 [source]
delete from books where rowid not in
(select min(rowid) from books
group by title,
/* add the rest of the column names. no, you can't use * here. */
);
(3) By Gilles on 2022-08-20 14:49:21 in reply to 2 [link] [source]
Thank you.
(4) By anonymous on 2023-10-02 16:00:21 in reply to 2 [link] [source]
The same Code lefts one duplicate in my table. Any suggestions?
(5) By Spindrift (spindrift) on 2023-10-02 18:30:02 in reply to 4 [link] [source]
The only sensible suggestion is that it is not in fact a duplicate.
You may need to furnish us with more information.
Of course if it is really a single duplicated row, you could just remove one of them directly.
(6) By anonymous on 2023-10-02 22:57:14 in reply to 5 [link] [source]
The usual cause of confusion is that a value which looks like a number is actually a string. To unconfuse, select a quoting output mode.
(7) By Simon Slavin (slavin) on 2023-10-03 16:45:21 in reply to 4 [link] [source]
Try returning typeof(field)
. If they are both strings, make sure there are no spaces or returns at the beginning or end of the string.
(9.1) By Trudge on 2023-10-04 13:47:37 edited from 9.0 in reply to 4 [link] [source]
Whenever I'm dealing with output (string or numeric), I sometimes bookend the output with '-->' and '<--' so I can see if there are any leading or trailing spaces or other characters.
So in Perl I would say
print qq{-->$ThisString<--};
This shows me immediately any leading or trailing unwanted characters.
In the case of thousands of lines to verify / sanitize, I would output the whole result to a text file then use the appropriate tool to look for and make any changes (diff, sed, awk, perl, etc.).
(8.1) By punkish on 2023-10-03 19:11:56 edited from 8.0 in reply to 1.1 [link] [source]
while the other answers are great, I see nothing wrong with the above approach… it seems just as simple as any other way. Unless you have bazillions of rows and so, are resource-constrained, the above is as good a solution as any. And, you don't have to name all the columns individually.