What would cause column data to be spread across two lines?

(1) By anonymous on 2020-07-24 08:15:08 [link] [source]

I've imported metadata from audio files and written same to a SQLite table.  A number of columns have entries like this "text1\\text2\\text3" where the \\ serves as a delimiter, meaning that the column actually contains multiple distinct entries and should ideally be split into a separate table ... which is what I'm planning to do.

I'm using Python to load the column into a Pandas DataFrame, following which I'm converting all records into a Python list which is then split into individual entries, de-duped and written to a new list, which forms the basis of the separate table.

In doing this I've encountered a number of items in the list that split over two lines.  This led me down the path of running update queries to remove any occurrences of CR and LF in the column in question:

update alib set artist = [REPLACE](artist, char(10), '') WHERE artist != [REPLACE](artist, char(10), '');

and

update alib set artist = [REPLACE](artist, char(13), '') WHERE artist != [REPLACE](artist, char(13), '');

Whilst this reduced the number of occurrences of items in the list that split over two lines, some still remain and I've no idea where to go from here.

Are there any other characters I should be testing for?

(2) By anonymous on 2020-07-24 09:59:15 in reply to 1 [link] [source]

Have you tried examining the raw data (before you start any of your processing) of the affected rows to see what characters they contain? Use the SQLite shell to find out what the stored data is for those rows, so you can be sure if the problem is in the data itself or the way you're processing it.

(3) By Simon Slavin (slavin) on 2020-07-24 12:11:41 in reply to 1 [source]

Many database engines, including SQLite, allow returns as part of text strings. This isn't weird, or even unusual. It happens all the time.

If you are converting the data into a one-row-per-record format then you will have to detect this and deal with it according to however your format copes with expressing returns.

(4) By Keith Medcalf (kmedcalf) on 2020-07-24 19:01:00 in reply to 1 [link] [source]

How about finding one of the "fields" which has the behaviour you are complaining about and take a look at what it contains.

Do you not think that might be more fruitful than asking such foolish questions?