SQLite Forum

Is CSV parsing too liberal?
As you say, it is something of a system convention, but not a universal one. In one sense, stripping the BOM when reading a file known to be text is a fairly safe operation, as there is really no reason for one to be at the start of a file except to indicate that it is UTF-8 encoded, even though the Unicode Standard neither requires nor recommends a BOM for UTF-8.
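To make the "fairly safe operation" concrete, here is a minimal sketch in Python of dropping a leading UTF-8 BOM from raw bytes before handing them to a CSV reader. The function name `strip_utf8_bom` is made up for illustration; it is not how SQLite's own CSV import does it.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present.

    Hypothetical helper for illustration only. Only a BOM at the very
    start is removed; the same bytes later in the data are left alone.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    raw = b"\xef\xbb\xbfid,name\n1,alice\n"
    print(strip_utf8_bom(raw))
```

Python also ships a `utf-8-sig` codec that does the same thing at decode time, so `open(path, encoding="utf-8-sig")` is usually the simpler route when the file is already known to be UTF-8.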

The one big problem with that convention is that utilities designed to work with plain ASCII files, which never needed to care about encodings (like tail), are suddenly broken because they don't know to do this.

The advantage of this convention (and partly why it is so common on Windows) is that for programs that DO need to worry about encoding, it provides a big clue as to whether a file is UTF-8 or in the local default 8-bit code page.
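As a rough sketch of how a program might use that clue, the Python snippet below treats a leading BOM as "this is UTF-8" and otherwise falls back to a default 8-bit code page. The function name `decode_with_bom_hint` and the choice of `cp1252` as the default are assumptions made for the example, not anything a particular tool is known to do.

```python
import codecs


def decode_with_bom_hint(data: bytes, default_codepage: str = "cp1252") -> str:
    """Decode bytes, using a leading UTF-8 BOM as the encoding hint.

    Hypothetical helper: BOM present -> UTF-8 (BOM dropped);
    BOM absent -> assume the local 8-bit code page (cp1252 here,
    which is only an assumed default).
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    # errors="replace" because cp1252 leaves a few byte values undefined.
    return data.decode(default_codepage, errors="replace")


if __name__ == "__main__":
    print(decode_with_bom_hint(b"\xef\xbb\xbfcaf\xc3\xa9"))  # UTF-8 with BOM
    print(decode_with_bom_hint(b"caf\xe9"))                  # cp1252, no BOM
```

The flip side shows up with UTF-8 data that has no BOM: under this rule it gets read as the code page and comes out as mojibake, which is exactly the ambiguity the BOM convention is trying to avoid.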

Windows has a bigger problem with this because it is older, especially in the business world, where this was more of an issue.

Linux was late enough to be able to just assume UTF-8 unless it was told otherwise, and could live with those issues.