You may also like to look here: https://en.wikipedia.org/wiki/UTF-8

UTF-8 is just one way of encoding Unicode code points for storage. There are others (UTF-16 and UTF-32, for example), but its main advantages are:

1) It uses variable-length sequences of one to four bytes per code point. The one-byte sequences map exactly to ASCII, so any ASCII text is already valid UTF-8.
2) Because it is byte-oriented, it has no endianness issue.
3) It is the most widely used encoding, particularly on the web.
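
If it helps to see the first two points concretely, here is a minimal Python sketch. The sample characters and the helper name utf8_length are my own choices for illustration; only the built-in str.encode is used.

```python
# Illustrative sketch: sample characters and the helper name are assumptions,
# chosen to cover each possible UTF-8 sequence length.

def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))

samples = [
    "A",           # U+0041, in the ASCII range
    "\u00e9",      # U+00E9, e with acute accent
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s), hex {encoded.hex()}")

# Point 1: for ASCII characters, the UTF-8 bytes are identical to the ASCII bytes.
ascii_matches = "A".encode("utf-8") == "A".encode("ascii")

# Point 2: UTF-16 comes in two byte orders, so the same code point can be
# stored as two different byte sequences. UTF-8 has only one form.
utf16_le = "\u20ac".encode("utf-16-le")
utf16_be = "\u20ac".encode("utf-16-be")
print("UTF-16 LE:", utf16_le.hex(), "UTF-16 BE:", utf16_be.hex())
```

The loop shows the length growing from one to four bytes as the code point gets larger, and the last lines show why UTF-16 needs a byte order (or a byte order mark) while UTF-8 does not.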