Below is a snippet showing how the data is encoded in R's memory. The CSV file was read with encoding "Latin-1" using data.table::fread.
As the snippet shows, the data is stored with mixed encodings. This is undesirable because I am going to keep the data in a SQLite database, and whenever I write the data to the database and read it back, the Latin-1 strings are not read in correctly. Is there a way to normalize this?
It seems that common functions like iconv won't work once the data has multiple encodings in different parts of the data.frame.
Encoding(Data$DESC)
[5305] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
[5311] "unknown" "unknown" "unknown" "latin1" "unknown" "unknown"
[5317] "unknown" "latin1" "latin1" "latin1" "latin1" "unknown"
[5323] "latin1" "latin1" "latin1" "latin1" "unknown" "latin1"
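
One thing that may be worth trying, as a minimal sketch rather than a confirmed fix: base R's enc2utf8() converts each element according to its own declared encoding mark, so elements marked "latin1" become UTF-8. R never marks ASCII-only strings with an encoding, so they stay "unknown" and their bytes are unchanged. The vector `desc` below is a hypothetical stand-in for Data$DESC, and the sketch assumes the "unknown" elements in the real data are plain ASCII; a non-ASCII "unknown" string would be interpreted in the session's native encoding.

```r
# Hypothetical stand-in for Data$DESC: two ASCII strings and one Latin-1 string.
# rawToChar() builds "caf<e9>" byte by byte, so the example does not depend on
# the encoding of this source file.
desc <- c("plain ascii",
          rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9))),  # "café" in Latin-1
          "another one")

# Declare the encoding; ASCII-only elements are left unmarked ("unknown").
Encoding(desc) <- "latin1"
Encoding(desc)
# [1] "unknown" "latin1"  "unknown"

# Convert every element to UTF-8 according to its own mark.
desc_utf8 <- enc2utf8(desc)
Encoding(desc_utf8)
# [1] "unknown" "UTF-8"   "unknown"

# Every element is now a valid UTF-8 byte sequence (ASCII is a subset of UTF-8).
all(validUTF8(desc_utf8))
# [1] TRUE
```

Applied to the data in question, this would be Data$DESC <- enc2utf8(Data$DESC), repeated for each character column before writing to SQLite. Whether the round trip then works presumably also depends on how the database driver handles encodings, which is not shown here.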