Converting a file's encoding from ISO-8859 to UTF-8

Asked Nov 25 '20 at 18:52

Active Nov 26 '20 at 21:10

Viewed 315 times

The input file has more than 50k records. The Unix `file` command reports it as "ISO-8859 text, with very long lines".

The input record causing the issue is MONTRÃ©AL

When I run the iconv command below, nothing changes; the record stays as-is.


**iconv -f ISO-8859-1 -t UTF-8 input.txt -o output.txt**

When I copy only the record in question with the sed command below, the output file is created as UTF-8 and the record looks good:

**sed -n '41696p' input.txt > output.txt**
**MONTRéAL**

When I copy lines 1 through 41696 with the same sed command, the record does not change:

**sed -n '1,41696p' input.txt > output.txt**

How do I convert the file from ISO-8859 to UTF-8 with the proper character set?
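
Since the record looks different depending on how it is viewed, it may help to look at the raw bytes rather than an editor's rendering. The sketch below is only an illustration with a hand-built sample string, not the real file: it assumes the record is stored with "é" as the two UTF-8 bytes C3 A9, and shows what iconv produces if those bytes are treated as ISO-8859-1. The helper `hex` and the variable names are made up for this example.

```shell
# Hypothetical helper: print stdin as a continuous hex string.
hex() { od -An -tx1 | tr -d ' \n'; }

# Assumed sample record: "é" stored as the UTF-8 bytes C3 A9 (octal 303 251).
utf8_record=$(printf 'MONTR\303\251AL')
utf8_hex=$(printf '%s' "$utf8_record" | hex)
echo "as stored:   $utf8_hex"     # 4d4f4e5452c3a9414c

# Converting from ISO-8859-1 treats C3 and A9 as two separate characters
# ("Ã" and "©") and encodes each one again as UTF-8.
double_hex=$(printf '%s' "$utf8_record" | iconv -f ISO-8859-1 -t UTF-8 | hex)
echo "after iconv: $double_hex"   # 4d4f4e5452c383c2a9414c
```

On the real file, the same inspection could be done with something like `sed -n '41696p' input.txt | od -An -tx1`. If the bytes around the accented letter are `c3 a9`, that record is already UTF-8; a single `e9` would mean it really is ISO-8859-1.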

edited Nov 26 '20 at 21:10

Talha

asked Nov 25 '20 at 18:52

Krrp78

What are you using to view the record to see whether it's "good" or not? One theory is that the file is already using the utf-8 encoding, but whatever you're using to view it thinks it's ISO-8859 because it's not looking deep enough into the file to know that it's not. That would explain why it looks okay when you just have the one line. What happens if you do, say, `sed -n '41600,41696p' input.txt > output.txt` ? – jas Nov 25 '20 at 20:03
I just use the vi editor to view the file. – Krrp78 Nov 25 '20 at 20:08
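
The theory in the comment above (the file is mostly UTF-8 already, with something elsewhere making tools guess ISO-8859) could be checked roughly as follows. This is a sketch under assumptions: it uses a small temporary sample file instead of the real input.txt, and relies on iconv exiting with a non-zero status when its input is not valid in the declared source encoding. The sample contents and variable names are invented for illustration.

```shell
# Hypothetical sample: line 1 is valid UTF-8, line 2 contains a lone
# ISO-8859-1 "é" byte (octal 351), which is not valid UTF-8.
sample=$(mktemp)
printf 'MONTR\303\251AL\nQU\351BEC\n' > "$sample"

# Whole-file check: does the file decode cleanly as UTF-8?
if iconv -f UTF-8 -t UTF-8 "$sample" > /dev/null 2>&1; then
    whole_file="valid UTF-8"
else
    whole_file="not valid UTF-8"
fi
echo "whole file: $whole_file"

# Per-line check: collect the numbers of the lines that fail to decode.
bad_lines=""
n=0
while IFS= read -r line; do
    n=$((n + 1))
    if ! printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1; then
        bad_lines="$bad_lines $n"
    fi
done < "$sample"
echo "lines that are not valid UTF-8:$bad_lines"

rm -f "$sample"
```

If only a handful of lines turn up in a file that is otherwise valid UTF-8, that would be consistent with the theory, and converting the whole file from ISO-8859-1 would double-encode the lines that were already fine. The per-line loop starts one iconv process per line, so it will be slow on a 50k-line file, but it only needs to run once.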

0 Answers