I have a bunch of CSV files that I read and plot with Python and pandas.
To add more information about each file (or rather, the data it is about) to my plots, I parse the file headers to extract various things (location of the measurement point, type of measurement, etc.).
The problem is that the files are in German and thus contain a lot of umlauts (ü, ö, ä). I can read and understand them perfectly fine, but my script can't.
So I simply want to replace them with their valid two-character representations (ü=ue, …), so that I don't have to worry about using things like u'Ümlautstring' or \xfcstring in Python.
sed -i 's/\ä/ae/g' myfile.csv
should do the trick, according to Google, but it doesn't work.
With some further research, I found the issue, but no solution:
My CSV files are encoded in ISO 8859-15, but my locale is LANG=de_DE.UTF-8, which, as far as I understand it, means that sed searches for ü in its UTF-8 form, which it will not find in an ISO 8859-15 file.
So what do I have to tell sed to find my umlauts?
Most things I have found so far suggest Perl, but that is not really an option.