Problem
I have a zip-file that I would like to unzip on Ubuntu with the correct filenames (they contain æ,ø,å).
What I have tried:
1. Unrar in Windows 10 - WORKS!
Everything works as expected and filenames are correct.
2. Unzip in Ubuntu
unzip file.zip
The characters æ, ø and å are missing from the filenames; 'æ' has been replaced with 'C'.
I attempt to detect the encoding of the zip-file, but it doesn't seem to tell me anything.
file file.zip
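Since `file` only looks at the archive as a whole, one thing that can be checked per entry is the zip "language encoding" flag (general-purpose bit 11, 0x800), which marks a filename as UTF-8. Below is a minimal sketch, not a definitive tool: `describe_names` is a hypothetical helper name, and it relies on Python's `zipfile` module, which decodes a name as UTF-8 when the flag is set and as cp437 otherwise.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: filename is stored as UTF-8


def describe_names(zip_path):
    """Return (filename, is_utf8_flagged) for every entry in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return [
            (info.filename, bool(info.flag_bits & UTF8_FLAG))
            for info in zf.infolist()
        ]
```

Calling `describe_names('file.zip')` and printing the pairs shows which entries carry the flag. If no entry is flagged, the archive itself does not say which encoding was used, and the names have to be guessed.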
3. Unzip with encoding in Ubuntu
I attempt to unpack the file using various encodings that are often used for æ,ø,å-containing texts.
unzip -O UTF-8 file.zip
unzip -O ISO-8859-1 file.zip
unzip -O windows-1257 file.zip
None work...
4. Unzip using 7zip in Ubuntu
It has been suggested that 7zip may fix the problem, but it does not.
7z x file.zip
5. Unzip using 7zip and Danish language setting in Ubuntu
It has been suggested that I change the Ubuntu language settings and then try again.
saveLang=$LANG
export LANG=da_DK
7z x file.zip
export LANG=$saveLang
This also does not work.
6. Unzip using Python3 in Ubuntu - WORKS!
The unzip works correctly if I use Python3 for the purpose, but there must be an easier way?
import zipfile
with zipfile.ZipFile('file.zip', "r") as z:
    z.extractall("/home/xxxx/")
7. Next step
I am considering finding a list of "ALL" encodings, extracting the filenames with each one, and going through the results manually. Something along the lines of this:
while IFS= read -r p; do
    echo "$p"
    unzip -j -O "$p" file.zip
done <encodings.txt
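The same trial-and-error idea can be sketched in Python without extracting anything, by decoding only the stored filename bytes with each candidate encoding. This is an illustration under assumptions: `raw_name_bytes` and `candidate_decodings` are made-up helper names, and recovering the bytes via cp437 relies on `zipfile` having decoded unflagged names as cp437 (its default behaviour).

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: filename is stored as UTF-8


def raw_name_bytes(info):
    """Recover the filename bytes as stored in the archive (assumes the
    default zipfile decoding: UTF-8 if flagged, cp437 otherwise)."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename.encode("utf-8")
    return info.filename.encode("cp437")


def candidate_decodings(raw, encodings):
    """Map each encoding that can decode `raw` to the resulting string."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except (UnicodeError, LookupError):
            # Bytes are invalid in this encoding, or the codec is unknown.
            continue
    return results
```

Looping over `zipfile.ZipFile('file.zip').infolist()` and printing `candidate_decodings(raw_name_bytes(info), [...])` for each entry gives one line per encoding to inspect by eye. For the candidate list, the standard library's `encodings.aliases.aliases` dictionary maps alias names to codec names.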
Conclusion
Windows and Python3 seem to have some MAGIC under the hood that I cannot replicate. Do you guys have any suggestions as to what this "MAGIC" is?
- How do I identify the encoding of the filenames of a zip-file?
- Where can I get a list of all encodings for step 7?
- Is there any easy way to solve this problem without having to write e.g. a Python script?