
I have a bunch of music files on an NTFS partition mounted on Linux, and their filenames contain Unicode characters. I'm having trouble writing a script to rename the files so that all of the filenames use only ASCII characters. I think the iconv command should work, but I'm having trouble escaping the characters for the 'mv' command.

EDIT: It doesn't matter if there isn't a direct transliteration for the Unicode characters. I guess I'll just replace those with a "?" character.

zedwarth
  • Language? If this is bash, put it in quotes? If this is some other language, don't call mv, call the proper syscall? – Thanatos Jun 10 '10 at 04:11
  • Also, what are we doing if we find a Unicode character? 火 has no ASCII equivalent. – Thanatos Jun 10 '10 at 04:13

3 Answers


Sometimes you can't pass a filename to mv from the shell (for example, when you can't type or quote its characters), so you can refer to the file by its inode number instead.

To get the inode of a file:

$ ls -il

Output will be something like this:

13377799 -rw-r--r--  1 draco  draco      11809 Apr 25 01:39 some_filename.ext
9340462  -rw-r--r--  1 draco  draco      81648 Apr 23 02:27 some_strange_filename.ext
9340480  -rw-r--r--  1 draco  draco       4717 Apr 23 03:54 yikes__oh_look_a_file_火

Then use find to select the file by its inode number and pass it to, for example, the Python script from Thanatos's answer:

$ find . -inum 9340480 -exec ./unistrip.py {} \;

You could also use the same find -inum ... -exec pattern with iconv in a shell.
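If you'd rather do the lookup and the rename in one step, here is a rough Python sketch of the same inode idea. It assumes Python 3 on Linux, and `rename_by_inode` and `ascii_name` are hypothetical helper names (not part of any tool mentioned here). It walks a single directory with `os.scandir`, matches `DirEntry.inode()` against the number shown by `ls -il`, and renames the match using the same "non-ASCII becomes ?" rule as unistrip.py, refusing to overwrite an existing file:

```python
import os
import tempfile


def ascii_name(name):
    """Replace every non-ASCII character with '?' (same rule as unistrip.py)."""
    return ''.join('?' if ord(c) > 0x7f else c for c in name)


def rename_by_inode(directory, inode):
    """Rename the entry in `directory` with the given inode to an ASCII-only name.

    Hypothetical helper: returns the new path, or None if no entry has that inode.
    Raises FileExistsError instead of overwriting a file with the same ASCII name.
    """
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.inode() != inode:
                continue
            new_name = ascii_name(entry.name)
            new_path = os.path.join(directory, new_name)
            if new_name == entry.name:
                return new_path  # already ASCII, nothing to do
            if os.path.exists(new_path):
                raise FileExistsError(new_path)
            os.rename(entry.path, new_path)
            return new_path
    return None


if __name__ == '__main__':
    # Demo in a throwaway directory: create a file, look up its inode, rename it.
    demo_dir = tempfile.mkdtemp()
    path = os.path.join(demo_dir, 'yikes__oh_look_a_file_火')
    open(path, 'w').close()
    print(rename_by_inode(demo_dir, os.stat(path).st_ino))
```

It only searches one directory (unlike find, which recurses), and the exists-then-rename check is not atomic, so treat it as a starting point rather than a finished tool.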

Hope this helps someone out, and excuse any mistakes (this is my first answer).

Hefnawi

convmv is a good Perl script to convert file name encodings. But it can't handle characters that aren't in the destination encoding.

You can change any character not in ASCII to '?' using the rename utility distributed with Perl:

rename 's/[^ -~]/?/g' *

Unfortunately, this replaces each byte of a multi-byte character with its own '?'. Depending on the Unicode encoding that is used and the characters involved, changing the regex may help, e.g.

rename 's/[^ -~]{2}/?/g' *

for 2-byte characters.
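The multiple '?'s appear because the regex above works on raw bytes. Here is a small Python sketch contrasting the two behaviours, assuming the filenames are UTF-8 encoded (`replace_per_byte` and `replace_per_char` are hypothetical names for illustration). Decoding first means each character becomes exactly one '?', whatever its byte length; for instance, 火 takes three bytes in UTF-8, so a `{2}` pattern would not cover it cleanly:

```python
import re

# Both patterns mirror [^ -~] from the rename one-liner: anything outside space..tilde.
NON_PRINTABLE_ASCII_BYTE = re.compile(rb'[^ -~]')
NON_PRINTABLE_ASCII_CHAR = re.compile(r'[^ -~]')


def replace_per_byte(name: bytes) -> bytes:
    """Byte-wise replacement: one '?' per byte of a multi-byte character."""
    return NON_PRINTABLE_ASCII_BYTE.sub(b'?', name)


def replace_per_char(name: bytes) -> bytes:
    """Character-wise replacement: decode as UTF-8 (an assumption) before substituting.

    surrogateescape maps undecodable bytes to lone surrogates, which the
    pattern also matches, so invalid bytes still become a single '?'.
    """
    text = name.decode('utf-8', errors='surrogateescape')
    return NON_PRINTABLE_ASCII_CHAR.sub('?', text).encode('ascii')


if __name__ == '__main__':
    raw = 'yikes__oh_look_a_file_火'.encode('utf-8')
    print(replace_per_byte(raw))  # three '?' for the three bytes of 火
    print(replace_per_char(raw))  # a single '?'
```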

Florian Diesch
  • This is excellent. Just a heads-up: some distros these days repackage the perl rename utility as "prename" because of name conflicts with a different binary called "rename" – BoeroBoy Jun 19 '20 at 13:52

I don't think iconv has any character replacement facilities. This in Python might help:

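#!/usr/bin/env python3
import sys

def unistrip(s):
    """Return s with every non-ASCII character replaced by '?'."""
    if isinstance(s, bytes):
        # Assume UTF-8; undecodable bytes become U+FFFD and then '?'.
        s = s.decode('utf-8', errors='replace')
    chars = []
    for c in s:
        if ord(c) > 0x7f:
            chars.append('?')
        else:
            chars.append(c)
    return ''.join(chars)

if __name__ == '__main__':
    print(unistrip(sys.argv[1]))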

Then call as:

$ ./unistrip.py "yikes__oh_look_a_file_火"
yikes__oh_look_a_file_?

To rename the file with it:

$ mv "yikes__oh_look_a_file_火" "$(./unistrip.py "yikes__oh_look_a_file_火")"

You might want to test it a bit first. For large move operations, it is advisable to generate a list of mv commands (i.e., write code that writes a script), so you can look over the commands before executing them.
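A minimal sketch of that "write code that writes a script" approach, assuming Python 3 and GNU or BSD mv (for the -n no-clobber flag). The script name make_mv_script.py and the function `mv_commands` are hypothetical. It prints one mv command per file whose name would change, quoting both names with `shlex.quote` so spaces, quotes, and the literal '?' (a glob character) survive the shell:

```python
#!/usr/bin/env python3
import os
import shlex
import sys


def unistrip(s):
    """Replace every non-ASCII character with '?'."""
    return ''.join('?' if ord(c) > 0x7f else c for c in s)


def mv_commands(names):
    """Yield a shell-quoted mv command for each name that would change."""
    for name in names:
        new = unistrip(name)
        if new != name:
            # -n (no-clobber, GNU/BSD mv) stops two files that map to the same
            # ASCII name from overwriting each other; -- protects names starting with '-'.
            yield 'mv -n -- {} {}'.format(shlex.quote(name), shlex.quote(new))


if __name__ == '__main__':
    for cmd in mv_commands(sys.argv[1:]):
        # os.fsencode writes the original filename bytes back out unchanged,
        # even for names that are not valid in the filesystem encoding.
        sys.stdout.buffer.write(os.fsencode(cmd) + b'\n')
```

Usage would be something like `./make_mv_script.py * > rename.sh`, then read rename.sh and run it with `sh rename.sh` once it looks right.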

Thanatos
  • Correct me if I'm wrong, but I think that `iconv` does have character replacement facilities: http://stackoverflow.com/questions/1975057/bash-convert-non-ascii-characters-to-ascii – B Johnson Jun 10 '10 at 12:17
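Along the lines of that comment, you can transliterate where possible and only fall back to '?' for the rest. Here is a rough Python analogue; it uses Unicode NFKD decomposition from the standard `unicodedata` module, not iconv's own transliteration, and `to_ascii` is a hypothetical name. Accented Latin letters lose their accents, compatibility ligatures are expanded, and anything else (such as 火) becomes the fallback character:

```python
import unicodedata


def to_ascii(name, fallback='?'):
    """Transliterate name to ASCII where NFKD decomposition allows, else use fallback."""
    # NFKD splits 'é' into 'e' + U+0301 (combining acute) and the 'ﬁ' ligature into 'fi'.
    decomposed = unicodedata.normalize('NFKD', name)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop accents and other combining marks
        out.append(ch if ord(ch) < 0x80 else fallback)
    return ''.join(out)


if __name__ == '__main__':
    for sample in ['Beyoncé - Déjà vu.mp3', 'yikes__oh_look_a_file_火']:
        print(to_ascii(sample))
```

One caveat: characters that decompose into several non-ASCII pieces (Hangul syllables, for example) produce more than one fallback character each.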