We have several Magento installations (webshops) that allow images to be added to a product freely. When an image is added to a product, the file is named in a specific way that sometimes incorporates special characters (for instance German umlauts).

In the one case I'm currently looking into, the filenames are encoded in latin1. I can see that by redirecting the output of ls into a file and then opening that file in vim: with fileencoding=latin1, the umlauts are displayed correctly.

Now, these Magento installations are backed up by tar, 7zip and ccrypt (in that order). Unpacking those on Linux gives the same filenames in the same encoding.

We now have a share on a Windows system onto which we want to put the untarred Magento installation. While untarring, however, a number of error messages pop up regarding the umlaut file names:

tar: var/magento_webs/customer/media/import/images/12063-sportsto\337d\344mpfer-hinten.jpg: Kann open nicht ausführen: Datei oder Verzeichnis nicht gefunden
tar: var/magento_webs/customer/media/import/images/15240-kunststoffkotfl\374gel-detail-vorne.jpg: Kann open nicht ausführen: Datei oder Verzeichnis nicht gefunden
tar: var/magento_webs/customer/media/import/images/14300-fl\374gel.jpg: Kann open nicht ausführen: Datei oder Verzeichnis nicht gefunden
tar: var/magento_webs/customer/media/import/images/15240-41kotfl\374gel-kunststoff-vorne.jpg: Kann open nicht ausführen: Datei oder Verzeichnis nicht gefunden
tar: var/magento_webs/customer/media/import/images/citr\366n.jpg: Kann open nicht ausführen: Datei oder Verzeichnis nicht gefunden
tar: var/magento_webs/customer/media/import/images/2cv6-ma\337e-1.jpg: Kann open nicht ausführen: Datei oder Verzeichnis nicht gefunden
tar: var/magento_webs/customer/media/import/images/2cv6-ma\337e.jpg: Kann open nicht ausführen: Datei oder Verzeichnis nicht gefunden
tar: var/magento_webs/customer/media/import/images/11076-vorschalld\344mpfer.jpg: Kann open nicht ausführen: Datei oder Verzeichnis nicht gefunden

(It roughly translates to Cannot execute open: File or directory not found)

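Now, looking at the filenames tar lists, the escapes are single bytes: \337 is octal for 0xDF, which is ß in latin1 (in UTF-8, ß would be the two bytes \303\237). So tar seems to try to create the files under their original latin1-encoded names. The mount point, however, has been made available via (from /etc/fstab):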

//192.168.0.111/share   /mnt/share      cifs    username=myusername,noperm,sec=ntlm,codepage=cp850       0       0

I'm not sure why these file names cannot be written to the share in a fashion that preserves the umlaut encoding. Am I missing another option (is codepage the wrong option for this)?
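
One way to check what tar is actually trying to write is to decode the octal escapes from the error messages. The following is a minimal Python sketch, not part of the original setup. The helper names decode_tar_escapes and is_valid_utf8 are made up for illustration, and it assumes that everything outside the escapes is plain ASCII, as in the messages above.

```python
import re


def decode_tar_escapes(name: str) -> bytes:
    """Turn tar's quoted output (e.g. r'fl\374gel') back into raw bytes.

    Assumption: everything outside the \\NNN escapes is plain ASCII.
    """
    raw = name.encode("ascii")
    # Each \NNN is one byte written as three octal digits.
    return re.sub(rb"\\([0-7]{3})", lambda m: bytes([int(m.group(1), 8)]), raw)


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


raw = decode_tar_escapes(r"14300-fl\374gel.jpg")
print(raw)                                    # b'14300-fl\xfcgel.jpg'
print(is_valid_utf8(raw))                     # False
# The same name re-encoded as UTF-8 uses two bytes for the umlaut.
print(raw.decode("latin-1").encode("utf-8"))  # b'14300-fl\xc3\xbcgel.jpg'
```

In this sketch each umlaut decodes to a single latin1 byte, and the resulting name is not valid UTF-8, which would matter for any layer that expects UTF-8 file names.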

Edit 1: I can recreate something similar by SSHing into the Linux box, setting the remote character set of the connection to ISO8859-15, changing to the share directory and touching a file with an umlaut in its name:

touch: kann â\244â nicht berÃŒhren: Datei oder Verzeichnis nicht gefunden

(Can't touch X: File or directory not found)

Edit 2: First try of a solution

I've added iocharset=utf8 to the mount options and remounted the share, but got the exact same problems with the same files. Strangely enough, mount (which usually prints all the options a mount point has been mounted with) doesn't print the iocharset option (neither with utf8 nor with cp850 as the setting).
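
To see which options were actually recorded for the mount, the option string can also be read from /proc/mounts and split into key/value pairs. Below is a minimal Python sketch, assuming the usual whitespace-separated layout shared by fstab and /proc/mounts lines. parse_mount_line and cifs_mounts are hypothetical helper names, and the sample line is the fstab entry from above. Whether iocharset shows up in /proc/mounts may depend on the kernel version.

```python
def parse_mount_line(line: str) -> dict:
    """Split one fstab- or /proc/mounts-style line into its main fields."""
    device, mountpoint, fstype, options = line.split()[:4]
    opts = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        # Flags without a value (e.g. noperm) are stored as None.
        opts[key] = value if sep else None
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": opts,
    }


def cifs_mounts(path: str = "/proc/mounts") -> list:
    """Return the parsed cifs entries the kernel currently reports (Linux only)."""
    with open(path, errors="replace") as handle:
        entries = [parse_mount_line(line) for line in handle if line.strip()]
    return [entry for entry in entries if entry["fstype"] == "cifs"]


sample = ("//192.168.0.111/share   /mnt/share      cifs    "
          "username=myusername,noperm,sec=ntlm,codepage=cp850       0       0")
entry = parse_mount_line(sample)
print(entry["options"].get("codepage"))   # cp850
print("iocharset" in entry["options"])    # False
```

On the Linux guest, calling cifs_mounts() and looking at each entry's "options" would show whether an iocharset value was recorded for /mnt/share.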

Dabu

1 Answer

Some time in the past (I believe around version 2.0 or so), mount.cifs lost the "codepage=" option and put everything in the "iocharset=" option.

You should be fine with

//host/share /mnt/share cifs username=blah,noperm,sec=ntlm,iocharset=utf8 0 0
Bandrami
  • Sadly, this didn't change anything. I've updated the problem with results and findings. – Dabu Dec 20 '13 at 10:21
  • Oh, right, I hate that. You also have to have the share offered with UTF-8 enabled. The SMB server itself is Windows, I take it? – Bandrami Dec 20 '13 at 10:25
  • Yes, it's a Windows share. It's on the host, while the Linux box is a guest inside VirtualBox. – Dabu Dec 20 '13 at 10:28
  • Also, check that nls_utf8 is either loaded as a module or compiled in on the client (it probably is since you can see the characters, but just to be sure...) – Bandrami Dec 20 '13 at 10:28
  • Loaded are both nls_utf8 and nls_cp850. – Dabu Dec 20 '13 at 10:30
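
The module check from the comments can be sketched by scanning the contents of /proc/modules for the NLS tables in question. This is a minimal Python illustration. loaded_modules is a hypothetical helper, and the sample text is made up in the /proc/modules layout (module name first on each line). NLS tables compiled directly into the kernel would not appear in that file, so a missing entry is not conclusive on its own.

```python
def loaded_modules(proc_modules_text: str) -> set:
    """Return the module names listed in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


# Made-up sample in the /proc/modules layout; on a real system the text
# would come from open("/proc/modules").read().
SAMPLE = """nls_utf8 12557 1 - Live 0x0000000000000000
nls_cp850 12595 0 - Live 0x0000000000000000
cifs 318467 2 - Live 0x0000000000000000
"""

wanted = {"nls_utf8", "nls_cp850"}
missing = wanted - loaded_modules(SAMPLE)
print(sorted(missing))   # []
```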