I am trying to copy a file from a source NFS volume to a destination NFS volume.

The file name contains a non-UTF-8 byte, so I am using bytes paths to open, read, and write.

Using os.open, the path opens fine on the source, but gives an "Invalid argument" error on the destination.

Below is a minimal example of the problem:

    >>> import os
    >>> x = b'/x/en/local/noarch/agnostic/docs/FSques\x8awithrepl.doc'
    >>> os.open(x, os.O_RDONLY)
    3
    >>> fd = os.open(x, os.O_RDONLY)
    >>> os.path.getsize(fd)
    37888
    >>>
    >>> y=b'/mnt/x/dest/WAFSquestionnai\x8awithreplies.doc'
    >>> os.open(y, os.O_RDONLY)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 22] Invalid argument: b'/mnt/x/dest/WAFSquestionnai\x8awithreplies.doc'
    
    >>> import cchardet as ct
    >>> ct.detect(x)
    {'encoding': 'ISO-8859-3', 'confidence': 0.7991858124732971}
    >>>
    >>> ct.detect(y)
    {'encoding': 'ISO-8859-3', 'confidence': 0.8912176489830017}
    >>>


    >>>
    >>> import sys
    >>> sys.getdefaultencoding()
    'utf-8'
    >>>

Why does os.open succeed on one path and fail on the other? Shouldn't I at least get a FileNotFoundError on the destination path?

Brian61354270
CodeTry
  • What exact form do you get this filename back in if you run `print(repr(os.listdir('/mnt/x/dest')))`? Just because the character in its name was represented with a specific byte _before_ you copied the file to NFS doesn't mean it didn't get munged. – Charles Duffy Aug 25 '23 at 14:31
  • /mnt/x/dest won't have this file. I am hoping to get a FileNotFoundError so I can move forward. – CodeTry Aug 25 '23 at 14:43
  • How will opening with `O_RDONLY` work then? You'd need `O_WRONLY | O_CREAT`. – Charles Duffy Aug 25 '23 at 14:45
  • But anyhow -- try copying the file with other non-Python tools that _do_ work. If you can't find any tools that do work, you know the destination doesn't allow that name and you need to escape it into a different name. If you _can_ find any tools that perform the operation successfully, you can look at `listdir()` to see how they changed the name to _make_ it work. – Charles Duffy Aug 25 '23 at 14:45
  • There are plenty of standards for means to encode arbitrary strings into a 7-bit-clean form. Which one to use in your case depends on your priorities -- if the names don't need to stay human-readable something like base64 with `/`s translated to a different character might do, f/e; I've also used variants on URL encoding in the past. – Charles Duffy Aug 25 '23 at 14:48
  • For a filesystem to be POSIX-compliant the set of characters it needs to support in names is quite small -- see https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_282; support for anything outside that set is implementation-defined and YMMV. So you're not guaranteed that UTF-8 names will work; you're not even guaranteed that all 7-bit or 8-bit ASCII names will work. – Charles Duffy Aug 25 '23 at 14:51
  • (...and because NFS doesn't provide metadata about the filesystems on the endpoints, there's no a priori way to know which names will or won't be allowed without checking in practice, unless you have extra knowledge about what kind of server is running in a given place, what the underlying remote filesystem is, which NFS implementation is in use, etc). – Charles Duffy Aug 25 '23 at 15:06
  • Thanks, Charles, for this useful info. I am also fairly convinced that it has something to do with the underlying storage. The normal cp command also fails with an invalid argument error, as do C-level file open calls. I am wondering whether the invalid argument error from os.open is raised during some validation/conversion step or when it actually calls the filesystem to open the file. – CodeTry Aug 25 '23 at 18:07
  • The Python call invokes the C call; if you can't make the low-level call work, the high-level one won't either. – Charles Duffy Aug 25 '23 at 19:40
  • To be clear, what you have is almost certainly the open() syscall returning EINVAL; "Invalid argument" is the correct English to describe EINVAL, and to quote from `man 2 open`, one of the causes of EINVAL is: *The final component ("basename") of pathname is invalid (e.g., it contains characters not permitted by the underlying filesystem).* – Charles Duffy Aug 25 '23 at 19:41
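
The suggestions in the comments above (create the destination with `O_WRONLY | O_CREAT`, treat EINVAL as "this name is not allowed here", and fall back to a 7-bit-clean name) could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `escape_basename` and `open_for_write` are hypothetical, and percent-encoding is just one of the possible escaping schemes mentioned in the comments. Whether the destination server accepts the escaped name still has to be checked in practice.

```python
import errno
import os
from urllib.parse import unquote_to_bytes

# POSIX portable filename character set: A-Z a-z 0-9 . _ -
PORTABLE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789._-"
)


def escape_basename(name: bytes) -> bytes:
    """Percent-encode every byte outside the portable set (hypothetical helper).

    '%' itself is encoded as %25, so unquote_to_bytes() reverses the mapping.
    """
    return b"".join(
        bytes([b]) if b in PORTABLE else b"%%%02X" % b for b in name
    )


def open_for_write(path: bytes):
    """Open `path` for writing; on EINVAL retry with an escaped basename.

    Returns (fd, path_actually_used). Any error other than EINVAL is re-raised.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    try:
        return os.open(path, flags, 0o644), path
    except OSError as exc:
        # Assumption: EINVAL here means the filesystem rejected the name.
        if exc.errno != errno.EINVAL:
            raise
    directory, name = os.path.split(path)
    escaped = os.path.join(directory, escape_basename(name))
    return os.open(escaped, flags, 0o644), escaped


if __name__ == "__main__":
    original = b"FSques\x8awithrepl.doc"
    escaped = escape_basename(original)
    print(escaped)
    # The original bytes can be recovered from the escaped name.
    print(unquote_to_bytes(escaped) == original)
```

Keeping the escaping reversible means the original byte string can be reconstructed later from a directory listing, at the cost of names that are less readable on the destination.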

0 Answers