
I want to determine whether or not a file is located on a local hard drive or a drive mounted from the network in OSX. So I'd be looking to produce code a bit like the following:

file_name = '/Somewhere/foo.bar'
if is_local_file(file_name):
    do_local_thing()
else:
    do_remote_thing()

I've not been able to find anything that works like is_local_file() in the example above. Ideally I'd like to use an existing function if there is one but failing that how could I implement it myself? The best I've come up with is the following but this treats mounted dmgs as though they're remote which isn't what I want. Also I suspect I might be reinventing the wheel!

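import os

def is_local_file(path):
    # Split into components, dropping the empty string before the leading '/'.
    path = path.split('/')[1:]
    for index in range(1, len(path) + 1):
        # Any mount point below the root counts as remote (so mounted dmgs do too).
        if os.path.ismount('/' + '/'.join(path[:index])):
            return False
    return True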

I have two functions which generate checksums, one of which uses multiprocess which incurs an overhead to start off with but which is faster for large files if the network connection is slow.

msw
redrah
  • What operating system? What kinds of strings do you expect, are they URLs or references to paths on possibly mounted filesystems? – Ivo Jul 25 '12 at 11:39
  • Take a look at os.split(), os.splitext() and os.sep, they may be helpful for your present code (not directly addressing your question) – Levon Jul 25 '12 at 11:41
  • How would you feel about a file on a filesystem mounted from a disk image that is itself found on a remote filesystem? – SingleNegationElimination Jul 25 '12 at 11:42
  • Sorry, those should have been `os.path.split()` and `os.path.splitext()` -- missed my edit window before I could add the missing `path`. – Levon Jul 25 '12 at 11:46
  • @Ivo I'm working with paths like the one in the example code. Specifically, they'll be coming from wxFileDialogs, so typically I'd be looking at something like /Volumes/RemoteDrive. – redrah Jul 25 '12 at 11:50
  • @TokenMacGuy Good question, I'm beginning to appreciate why it's not a built in function of Python, there are a lot of ifs and buts. Hmm, may have to have a rethink on this one. – redrah Jul 25 '12 at 11:53
  • The filesystem goes to great lengths to present the abstraction that there is a single file namespace. You don't say what you are intending to accomplish, but if you break that carefully constructed homogeneity, you risk preventing me from doing something that I might really want to do. Sure, it is stupid for me to mount a swapfile across SMB/NFS/AFS, but the system can't properly guess that I'm doing something Wrong. – msw Jul 25 '12 at 12:00
  • @msw: still, knowing whether FS operations will cause network traffic may be useful when optimizing I/O-heavy software. – Fred Foo Jul 25 '12 at 12:03
  • @larsmans true, but you could also say the same about my using the slower of two local drives I have available. That's why partial solutions must be less than satisfying. – msw Jul 25 '12 at 12:07
  • Interesting question. If i do mount my home on a cluster resource to a subdir of my local home (on a netbook) via sshfs, i can tell the difference between local and remote files from their UID and GID. Does that give you any useful direction? – Klaus-Dieter Warzecha Jul 25 '12 at 12:10
  • @msw @larsmans Good points, I suppose what I'm doing is optimization. I have two functions which generate checksums, one of which uses `multiprocess` which incurs an overhead to start off with but which is faster for large files if the network connection is slow. – redrah Jul 25 '12 at 12:16
  • @redrah: you might want to investigate whether threads provide any speedup. Python threads cannot use multicore processors, but one of them can do work while others are waiting for I/O. – Fred Foo Jul 25 '12 at 12:23

2 Answers


"I have two functions which generate checksums, one of which uses multiprocess which incurs an overhead to start off with but which is faster for large files if the network connection is slow."

Then what you really want is_local_file() to tell you is only a proxy for "will file access be slower than I'd like?". As a proxy, it is a relatively poor indicator of what you actually want to know, for all the confounding reasons noted above (local but virtualized disks, remote but screamingly fast NAS, etc.).

Since you are asking a question that is nearly impossible to answer programmatically, it is better to provide an option, as with the --jobs (-j) option of make, which explicitly says "parallelize this run".
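A minimal sketch of that idea, assuming the checksums are run from a command-line tool. The `--jobs` flag and the names `checksum_serial` and `checksum_parallel` are hypothetical stand-ins for the asker's two functions, and the parallel variant here is only a placeholder that falls back to the serial one.

```python
import argparse
import hashlib


def checksum_serial(path):
    """Hypothetical single-process checksum: hash the file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_parallel(path, jobs):
    """Placeholder for the multiprocess variant; it just falls back to serial."""
    return checksum_serial(path)


def build_parser():
    """Let the user, not a heuristic, decide whether to parallelize."""
    parser = argparse.ArgumentParser(description="Checksum a file.")
    parser.add_argument("path")
    parser.add_argument("-j", "--jobs", type=int, default=1,
                        help="number of worker processes (default: 1, serial)")
    return parser


def run(argv):
    """Parse the arguments and dispatch to the serial or parallel function."""
    args = build_parser().parse_args(argv)
    if args.jobs > 1:
        return checksum_parallel(args.path, args.jobs)
    return checksum_serial(args.path)
```

Calling `run(["-j", "4", "/Volumes/RemoteDrive/foo.bar"])` would then take the parallel path only because the user asked for it.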

msw

You could use your existing code (or try the solution at How to find the mountpoint a file resides on?) to find the mountpoint of the file, then read /proc/mounts to find the device and filesystem. Each line of /proc/mounts has the format

device mountpoint filesystem options...
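The first two steps could be sketched as follows. This assumes a Linux-style /proc/mounts; macOS does not provide /proc, so there the same three fields would have to come from another source, such as the output of the `mount` command. The function names `find_mount_point` and `parse_mounts` are illustrative, not taken from any library.

```python
import os


def find_mount_point(path):
    """Walk up from path until os.path.ismount() reports a mount point."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def parse_mounts(text):
    """Map each mount point to (device, filesystem) from /proc/mounts-style text.

    Note: /proc/mounts escapes spaces in paths as \\040; this sketch does not
    decode those escapes.
    """
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, filesystem = fields[:3]
        mounts[mount_point] = (device, filesystem)
    return mounts
```

On Linux, `parse_mounts(open('/proc/mounts').read())[find_mount_point(file_name)]` would give the device and filesystem for the file in question.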

You can use the filesystem field to automatically exclude known network filesystems e.g. afs, cifs, nfs, smbfs. Otherwise you can look at the device; as a basic heuristic, if the device is a device node (stat.S_ISBLK) or none then the filesystem is probably local; if it is in URI style (host:/path) then it is probably remote; if it is an actual file then the filesystem is a disk image and you'll need to recurse.
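The heuristic above might look roughly like this in code. It is a sketch, not a reliable test: the set of network filesystem names is just the four listed above, the return labels are made up for illustration, and FUSE filesystems or other unusual setups can land in any category.

```python
import os
import stat

# Only the filesystem types named above; a real list would need to be longer.
NETWORK_FILESYSTEMS = {"afs", "cifs", "nfs", "smbfs"}


def classify_mount(device, filesystem):
    """Return 'remote', 'local', 'image' or 'unknown' for one mount entry."""
    if filesystem in NETWORK_FILESYSTEMS:
        return "remote"
    if device == "none":
        return "local"
    try:
        mode = os.stat(device).st_mode
    except OSError:
        mode = None  # the device field is not a path that exists locally
    if mode is not None and stat.S_ISBLK(mode):
        return "local"  # a block device node
    if mode is not None and stat.S_ISREG(mode):
        return "image"  # a disk image; the caller would recurse on this file
    if ":" in device and not device.startswith("/"):
        return "remote"  # looks like host:/path
    return "unknown"
```

When the result is 'image', the same check would be repeated for the mount that holds the image file, which covers the case of a disk image stored on a remote filesystem.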

ecatmur
  • Doesn't `/proc/mounts` have a filesystem type column on the Mac? On Linux it does. (Which is still not 100% reliable, because `fuse` filesystems may be local, remote, or even in-memory.) And btw., it's cleaner to check whether a path refers to a block device than to check whether it's in `/dev`. – Fred Foo Jul 25 '12 at 12:27