Here's the scenario: we have a NAS with datasets that need to be copied to a local disk for faster processing. Datasets range from 2 to 15 GB, and each dataset sits in its own folder on the NAS.

To copy to the local disk, I call:

FileUtils.copyDirectory(nasDir, localDiskDir);

Both parameters are File instances, and nasDir is a network-mapped SMB drive. When using Java to copy the dataset, the transfer speed tops out at about 8 MB/s. The same copy using Windows Explorer or Nautilus, depending on the server, reaches a sustained 34-35 MB/s.

Does anyone have an idea why that is and, cherry on the cake, how to copy a directory faster from Java? Being 5-10% slower than native would be acceptable, but the current difference indicates a significant performance degradation somewhere.

EDIT: I initially thought this might be related to the Apache Commons I/O library, but testing with https://docs.oracle.com/javase/tutorial/essential/io/examples/Copy.java shows it to be a more fundamental problem at some level.
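
One way to probe this further is to copy the tree with an explicitly large read buffer and compare the throughput with `FileUtils.copyDirectory`. The sketch below is only a diagnostic under an unconfirmed assumption, namely that the size of each read request matters on the SMB mount. The class name `LargeBufferTreeCopy`, the method `copyTree`, and the 1 MiB buffer size are hypothetical choices for illustration, not part of Commons IO or the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper for experimentation; not part of Commons IO or the JDK.
final class LargeBufferTreeCopy {

    // Assumed starting value; vary it while measuring.
    static final int BUFFER_SIZE = 1 << 20; // 1 MiB

    /** Recursively copies source into target and returns the number of bytes copied. */
    static long copyTree(Path source, Path target) throws IOException {
        final long[] total = {0};
        final byte[] buffer = new byte[BUFFER_SIZE]; // reused for every file

        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                // Mirror the directory structure under the target root.
                Files.createDirectories(target.resolve(source.relativize(dir)));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Path dest = target.resolve(source.relativize(file));
                try (InputStream in = Files.newInputStream(file);
                     OutputStream out = Files.newOutputStream(dest)) {
                    int n;
                    // Each read asks for up to BUFFER_SIZE bytes at once.
                    while ((n = in.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                        total[0] += n;
                    }
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return total[0];
    }
}
```

If the throughput changes noticeably as `BUFFER_SIZE` is varied, that points at the request size; if it stays around the same 8-12 MB/s, the bottleneck is probably elsewhere.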

Jon_C
  • It'll be hard to reproduce this because of the non-trivial setup. I had a short look at the `copyDirectory` method, but could only guess what might cause the low performance in the given setup. There is an example at https://docs.oracle.com/javase/tutorial/essential/io/walk.html , namely this one: https://docs.oracle.com/javase/tutorial/essential/io/examples/Copy.java , which offers functionality similar to `FileUtils` but is built on the (modern) standard API. How does it perform for you? – Marco13 May 08 '18 at 22:26
  • @Marco13 I ran the Copy class on one of our datasets to copy it from the NAS to my local dev machine, and it copies at a sustained 12 MB/s. The exact same copy (same dataset, same destination machine after cleaning) through Nautilus runs at a sustained 60 MB/s, so it looks like the issue is **not** related to Apache Commons I/O. I'll rename the question if I figure out how. Thanks for the pointer to the test class from the Oracle tutorial. – Jon_C May 09 '18 at 08:36
  • Again, it's hard to reproduce without a NAS available, but as a first diagnostic step, it could be interesting to see whether a *single* `Files.copy` or [Apache `copyFile`](https://commons.apache.org/proper/commons-io/javadocs/api-2.5/src-html/org/apache/commons/io/FileUtils.html#line.1070) shows the same performance problem, just to roughly figure out whether the problem lies in the directory traversal or in copying the actual contents. The latter is hard to imagine, because the methods generally use rather low-level I/O operations under the hood, but ... who knows. – Marco13 May 09 '18 at 11:58
  • I have a different part of my module where I traverse the folders and scan the files, comparing timestamps with what I have in memory to figure out whether a file in the dataset has been updated, and that clearly works as intended. It's when I then read the metadata in the header of each file that I incur a performance hit... I'll see if a local file copy of a dataset suffers from the same slowdown and rephrase the question as needed. – Jon_C May 09 '18 at 13:50
  • On a single large file (0.6 GB), the copy from the network mount takes 12 seconds with `cp` and 48 seconds with `java Copy`. On the local filesystem I can't say the two behave measurably differently: the `cp` call takes 1 second and `java Copy` takes 2, which could easily be attributable to initialization time. However, the network-mounted path is accessed through the same OS API as far as I can tell. So there is still something significantly slower in the Java file transfer rate itself, not in the directory traversal, @Marco13 – Jon_C May 09 '18 at 14:18
  • And really, it doesn't need to be a NAS on the other end. I'm guessing that any SMB/CIFS-mounted network drive between two computers would exhibit the same behavior. The transfer rates may differ, but the problem seems to be in how the JVM accesses the file, which is suboptimal for a networked copy because it assumes the file is locally mounted. – Jon_C May 09 '18 at 14:24
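
The single-file comparison discussed in the comments can be repeated with a small timing harness. This is a minimal sketch, assuming a hypothetical class `SingleFileCopyTimer` with a hypothetical method `timeCopy`; it only wraps the standard `Files.copy` call in a timer and reports throughput in MB/s (taken here as 10^6 bytes per second).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical timing harness; the class and method names are illustrative.
final class SingleFileCopyTimer {

    /** Copies one file with Files.copy and returns the throughput in MB/s. */
    static double timeCopy(Path source, Path target) throws IOException {
        long bytes = Files.size(source);
        long start = System.nanoTime();
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        // Guard against a zero interval on very small files.
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        return (bytes / 1e6) / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: a file on the network mount, args[1]: a local destination path
        double mbPerSec = timeCopy(Paths.get(args[0]), Paths.get(args[1]));
        System.out.printf("%.1f MB/s%n", mbPerSec);
    }
}
```

Running it once against a file on the SMB mount and once against a local file, next to a timed `cp` of the same files, separates the copy itself from JVM start-up. Repeated runs on the same file may be skewed by the operating system's file cache, so a fresh file per run gives a fairer number.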

0 Answers