I want to find the fastest way to read a file on a remote machine.

The kind of remote I/O I want:

(Diagram of the desired remote I/O path.)

The requirements are:

  1. Machine B reads a file (or a page) from Disk A on Machine A.
  2. For fast transmission, disk I/O and network I/O should overlap.
  3. The file size is one page (roughly 64 KB ~ 4 MB).
  4. I don't want to use NFS, FTP, or anything like that.

The environment is:

  1. I have 100 machines, all in the same room.
  2. All machines are connected by InfiniBand (bandwidth: 1 GB/s ~ 1.5 GB/s), so I can use RDMA!
  3. Each machine has an Intel PCI-E SSD (sequential read bandwidth: 1.0 GB/s ~ 1.5 GB/s).
  4. The operating system is CentOS 6.4.

Is there a library or an implementation approach for this? I have heard about MPI I/O, but I don't understand exactly what it is.

Please help me. Thank you.

  • How distant is the remote machine? Same room? Same building? Continental? What is your hardware budget? – Basile Starynkevitch May 26 '16 at 10:44
  • It will depend, to some extent, on the size of the file. If it is small, NFS may be fastest. If it is large, FTP may be fastest. – Mark Setchell May 26 '16 at 10:49
  • Please **edit your question** to improve it. See my previous comment. Give more details: how far apart are the two computers? What exact network connection? What operating systems? What kind of computer *hardware*? – Basile Starynkevitch May 26 '16 at 11:17
  • Connection latency matters, too. Some transfer protocols don't handle high-latency connections very well. Unfortunately, TCP is one of those. – Andrew Henle May 26 '16 at 11:55
  • Sorry for my late reply. To Basile Starynkevitch: I have 100 machines; they are connected by InfiniBand (40 Gbit) and are in the same room. I will modify my question. Thanks! To Mark Setchell: I think the file size is 64 KB ~ 4 MB. But I don't want to use NFS or FTP because that is not my purpose. – Kyungjun Lee May 27 '16 at 02:39
  • Just to clarify. The transmission is initiated by Machine A? Does Machine B already expect the data? – Zulan May 27 '16 at 22:41
  • Zulan: Yes. First, Machine B sends a request for the data it wants, and then Machine A sends the data to Machine B. What I want to know is how to transfer the data quickly. – Kyungjun Lee May 28 '16 at 06:14
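The request/response flow described in the comments above (Machine B asks, Machine A streams the file back) maps naturally onto a producer/consumer pipeline: one thread reads the next chunk from disk while another pushes the previous chunk onto the network, which is exactly the overlap asked for in requirement 2. Below is a minimal sketch over an ordinary socket, not RDMA — an ibverbs-based transfer would keep the same producer/consumer shape with the socket send swapped for RDMA operations — and the names in it (`send_file`, `recv_file`, `CHUNK`) are illustrative, not from any library:

```python
import os
import queue
import socket
import threading

CHUNK = 64 * 1024  # one "page" in the question's terms

def send_file(path, sock, chunk_size=CHUNK):
    """Stream `path` over `sock`, overlapping disk reads with network writes.

    A reader thread fills a small bounded queue (double buffering) while
    this thread drains it onto the socket, so the disk can fetch chunk
    N+1 while chunk N is still in flight on the network.
    """
    q = queue.Queue(maxsize=4)  # bounded: the reader never runs far ahead

    def reader():
        with open(path, "rb") as f:
            while True:
                buf = f.read(chunk_size)
                q.put(buf)          # b"" doubles as the EOF sentinel
                if not buf:
                    break

    t = threading.Thread(target=reader)
    t.start()
    while True:
        buf = q.get()
        if not buf:
            break
        sock.sendall(buf)
    t.join()
    sock.shutdown(socket.SHUT_WR)   # signal end-of-file to the receiver

def recv_file(sock):
    """Collect the whole stream until the sender shuts down its side."""
    chunks = []
    while True:
        buf = sock.recv(CHUNK)
        if not buf:
            break
        chunks.append(buf)
    return b"".join(chunks)
```

As a stepping stone, the same code runs unchanged over IPoIB on the InfiniBand fabric, giving a working baseline before replacing the socket layer with RDMA verbs.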

1 Answer


You usually should not care: the (hardware) network will almost always be the bottleneck. Get the fastest network hardware you can afford.

Most wired Ethernet connections run at 1 Gbit/s (and those are the cheap ones). You'll only get that speed if both computers are physically near each other (same room or building) and the network is not otherwise loaded.

You might spend several hundred euros or dollars on a 10 Gbit/s Ethernet card, and you'll need two (one on each side). Even such fast Ethernet is slower than a disk or SSD...

No software solution can avoid the network bottleneck. A standard, well-configured FTP connection can saturate the network to within a few percent. Local file data is often in the page cache (so having a bit more RAM can slightly help performance).
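One cheap, software-level instance of the page-cache point above: the sending side can ask the kernel to begin read-ahead before the transfer starts, so disk latency hides behind connection setup. A hedged sketch using `posix_fadvise(POSIX_FADV_WILLNEED)` (Linux-only, exposed in Python as `os.posix_fadvise`; the `prefetch` helper name is mine):

```python
import os

def prefetch(path):
    """Hint the kernel to begin asynchronous read-ahead of the whole
    file, so a subsequent sequential read is likely served from the
    page cache instead of waiting on the disk."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # Advisory only: the kernel *may* start reading the range now.
        os.posix_fadvise(fd, 0, size, os.POSIX_FADV_WILLNEED)
    finally:
        os.close(fd)
```

Being advisory, the call costs almost nothing even when the data is already cached.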

MPI is not relevant in your case.

If you can afford it, buy two 10 Gb/s Ethernet cards and use SSDs. Software does not matter much; the hardware is the bottleneck.

PS. Of course there are exceptions (e.g. transferring a file to a robot on Mars). But you still need to explain your constraints in more detail in your question.

Basile Starynkevitch
  • I disagree with this kind of simplification. The answer makes assumptions that are not specified by the OP. There are many cases where software matters. Consider an intercontinental file transfer over a 10 Gbps path that is part of the internet. Or a file transfer using a 128 Gbps HPC interconnect. Or a connection that is riddled with firewalls and security restrictions. Or a very unreliable interconnect... in any of those cases **software matters a lot** and `ftp` will not cut it. – Zulan May 26 '16 at 16:02