I need to transfer a huge amount of data between Java and C++ programs on Linux (CentOS). Performance is the primary concern. Which is the better choice: a RAM disk (/dev/shm/) or a local socket?

avhacker
  • Have you run any tests? – Andrew Barber Jan 06 '14 at 06:32
  • /dev/shm isn't a RAM disk. A RAM disk is a piece of memory treated like a disk drive; /dev/shm is shared memory. – paxdiablo Jan 06 '14 at 06:36
  • OK, /dev/shm is not a RAM disk, but it's quite similar, at least in performance. In my case there are 16 clients that each generate a lot of small files (smaller than 1MB) and send them to a server (clients and server are on the same host). The total amount of data will be about 1TB. If I use /dev/shm for IPC, each client writes its files into a subfolder. When that folder grows past 100MB, the client notifies the server to process it, creates another folder, and writes new data files there. – avhacker Jan 06 '14 at 10:02
  • OK, I tested 16 clients writing to /dev/shm vs. sending through a socket. It surprises me that the socket version is about 10% slower than the /dev/shm version (120 min for the socket, 110 min for /dev/shm). Another thing to note is that the server uses a lot of CPU to process the input data. – avhacker Jan 06 '14 at 10:06
  • @avhacker No surprise: shared memory requires fewer copy operations and only basic synchronization between sender and receiver. – Eugene Mayevski 'Callback Jan 06 '14 at 10:36
  • what about using a named pipe? – avhacker Jan 06 '14 at 11:55
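
The folder-rotation scheme described in the comments above could be sketched roughly as follows. This is only an illustration in Python, not the asker's actual code: the `BatchWriter` class, the `batch-NNNNNN` folder names, and the `notify` callback are all hypothetical, and only the 100MB threshold comes from the comment.

```python
import os

BATCH_LIMIT = 100 * 1024 * 1024  # 100MB per folder, as stated in the comment


class BatchWriter:
    """Writes small files into a folder and rotates once it exceeds a limit."""

    def __init__(self, root, notify, limit=BATCH_LIMIT):
        self.root = root
        self.notify = notify  # hypothetical callback: tells the server a folder is ready
        self.limit = limit
        self.batch_no = 0
        self.file_no = 0
        self._open_batch()

    def _open_batch(self):
        # Folder naming is an assumption made for this sketch.
        self.folder = os.path.join(self.root, "batch-%06d" % self.batch_no)
        os.makedirs(self.folder, exist_ok=True)
        self.used = 0

    def add(self, data):
        """Store one data file; hand the folder over once it is over the limit."""
        path = os.path.join(self.folder, "%08d.dat" % self.file_no)
        with open(path, "wb") as f:
            f.write(data)
        self.file_no += 1
        self.used += len(data)
        if self.used > self.limit:
            self.notify(self.folder)
            self.batch_no += 1
            self._open_batch()
```

A client would construct it with something like `BatchWriter("/dev/shm/client-01", notify=send_path_to_server)`, where both the path and `send_path_to_server` are placeholders. Note that the server only sees data once a whole folder is handed over.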

1 Answer

A socket is fastest because the other end can start processing the data (on a separate CPU core) before you have finished sending it.

Say you're sending 100KB of data. The other end can begin processing as soon as it receives a couple of kilobytes, and by the time all 100KB has been sent, it has probably finished processing 90KB or thereabouts, so it only has 10KB left.
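
Here is a minimal sketch of that overlap, written in Python for brevity rather than in Java or C++. The CRC32 stands in for whatever processing the receiver really does, and the function names and 4KB chunk size are assumptions made for illustration.

```python
import socket
import threading
import zlib

CHUNK = 4096  # assumed read size; tune it by measurement


def consume(conn, result):
    """Fold each chunk into a running CRC32 as soon as it arrives."""
    crc = 0
    total = 0
    while True:
        chunk = conn.recv(CHUNK)
        if not chunk:  # empty read: the sender closed its end
            break
        crc = zlib.crc32(chunk, crc)
        total += len(chunk)
    conn.close()
    result["crc"] = crc
    result["bytes"] = total


def send_and_process(payload):
    """Send payload over a local socket pair while a thread processes it."""
    sender, receiver = socket.socketpair()
    result = {}
    worker = threading.Thread(target=consume, args=(receiver, result))
    worker.start()
    sender.sendall(payload)  # the receiver is already working during this call
    sender.close()           # signals end of stream to the receiver
    worker.join()
    return result
```

In the real setup the two ends would be separate processes connected by a local socket. A thread is used here only to keep the sketch self-contained.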

With a RAM disk, you have to write the entire 100KB before the other end can even start processing it. Once the last byte has been written, the socket version has about 10x less work left to do, assuming both ends need to do about the same amount of work.

Maybe it takes 1 millisecond to write 100KB to a RAM disk and then 1 millisecond to process it. With a socket it would take 1 millisecond to send the data but only 0.1 millisecond to finish processing after all the data has been sent.
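
Plugging those illustrative figures into a tiny model makes the comparison concrete. The numbers are the guesses from the paragraph above, not measurements, and the function names are made up for this sketch.

```python
def store_then_process(write_ms, process_ms):
    """Receiver starts only after the whole file has been written."""
    return write_ms + process_ms


def streamed(send_ms, tail_ms):
    """Receiver works during the transfer; only a tail remains at the end."""
    return send_ms + tail_ms


ram_disk_ms = store_then_process(1.0, 1.0)  # 1 ms write + 1 ms processing
socket_ms = streamed(1.0, 0.1)              # 1 ms send + 0.1 ms tail
speedup = ram_disk_ms / socket_ms           # end-to-end ratio
```

With these inputs the work left after the last byte shrinks by 10x (1 ms down to 0.1 ms), while the end-to-end time drops from 2 ms to 1.1 ms, a bit under 2x.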

The larger the amount of data being sent, the bigger the performance gain for sockets: it might take 10 seconds to send all the data, but still only another 0.1 millisecond to finish processing after the last byte has been sent.

However, a RAM disk is easier to work with. Sockets deliver a stream of bytes, so you have to split it back into messages yourself, which makes the code more time-consuming to write, debug, and test.

Also, don't assume you need a RAM disk. Depending on how the operating system has been configured, writing 100MB to a spinning-platter hard drive might simply put it in a RAM cache and write it to the drive later on. You can read it back from that RAM cache immediately, without waiting for the data to reach the HDD. Always test before making performance assumptions. Do not assume a HDD is slower than RAM, because the difference might be silently optimised away for you.

The Mac I'm typing this on, which is a Unix-like system just as CentOS is, currently has about 8GB of RAM dedicated to holding copies of files it guesses I'm going to read at some point in the near future. I didn't have to create a RAM disk manually; it just put them in RAM heuristically. CentOS does the same sort of thing, so you have to test it to see how fast it actually is.
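
A rough way to run that test is to time the same write and read-back in both locations. The sketch below is Python, and the helper name and 1MB default size are assumptions. It deliberately does not call `fsync`, so a write to a disk-backed directory may well be served from the OS cache, which is exactly the effect described above.

```python
import os
import tempfile
import time


def time_write_read(directory, size_bytes=1_000_000):
    """Write then re-read one file in `directory`; return (write_s, read_s)."""
    data = os.urandom(size_bytes)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            read_back = f.read()
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the temporary file
    if read_back != data:
        raise IOError("read-back mismatch in " + directory)
    return write_s, read_s


if __name__ == "__main__":
    # Compare shared memory with an ordinary temp directory, where present.
    for d in ("/dev/shm", tempfile.gettempdir()):
        if os.path.isdir(d):
            print(d, time_write_read(d))
```

A single 1MB run is noisy. For numbers worth acting on, repeat it many times with file sizes and concurrency close to the real workload.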

But sockets are definitely the fastest option, since you do not need to write all the data to start processing it.

Abhi Beckert
  • Not necessarily. You can divide the shared memory into chunks as well. You can even use it like a socket if one of the things you share is a pair of read and write pointers (protected with a cross-process semaphore, of course). – paxdiablo Jan 06 '14 at 06:38
  • @paxdiablo True, there are many ways to skin a cat. I assumed that by "RAM disk" he meant creating an ext filesystem in RAM and mounting it, then writing the entire chunk of data and passing the path to the other process, which is a simple and effective way to send data if you don't want to spend the time learning more complex approaches. – Abhi Beckert Jan 06 '14 at 07:01
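
The shared read/write pointer idea from the comments can be sketched as a ring buffer in a memory-mapped file, for example one under /dev/shm. This is a simplified Python illustration for a single producer and a single consumer. The header layout and the `SharedRing` class are hypothetical, and the cross-process semaphore mentioned in the comment is left out, so a real implementation would still need proper synchronization.

```python
import mmap
import struct

HEADER = struct.Struct("<QQ")  # read counter, write counter (hypothetical layout)


class SharedRing:
    """Byte ring buffer stored in one memory-mapped file."""

    def __init__(self, path, capacity, create=False):
        self.capacity = capacity
        size = HEADER.size + capacity
        if create:
            with open(path, "wb") as f:
                f.truncate(size)  # zero-filled, so both counters start at 0
        self._file = open(path, "r+b")
        self._map = mmap.mmap(self._file.fileno(), size)

    def write(self, data):
        """Copy in as much of data as fits; return the number of bytes taken."""
        read_pos, write_pos = HEADER.unpack_from(self._map, 0)
        n = min(self.capacity - (write_pos - read_pos), len(data))
        start = write_pos % self.capacity
        first = min(n, self.capacity - start)  # bytes before the wrap point
        base = HEADER.size
        self._map[base + start:base + start + first] = data[:first]
        if n > first:  # remainder wraps to the start of the data area
            self._map[base:base + n - first] = data[first:n]
        struct.pack_into("<Q", self._map, 8, write_pos + n)  # publish new data
        return n

    def read(self, max_bytes):
        """Return up to max_bytes of unread data and advance the read counter."""
        read_pos, write_pos = HEADER.unpack_from(self._map, 0)
        n = min(write_pos - read_pos, max_bytes)
        start = read_pos % self.capacity
        first = min(n, self.capacity - start)
        base = HEADER.size
        out = (self._map[base + start:base + start + first]
               + self._map[base:base + n - first])
        struct.pack_into("<Q", self._map, 0, read_pos + n)  # free the space
        return out

    def close(self):
        self._map.close()
        self._file.close()
```

The producer would open the file with `create=True` and the consumer would map the same path. With this in place the consumer can start on early chunks while later ones are still being written, much as with a socket.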