
I am trying to transfer a large amount of data (long int arrays) from multiple (8) remote computers to a single computer (the main process). All of these are connected via a 100 MBps LAN and are identical machines (so no worries about endianness).

Each remote machine holds an 8 GB array of long ints, and I have to transmit it to the single computer for processing. My question is: what is the best way to transfer these arrays quickly to the main process? I tried using plain TCP for this and the transfer takes a long time (about 28 minutes). Is there any way to speed this up? Would switching to UDP help? Would using multiple ports/sockets help with buffering? What's the best approach to this kind of problem?

I probably cannot compress the data (most of the values are unique), and I need to send all of it (since I carry out important operations on it in the main process).
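For reference, this is roughly what my sender loop looks like (a simplified sketch; the real code differs, and the socket setup is omitted):

    // Simplified sender sketch: push a large long int array over an already
    // connected TCP socket, looping because send() may write fewer bytes than asked.
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <cstddef>

    bool send_all(int sock, const char* data, size_t len) {
        while (len > 0) {
            ssize_t n = send(sock, data, len, 0);
            if (n <= 0) return false;                 // error or peer closed
            data += n;
            len  -= static_cast<size_t>(n);
        }
        return true;
    }

    // Usage (arr points to the 8 GB long int array):
    //   send_all(sock, reinterpret_cast<const char*>(arr), count * sizeof(long));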

chettyharish
  • Just a thought - have you tried compression? – Tony Delroy Jul 22 '15 at 07:18
  • Do you really need to transmit all the data at once? Maybe you could store it somewhere (e.g. in some database) and fetch it incrementally in chunks? We can't help if you don't explain what that data really is and where it comes from. – Basile Starynkevitch Jul 22 '15 at 07:20
  • Try moving the algorithm to the data instead of moving the data to the algorithm. Maybe map-reduce could help you. – nwp Jul 22 '15 at 07:20
  • If you do the math, transferring 8GB on a 100 Mbps network will take at least 11 minutes. Is that acceptable? – Joni Jul 22 '15 at 07:32
  • Just noticed that I made the stupid mistake of typing b instead of B. It is 100 MBps. Sorry folks. – chettyharish Jul 22 '15 at 18:54

2 Answers


First, upgrade your hardware. With 1 Gbps NICs (or 10 Gbps if you have the budget) and a decent switch you get a 10x boost with no coding at all; transferring the 8 GB takes only about a minute. Push it further with NIC bonding and you double the throughput again, down to about 30 seconds (roughly 60 times faster than what you have now).
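Back-of-the-envelope, ignoring protocol overhead:

    8 GB ≈ 64 Gb
    64 Gb /  1 Gbps ≈ 64 s   (about one minute)
    64 Gb /  2 Gbps ≈ 32 s   (two bonded 1 Gbps NICs)
    64 Gb / 10 Gbps ≈ 6.4 s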

Next, adjust your algorithm. Do you really need to send the whole 8 GB frequently? Can you pipeline it, process it in a streaming fashion, or send only diffs (as in replication), so that you get good overall data-processing throughput?
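As a rough illustration of the streaming idea (the chunk size and process_chunk() are placeholders, not anything from your setup):

    // Receive and process the stream chunk by chunk instead of buffering all 8 GB,
    // so computation overlaps with the network transfer.
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <cstddef>
    #include <vector>

    void process_chunk(const long* values, size_t count) { /* your processing here */ }

    void receive_stream(int sock) {
        const size_t kChunkBytes = 4 * 1024 * 1024;   // 4 MB per chunk (tunable)
        std::vector<char> buf(kChunkBytes);
        size_t filled = 0;
        for (;;) {
            ssize_t n = recv(sock, buf.data() + filled, buf.size() - filled, 0);
            if (n <= 0) break;                        // error or end of stream
            filled += static_cast<size_t>(n);
            if (filled == buf.size()) {               // a full chunk arrived
                process_chunk(reinterpret_cast<const long*>(buf.data()),
                              filled / sizeof(long));
                filled = 0;
            }
        }
        if (filled >= sizeof(long))                   // leftover tail
            process_chunk(reinterpret_cast<const long*>(buf.data()),
                          filled / sizeof(long));
    }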

The last thing you can do is compression, and it is better done in chunks so that you don't have to compress the whole 8 GB at once.
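A sketch of chunked compression using zlib's compress()/compressBound() (assuming zlib is available on your machines; the chunk size is arbitrary):

    // Compress the data in fixed-size chunks so memory use stays bounded and
    // compression can overlap with sending.  Link with -lz.
    #include <zlib.h>
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    std::vector<std::vector<unsigned char>>
    compress_in_chunks(const unsigned char* data, size_t total,
                       size_t chunk = 8 * 1024 * 1024) {
        std::vector<std::vector<unsigned char>> out;
        for (size_t off = 0; off < total; off += chunk) {
            uLong  srcLen = static_cast<uLong>(std::min(chunk, total - off));
            uLongf dstLen = compressBound(srcLen);        // worst-case output size
            std::vector<unsigned char> dst(dstLen);
            if (compress(dst.data(), &dstLen, data + off, srcLen) == Z_OK) {
                dst.resize(dstLen);                       // shrink to actual size
                out.push_back(std::move(dst));
            }
        }
        return out;
    }

Prefix each compressed chunk with its length when you send it, so the receiver knows how many bytes to hand back to uncompress().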

Non-maskable Interrupt

You can try to compress your array. There are several algorithms you can find, and this post may help you. It explains the three best-known lossless algorithms:
1. Huffman coding, a tree-based algorithm with many applications and specializations
2. RLE (run-length encoding), which is well suited for compressing icons (see the sketch after this list)
3. LZ77, which uses a dictionary and is the basis for many other algorithms
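For instance, a minimal run-length encoder over bytes (purely illustrative; the names and framing are my own, and plain RLE will not help much if your values really are mostly unique):

    // Minimal byte-oriented run-length encoder: emits (run length, value) pairs.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    std::vector<uint8_t> rle_encode(const std::vector<uint8_t>& in) {
        std::vector<uint8_t> out;
        for (size_t i = 0; i < in.size(); ) {
            uint8_t value = in[i];
            size_t run = 1;
            while (i + run < in.size() && in[i + run] == value && run < 255)
                ++run;
            out.push_back(static_cast<uint8_t>(run));  // run length (1..255)
            out.push_back(value);                      // the repeated byte
            i += run;
        }
        return out;
    }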

A lossless algorithm is what you need because you can't afford to lose any data from your array. That's also why I wouldn't recommend UDP: it does not guarantee that the data has been received.

Pumkko