
We are facing a performance issue when serialising files from multiple processes.

Here is the issue: we create multiple processes (in a distributed computing environment) to do our calculations, and each process writes its output to a file. The main process then takes these files, merges them in memory and does the further calculation.

We have a limited number of servers, so 2-3 processes can end up on the same server. When that happens, those processes try to write/serialize their computed files (each approximately 80-90 MB) to disk at the same time, and serialising a file then takes around 3 minutes. Normally, an 80-90 MB file takes only 30 seconds.

We monitored this scenario in Performance Monitor (and in our log timings) and could see that, because 2-3 processes are writing at the same time, the writes take around 6 times longer than normal.

Any suggestion to improve the timing of the 3-minute scenario is appreciated.

We use the .NET Framework and the code is written in C#.

code4fun
  • Why don't you pass the object in memory to the other processes/servers, e.g. with a message bus? Maybe the I/O subsystem is limiting your performance (anti-virus software running, slow storage)? – Andreas Jun 16 '14 at 11:17
  • Why are you creating multiple processes? Is it necessary to process each calculation in a different one? – Yuval Itzchakov Jun 16 '14 at 11:19
  • Maybe you could use the Mutex class to prevent other processes from writing to the same file simultaneously (see the mutex sketch after these comments): http://msdn.microsoft.com/en-us/library/System.Threading.Mutex(v=vs.110).aspx – Ricky Stam Jun 16 '14 at 11:21
  • @RickySam I don't think the problem is writing to the *same* file. It sounds like each process has its own output file. The problem is simultaneous I/O writes. – Yuval Itzchakov Jun 16 '14 at 11:23
  • @Yuval Itzchakov, yes, that's correct. The issue is with simultaneous I/O writes. We were running into an out-of-memory exception in one process, so we had to use this approach. – code4fun Jun 16 '14 at 15:01
  • @Andreas, anti-virus doesn't seem to be an issue as per Performance Monitor. I will investigate slow storage. Thanks – code4fun Jun 16 '14 at 15:06
  • You could also compress the stream before writing the serialized data to disk. At the cost of a few CPU cycles you could save quite a lot of HDD bandwidth (see the compression sketch after these comments). – Andreas Jun 16 '14 at 21:50
  • Thanks a lot for your comments. I was a bit busy with a few releases, so I couldn't look into your helpful comments. To resolve the above, we changed the design slightly to give the writing responsibility to one service, which takes care of this and makes sure the above scenario doesn't happen. We still have an issue with multiple processes reading the same files. Shall I create another question to describe that? – code4fun Jul 11 '16 at 10:59
  • Sorry, I meant I couldn't respond to your helpful comments. :) – code4fun Jul 11 '16 at 11:06
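
A minimal sketch of the named-mutex idea from the comments, assuming each process already has its output as a byte array and only needs to stop the 2-3 processes on the same server from hitting the disk at once; the mutex name, class name and method names are made up for illustration:

    using System;
    using System.IO;
    using System.Threading;

    static class ExclusiveDiskWriter
    {
        // A named mutex is visible to every process on the machine, so the
        // 2-3 worker processes that land on the same server take turns writing.
        // The name "Global\MyApp.OutputWrite" is hypothetical.
        private static readonly Mutex WriteMutex =
            new Mutex(false, @"Global\MyApp.OutputWrite");

        public static void Write(string path, byte[] serializedData)
        {
            WriteMutex.WaitOne();              // wait until no other process is writing
            try
            {
                File.WriteAllBytes(path, serializedData);
            }
            finally
            {
                WriteMutex.ReleaseMutex();     // let the next process start its write
            }
        }
    }

This does not make any single write faster; it only stops the writes from overlapping, so with 3 processes you would expect roughly 3 × 30 s in total rather than ~3 minutes per file, if disk contention really is the cause.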
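And a sketch of the compression suggestion, using GZipStream from System.IO.Compression; BinaryFormatter here is purely a placeholder for whatever serializer you actually use, and the class and method names are made up:

    using System.IO;
    using System.IO.Compression;
    using System.Runtime.Serialization.Formatters.Binary;

    static class CompressedOutput
    {
        // Compressing while writing means fewer bytes reach the disk, so
        // simultaneous writers compete for less I/O bandwidth (at some CPU cost).
        public static void Write(string path, object resultGraph)
        {
            using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
            using (var gzip = new GZipStream(file, CompressionMode.Compress))
            {
                new BinaryFormatter().Serialize(gzip, resultGraph);
            }
        }

        // The main process wraps the file in a decompressing stream before deserializing.
        public static object Read(string path)
        {
            using (var file = new FileStream(path, FileMode.Open, FileAccess.Read))
            using (var gzip = new GZipStream(file, CompressionMode.Decompress))
            {
                return new BinaryFormatter().Deserialize(gzip);
            }
        }
    }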

1 Answer


You can try forcing the processes to write to different files and then just read all the files in a folder. For example, you could have the following structure (a small sketch follows the listing):

|-C:\experiments\current
|--- output_{UNIQUE_SUFFIX}.bin 
|--- output_0.bin
|--- output_1.bin
|--- output_nwvpqnfj.bin
|--- output_jhfjqhfew.bin
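
A rough C# sketch of this layout, assuming the folder path from the listing above and a GUID as the unique suffix; the class and method names are placeholders:

    using System;
    using System.IO;

    static class OutputFolder
    {
        const string Folder = @"C:\experiments\current";

        // Each worker process picks its own unique file name, so no two
        // processes ever write to the same file and no coordination is needed.
        public static string WriteWorkerOutput(byte[] serializedData)
        {
            string suffix = Guid.NewGuid().ToString("N");
            string path = Path.Combine(Folder, "output_" + suffix + ".bin");
            File.WriteAllBytes(path, serializedData);
            return path;
        }

        // The main process simply reads every output file in the folder and merges them.
        public static void MergeAll(Action<byte[]> merge)
        {
            foreach (string file in Directory.GetFiles(Folder, "output_*.bin"))
            {
                merge(File.ReadAllBytes(file));
            }
        }
    }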
oleksii
  • Thanks for your answer. As I mentioned above, we found an issue with our design and we resolved that. – code4fun Jul 11 '16 at 11:07