
I have three applications running at the same time on a beefy machine (32 GB of memory and 32 CPUs). The three applications have to run in parallel. Two of them are C applications and are IO intensive (they constantly write large amounts of data to disk). The third is a Java application that reads the files written by the first two and writes different files. I can run each application on its own without problems, but when I run all three together, the CPU% usage of the first two applications is high while the CPU% usage of the third shows as 0%. I am using Ubuntu Linux and I am trying to give these applications equal process priority. Any idea what the problem is?

Keeto
  • By default they have the same niceness (i.e. priority). How are you safely accessing the file for reading while some other process is writing to it? – Elliott Frisch Jan 14 '14 at 21:04
  • On a side note, writing to files isn't a very good inter-process communication (IPC) mechanism. There are probably much better solutions to whatever problem you're trying to solve. – Chris Hayes Jan 14 '14 at 21:05
  • Wait, if they are I/O-bound, why do you care about how much CPU time they consume? Shouldn't you be more concerned about whether you're utilizing the (limited) I/O bandwidth efficiently? – NPE Jan 14 '14 at 21:05
  • I produce sequential files: sample0, sample1, and so on. I don't access sample(i) until sample(i+2) exists – Keeto Jan 14 '14 at 21:06
  • @NPE Yes, I want to distribute the IO efficiently among them but it seems the processor is not serving the third application at all, even for the non IO part – Keeto Jan 14 '14 at 21:08
  • Maybe this is an obvious question, but are you falling behind? Are the files being created faster than you can process them? (Is it possible that the Java app is just finishing faster?) – JVMATL Jan 14 '14 at 21:08
  • @JVMATL I have investigated this issue. The Java app stops completely when the other two applications are running. When I pause the other two applications (which I can do), the Java app finishes very quickly. So the Java app's performance on its own is more than good! – Keeto Jan 14 '14 at 21:13
  • When I say "stops", I mean it is not being scheduled to run by the processor – Keeto Jan 14 '14 at 21:14
  • Does your Java app require a lock to be obtained to proceed? – PM 77-1 Jan 14 '14 at 21:23
  • Kinda sounds like your logic for not accessing sample(i) until sample(i+2) exists has a problem. Can you insert trace statements there to see if that is where the Java app is stuck? – Ron Burk Jan 14 '14 at 21:25
  • Have you considered manually tweaking their performance using the "nice" command? Perhaps you can slow down the first apps a bit. This is a hack, of course, but sometimes hacks work well enough. – Ewald Jan 14 '14 at 21:25
  • @PM77-1 No, there are no locks whatsoever. I only wait for the existence of a file to proceed, but I am sure this is not the problem. – Keeto Jan 14 '14 at 21:26
  • @Ewald Yes, I inserted some trace output and the app doesn't even reach it, which assures me that the CPU doesn't schedule the Java app at all when the other two apps are running – Keeto Jan 14 '14 at 21:27
  • @Keeto The file will exist for some time until all the bytes are written, unless you're writing in a temp folder and moving the "complete" file into the monitored directory. Are you? – Elliott Frisch Jan 14 '14 at 21:28
  • It sounds like the problem is how your apps are scheduled by the OS, meaning they don't work as you think they do. Some debugging should show what the problem is – fernando.reyes Jan 14 '14 at 21:28
  • I don't think this is a starvation issue; I think ewald might be onto something; the Java app would not get scheduled to run if the app were stuck in a blocking call - can you show code around where your Java app is waiting for something to do? – JVMATL Jan 14 '14 at 21:28
  • @ElliottFrisch I only access sample(i) if sample(i+2) exists. It doesn't matter if sample(i+2) is not complete, because sample(i+2) is not started until after sample(i) is complete – Keeto Jan 14 '14 at 21:30
  • @keeto - here's where we are coming from: Linux does NOT just 'stop' a process because other processes are running. It may sometimes schedule things inefficiently, you may have a resource starvation problem that makes things move slowly, but it is extremely unlikely that the OS is stopping your Java app outright; you say you have traced the java app: what does your trace show? where does it get stuck? Can you make it happen in a debugger? – JVMATL Jan 14 '14 at 21:30
  • @JVMATL I agree with you, and that's why I don't understand where the problem is coming from. My guess is that it is a disk starvation problem. The three apps access the disk heavily, but the distribution is totally biased toward the first two apps – Keeto Jan 14 '14 at 21:33
  • So the $20,000 question is: where is your Java app's point of execution when you see it not executing for lengthy periods of time? If your theory of disk starvation is correct, then it should always be paused waiting for disk I/O. – Ron Burk Jan 14 '14 at 21:35
  • Are you running some of the apps in the foreground, some in the background, all in the background, or what? – Ron Burk Jan 14 '14 at 21:36
  • Suggestion: when I run out of clever ideas, I resort to brute force :) Copy your source code off (or check it into source control) and start removing code from your java app. Strip it down, down, down until it's nothing but a loop waiting for the next file to show up and then print a message saying "reading file X" -- do this step by step to get the smallest java program you can that still locks up. In doing this, you will either solve the problem, or have a nice, small bit of code to post so others can look at it and puzzle out where it's getting stuck. – JVMATL Jan 14 '14 at 21:37
  • @RonBurk I am running each app in a different terminal – Keeto Jan 14 '14 at 21:38
  • @JVMATL Thanks. I will probably do that – Keeto Jan 14 '14 at 21:38
  • Might I humbly suggest http://manpages.ubuntu.com/manpages/raring/man1/cpulimit.1.html as a tool for limiting the CPU usage of the first apps? Make sure that each app is allowed only 1 core, that way the Java app should get a core all of its own. – Ewald Jan 14 '14 at 21:39
  • @ewald cpulimit is a good tool to know about, but he's running on a 32-core beast! Surely at least one of those cores has some spare time while the others are crunching away. (side note: I want one of those!) – JVMATL Jan 14 '14 at 21:40
  • @JVMATL - As envious as I am, I'm also curious to know how a 32-core machine can be showing this behaviour. The idea is to try and narrow it down a bit - sort of like a debug step. If it works, give each app 8 cores and have some spares left for other things :) – Ewald Jan 14 '14 at 21:43
  • If the CPU usage is high they are not I/O bound. Examine your assumptions. – user207421 Jan 14 '14 at 21:48
  • @EJP He never said they were I/O bound, only that they were I/O intensive. – Chris Hayes Jan 14 '14 at 21:50
  • Of course, I can easily construct an I/O bound situation with high CPU usage just by starting with lots of I/O and adding on more and more CPU activity between I/Os. – Ron Burk Jan 14 '14 at 21:52
  • So, we still don't know if your Java app stops completely or just runs very, very slowly. It's also not precisely clear how your Java app waits for data--are you spinning in a loop calling File.exists()? I see the claim that this can produce "400MB of garbage per second": http://stackoverflow.com/questions/6321180/how-expensive-is-file-exists-in-java – Ron Burk Jan 14 '14 at 22:29
  • @RonBurk It runs very, very slowly. I actually tried a different app that just reads a 400MB file and it is taking forever – Keeto Jan 14 '14 at 22:39
  • It seems my Linux disk scheduler is set to "deadline". I am thinking of changing the scheduler to "cfq", which is supposedly fair – Keeto Jan 14 '14 at 22:46
  • OK, my Linux IO scheduler was the problem. Thank you all for your help. – Keeto Jan 14 '14 at 22:59
  • Came this close to asking you to cat the scheduler, thought "Naw, what are the odds?" :-) – Ron Burk Jan 15 '14 at 00:45
  • @ChrisHayes 'Constantly writing a large amount of data to the disk' sounds I/O bound to me. – user207421 Jan 16 '14 at 10:16

1 Answer


I will answer my own question in case someone else runs into the same issue. The problem was an unfair disk scheduler. Linux offers several IO schedulers (deadline, cfq, noop). CFQ is a fair scheduler and is the default in many Linux distributions... except mine, I guess! I changed the scheduler to CFQ and now it is working fine. You can check your current scheduler using the command

 cat /sys/block/{device name}/queue/scheduler
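
The file lists every scheduler the kernel offers for that device, with the active one in square brackets (for example "noop deadline [cfq]"). As a rough sketch, assuming a POSIX shell and sed, something like the following prints the active scheduler for each block device. The function name active_scheduler is my own, not a standard tool.

```shell
#!/bin/sh
# Hypothetical helper: extract the active scheduler (the bracketed name)
# from a line such as "noop deadline [cfq]".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Print the active scheduler for every block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue          # skip if the glob matched nothing
    dev=${f#/sys/block/}             # strip the leading path
    dev=${dev%%/*}                   # keep only the device name
    printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
done
```

If a device's file contains no bracketed entry, the helper prints an empty name for it.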

You can set your IO scheduler (this needs root privileges) using the command

 echo cfq > /sys/block/{device name}/queue/scheduler
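
Two caveats worth knowing: a plain "sudo echo cfq > ..." does not work because the redirect is done by your unprivileged shell, and a value written to sysfs is not kept across reboots. Also, newer kernels (5.0 and later) no longer ship cfq, so it helps to check what the device offers first. Below is a hedged sketch of that check; the function names scheduler_available and set_scheduler are made up for illustration, and it assumes sudo and tee are available.

```shell
#!/bin/sh
# Hypothetical helper: succeed if scheduler $1 appears in the list $2,
# where $2 looks like "noop deadline [cfq]".
scheduler_available() {
    for s in $(printf '%s' "$2" | sed 's/[][]//g'); do
        [ "$s" = "$1" ] && return 0
    done
    return 1
}

# Hypothetical helper: set_scheduler <device> <scheduler>, e.g. set_scheduler sda cfq
set_scheduler() {
    f=/sys/block/$1/queue/scheduler
    if scheduler_available "$2" "$(cat "$f" 2>/dev/null)"; then
        # tee runs under sudo, so the write itself has root privileges
        echo "$2" | sudo tee "$f" > /dev/null
    else
        echo "scheduler '$2' is not offered for device '$1'" >&2
        return 1
    fi
}
```

Calling set_scheduler with a device and a scheduler name only writes to sysfs when the kernel actually lists that scheduler for the device.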
Keeto