0

All of the sudden an application that I've been working on is hammering the server's cpu and memory. I have not had any code changes since this "pegging" has started. I did some digging into strace to try and find out what is going on but I'm needing help deciphering what the output really means. I took one pid that was running for several minutes with cpu usage at 100% and using about 1.5GB of memory and ran strace -c with it. I got the following output. I'm taking a long time in the clone and wait4 commands. Can anybody give me a direction to move in with this info?

    % time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.10    1.011982      252996         4           clone
 30.67    0.573741          51     11296           brk
 11.98    0.224099       56025         4           wait4
  1.96    0.036701          53       687           munmap
  0.68    0.012642        1580         8           mremap
  0.58    0.010928           2      4886         5 read
  0.01    0.000135           0      2854           fstat
  0.01    0.000112           0       845        25 open
  0.00    0.000073           0       935       115 lstat
  0.00    0.000044           0        97        23 access
  0.00    0.000043           0       464        40 stat
  0.00    0.000037           0       466           write
  0.00    0.000037           0       466           gettimeofday
  0.00    0.000035           0       840           close
  0.00    0.000000           0       173           poll
  0.00    0.000000           0       210           lseek
  0.00    0.000000           0       688           mmap
  0.00    0.000000           0         5           rt_sigaction
  0.00    0.000000           0         5           rt_sigprocmask
  0.00    0.000000           0        16           writev
  0.00    0.000000           0        55           setitimer
  0.00    0.000000           0         2           socket
  0.00    0.000000           0         2         2 connect
  0.00    0.000000           0         4           accept
  0.00    0.000000           0         6           shutdown
  0.00    0.000000           0         4           getsockname
  0.00    0.000000           0        10           setsockopt
  0.00    0.000000           0         2           getsockopt
  0.00    0.000000           0         8           semop
  0.00    0.000000           0        38           fcntl
  0.00    0.000000           0       168           flock
  0.00    0.000000           0        12           getdents
  0.00    0.000000           0        25           getcwd
  0.00    0.000000           0        10           chdir
  0.00    0.000000           0         2           unlink
  0.00    0.000000           0         2           chmod
  0.00    0.000000           0        15           umask
  0.00    0.000000           0         7           times
  0.00    0.000000           0         2         1 futex
  0.00    0.000000           0         4           epoll_wait
  0.00    0.000000           0         6           openat
  0.00    0.000000           0         4           pipe2
------ ----------- ----------- --------- --------- ----------------
100.00    1.870609                 25337       211 total

NEW ---ADDITIONAL INFO

I have some more info on this. User's of my application upload images. The directory where the images get uploaded to now has 80k plus image files that have been uploaded. When I do a strace of a process that is hogging cpu and memory I get a lot of reads in that directory followed by brk. Below is an exceprt from the strace.

poll([{fd=18, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout)
write(18, "e\2\0\0\3SELECT \"File\".\"ClassName\", "..., 617) = 617
read(18, "\1\0\0\1\0244\0\0\2\3def\4live\4File\4File\tCla"..., 16384) = 11488
read(18, "\0010\0010\00266\10MMSImage\343\0\0D\10MMSImage\02320"..., 16384) = 13032
read(18, "\373\0011\0010\0010\003122\10MMSImage\345\0\0|\10MMSImag"..., 16384) = 11584
read(18, "257\373\0011\0010\0010\003179\10MMSImage\345\0\0\255\10MMSI"..., 16384) = 16384
read(18, "nswer_Image/0_9373043939_1321407"..., 16384) = 16384
read(18, "or question #1054.Eassets/UserCo"..., 16384) = 16384

then a bit later a ton of brk's

brk(0x7f2ba285e000)                     = 0x7f2ba285e000
brk(0x7f2ba291e000)                     = 0x7f2ba291e000
brk(0x7f2ba295e000)                     = 0x7f2ba295e000
brk(0x7f2ba299e000)                     = 0x7f2ba299e000
brk(0x7f2ba29de000)                     = 0x7f2ba29de000
brk(0x7f2ba2a1f000)                     = 0x7f2ba2a1f000
brk(0x7f2ba2a5f000)                     = 0x7f2ba2a5f000
brk(0x7f2ba2a9f000)                     = 0x7f2ba2a9f000
brk(0x7f2ba2adf000)                     = 0x7f2ba2adf000
brk(0x7f2ba2b1f000)                     = 0x7f2ba2b1f000
brk(0x7f2ba2b5f000)                     = 0x7f2ba2b5f000
brk(0x7f2ba2b9f000)                     = 0x7f2ba2b9f000
brk(0x7f2ba2bdf000)                     = 0x7f2ba2bdf000
brk(0x7f2ba2c1f000)                     = 0x7f2ba2c1f000
brk(0x7f2ba2c5f000)                     = 0x7f2ba2c5f000
brk(0x7f2ba2c9f000)

Anybody know what may be going on there. Do I have too many files in the directory or something like that? Does the harddrive have to seek to the directory too much?

Josh
  • 301
  • 5
  • 15
  • I have some more info on this. User's of my application upload images. The directory where the images get uploaded to now has 80k plus image files that have been uploaded. When I do a strace of a process that is hogging cpu and memory I get a lot of reads in that directory followed by brk. Below is an exceprt from the strace. – Josh Dec 18 '12 at 03:12

1 Answers1

0

Well, let's start at the top. A lot of clone and wait4 means that this application does much more forking or threading then it should be doing.

As for the directory size, if you are reading from the image files into dynamically allocated memory, brk must be called to get more memory. That simple.

Just take out a few of the forks or threading if you ask me.

Linuxios
  • 34,849
  • 13
  • 91
  • 116
  • Ok, I got the clone and form stuff figured out. As for the lower portion of strace I typed out above (write then reads) . It looks like that write is an sql select and the read is the output of it. In the full output of the strace you can see hundreds of lines of reads and then hundreds of lines of brk(0xff2cc..) . Also, the read lines look like they are expecting to read more than they get a lot of the time. For example. read(18, "\1\0\0\1\0244\0\0\2\3def\4live\4File\4File\tCla"..., 16384) = 11488 It looks like were looking for 16384 bytes and seeing 11488. 16k*500 reads .8mb – Josh Dec 19 '12 at 04:45