
I'm not sure what I'm doing wrong. I have a profile generated by oprofile, and I converted it to a text file (a very long one) with the command below:

 opreport -c --session-dir=dirNAME/oprofile_data/ > profile_test.txt

Here are a few sample lines from the profile_test.txt file:

-------------------------------------------------------------------------------
768       2.2662  libc-2.19.so             __mcount_internal
  768      100.000  libc-2.19.so             __mcount_internal [self]
-------------------------------------------------------------------------------
718       2.1187  libc-2.19.so             _int_free
  718      100.000  libc-2.19.so             _int_free [self]
-------------------------------------------------------------------------------
694       2.0479  libc-2.19.so             _int_malloc
  694      100.000  libc-2.19.so             _int_malloc [self]
-------------------------------------------------------------------------------
576       1.6997  libc-2.19.so             malloc
  576      100.000  libc-2.19.so             malloc [self]
-------------------------------------------------------------------------------
565       1.6672  libns3-dev-core-debug.so ns3::LogComponent::IsEnabled(ns3::LogLevel) const
  565      100.000  libns3-dev-core-debug.so ns3::LogComponent::IsEnabled(ns3::LogLevel) const [self]

Nothing interesting there.

Now I want to view it using gprof2dot. I run the script like this:

 ./gprof2dot.py -f oprofile --skew=0.001 --strip profile_test.txt | dot -Tsvg > profile_graph.svg
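Part of why so few functions show up may be gprof2dot's pruning. As far as I know, it drops nodes whose total time is below a node threshold (`-n`/`--node-thres`, 0.5% by default), and without call-graph data a function's total time is just its self time. This sketch only imitates that filter on (symbol, percent) pairs taken from the excerpt; `visible_nodes` and `tiny_helper` are hypothetical, and this is not gprof2dot's own code.

```python
from typing import Dict, List


def visible_nodes(percent_by_symbol: Dict[str, float],
                  node_thres: float = 0.5) -> List[str]:
    """Return symbols whose share of samples is at least node_thres percent,
    largest first. Hypothetical imitation of a node threshold; gprof2dot's
    exact comparison at the boundary may differ."""
    kept = [s for s, pct in percent_by_symbol.items() if pct >= node_thres]
    return sorted(kept, key=lambda s: -percent_by_symbol[s])


# Percentages from the opreport excerpt, plus one made-up small function.
profile = {
    "__mcount_internal": 2.2662,
    "_int_free": 2.1187,
    "malloc": 1.6997,
    "tiny_helper": 0.12,  # hypothetical, below the 0.5% default
}
print(visible_nodes(profile))
```

If this is the cause, passing a lower threshold such as `--node-thres=0` should keep the small functions, although without call-graph data they would still be unconnected boxes.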

That produces a file, but it doesn't contain everything from profile_test.txt (most of the functions are missing), and the functions are laid out as a single row of boxes rather than a tree:

[Image: the graph gprof2dot generated]

How can I get a proper tree structure out of it? I followed these directions.

By the way, this is the overall breakdown of samples by binary:

CPU: Intel Haswell microarchitecture, speed 3.201e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
    37395 100.000 ns3-dev-lena-profiling-debug
    CPU_CLK_UNHALT...|
      samples|      %|
    ------------------
        23577 63.0485 libns3-dev-lte-debug.so
         3888 10.3971 libc-2.19.so
         3185  8.5172 libns3-dev-core-debug.so
         1713  4.5808 no-vmlinux
         1317  3.5219 libstdc++.so.6.0.19
         1263  3.3775 libns3-dev-spectrum-debug.so
          633  1.6927 libns3-dev-network-debug.so
          581  1.5537 ld-2.19.so
          551  1.4735 ns3-dev-lena-profiling-debug
          285  0.7621 libm-2.19.so
           69  0.1845 libcuda.so.340.29
           61  0.1631 libns3-dev-internet-debug.so
           58  0.1551 libpthread-2.19.so
           53  0.1417 libgcc_s.so.1
           42  0.1123 libns3-dev-mpi-debug.so
           38  0.1016 libns3-dev-mobility-debug.so
           31  0.0829 libns3-dev-buildings-debug.so
           22  0.0588 libns3-dev-propagation-debug.so
           10  0.0267 libns3-dev-antenna-debug.so
            7  0.0187 libns3-dev-config-store-debug.so
            4  0.0107 [vdso] (tgid:24360 range:0x7fff60ffe000-0x7fff60ffffff)
            2  0.0053 libns3-dev-applications-debug.so
            2  0.0053 libns3-dev-stats-debug.so
            1  0.0027 libns3-dev-fd-net-device-debug.so
            1  0.0027 libns3-dev-point-to-point-debug.so
            1  0.0027 libcudart.so.6.5.14
Asked by Mewa
  • You're getting self time only, no inclusive time. The whole reason *gprof* was invented 30+ years ago was to try to get inclusive time, because self time is almost useless, except in a tiny category of programs. Stack-sampling is the way to go. oprofile can do that. – Mike Dunlavey Oct 29 '14 at 12:14
  • @MikeDunlavey, thank you, but I have no idea what to do with your response. So I have to change the configuration for oprofile? I'm not on Ubuntu right now, so I'll go check that after I'm done debugging this and report back. I don't remember seeing this option, but then again... I usually miss lots of stuff. – Mewa Oct 29 '14 at 21:06
  • Sorry. You see, I think if the reason you are doing this is to try to find ways to make the code run faster (as opposed to just measuring for its own sake), there's a far more effective (and simpler) way than making pictures. [*This explains it.*](http://stackoverflow.com/a/25870103/23771) – Mike Dunlavey Oct 30 '14 at 14:49
  • @MikeDunlavey, thanks! That's a very informative post. I was mostly hoping to make diagrams for my supervisor's sake, plus I thought they might be neat to include in a thesis (again for illustration purposes). But you are right, I am doing it to find ways to run the code faster. – Mewa Oct 30 '14 at 14:56
  • Oh, boy. It's hard to find a professor who respects this technique. What I would do is 1) show the relevant samples, show the results, and show [*the scientific reason why it works*](http://scicomp.stackexchange.com/a/2719/1262), and 2) show slide 35 of [*this .ppt by Jon Bentley*](http://dimacs.rutgers.edu/Workshops/EAA/slides/bentley.pdf). Either that, or simply don't tell him/her how you got the speedup, because the last thing you need is to disagree with your supervisor. (I was a professor, and I know how they run in channels.) – Mike Dunlavey Oct 30 '14 at 15:15
  • Actually I really like the image in "the scientific reason why it works" link about speedup factor (at the very end). Unfortunately I can't avoid telling him about how I got the speedup since that is actually the main focus of my thesis (the acceleration of specific very slow simulations). – Mewa Oct 30 '14 at 15:30
  • PS. [Is this your book?](http://books.google.ca/books?hl=en&lr=&id=8A43E1UFs_YC&oi=fnd&pg=PR9&dq=mike+dunlavey&ots=RZwFoyoEt9&sig=DbL-seOoNbcLXktkxrJiLyHv0Co#v=onepage&q=mike%20dunlavey&f=false) I searched your name on Google Scholar :) – Mewa Oct 30 '14 at 15:31
  • Yes. It's long out of print. The price on Amazon is crazy. If you want to send me a note, I can send you a copy (about 20 mb). – Mike Dunlavey Oct 30 '14 at 15:49
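The self-versus-inclusive distinction in the first comment can be made concrete with a few stack samples. This sketch assumes each sample is a list of function names ordered from the outermost caller to the innermost (currently executing) frame; the sample stacks and the names `run_sim`, `schedule`, and `compute` are invented for illustration and do not come from the profile above.

```python
from collections import Counter
from typing import Dict, List, Tuple


def self_and_inclusive(
    stacks: List[List[str]],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Self fraction: samples where the function is the innermost frame.
    Inclusive fraction: samples where the function is anywhere on the stack,
    counted once per sample so recursion is not double-counted."""
    if not stacks:
        return {}, {}
    n = len(stacks)
    self_counts = Counter(stack[-1] for stack in stacks if stack)
    incl_counts: Counter = Counter()
    for stack in stacks:
        incl_counts.update(set(stack))
    return (
        {f: c / n for f, c in self_counts.items()},
        {f: c / n for f, c in incl_counts.items()},
    )


# Hypothetical stack samples (outermost first).
samples = [
    ["main", "run_sim", "schedule", "malloc"],
    ["main", "run_sim", "schedule", "malloc"],
    ["main", "run_sim", "compute"],
    ["main", "run_sim", "schedule", "IsEnabled"],
]
self_frac, incl_frac = self_and_inclusive(samples)
print(self_frac)
print(incl_frac)
```

Here `schedule` is never the innermost frame, so a self-time-only report like the opreport excerpt gives it nothing, even though it is on the stack in 75% of the samples.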
