
As I was writing a Python script that uses a third-party module, the workload was so large that the OS (Linux, 32 GB of RAM) killed the process every time before it could complete. We learned from syslog that it ran out of physical memory, so the OS killed it via the OOM killer.

Many current performance-analysis tools, e.g. `profile`, require the script to run to completion and cannot descend into the modules the script uses. So I reckon this must be a common situation: the script cannot finish, yet that is exactly when performance analysis is needed most. Any advice?

Makoto
  • 104,088
  • 27
  • 192
  • 230
Charley
  • 69
  • 1
  • 3
  • Maybe it's undecidable! [Rice's Theorem](https://en.wikipedia.org/wiki/Rice's_theorem) (I realize this comment does not help your situation at all but it could be interesting from a theoretical standpoint) – sakurasunris3 Dec 10 '16 at 08:29
  • 1
    @MarshallWhite this is why static analysis tools have false positives and false negatives. But they do exist. – UmNyobe Dec 10 '16 at 08:31
  • Did you search for the word 'memory' on PyPI? – Gribouillis Dec 10 '16 at 08:32
  • @Gribouillis I realized this is a very interesting module for memory analysis, thanks for the help :) It partly solved my problem, but it may not tell me where the program gets stuck. I'm looking for something that can tell me, e.g., that a function spent more time than expected or entered an endless loop. I am looking into it. Thank you so much. – Charley Dec 10 '16 at 08:48
  • @MarshallWhite I didn't realize there was a theory for that :-P But it looks interesting, food for thought. – Charley Dec 10 '16 at 08:51
  • @Charley this is probably not going to help much, but if you have a way to run something like unit tests, you should be able to split the program into smaller programs that actually finish their tasks. This way, you can better understand the behaviour of the program and where the bottlenecks are. You can measure the time of some function calls using something like timeit and dig until you find where it gets stuck. As for memory usage, check that each unit that completes isn't the cause of the huge amount of memory used. Also, PDB could be handy. – Loïc Faure-Lacroix Dec 10 '16 at 12:22
  • @LoïcFaure-Lacroix You are absolutely right. As much of my work is built on that third-party module, figuring out how it was constructed is very necessary, thanks. But you know, sometimes a module gets so complicated that you don't know where to start. Maybe I just need to be more patient :-P and learn some software engineering. – Charley Dec 11 '16 at 02:10
  • @LoïcFaure-Lacroix I found an interesting module named pycallgraph this morning. It supports aborting in the middle of a run. And with the Gephi option, it works even when I kill the process with Ctrl+C. A pity that the functions in its output are very low-level. Maybe we can do this on our own :D – Charley Dec 11 '16 at 06:56
  • Can't you insert `print` statements in various places to see at what point it exits? That should give you a good idea about what is sucking your memory dry. P.S. What do you plan to do, once you've identified the root of the problem? – tommy.carstensen Dec 13 '16 at 19:15
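The comments above suggest splitting the work into units, timing them, and checking which unit eats the memory. A minimal sketch of the memory side using the standard-library `tracemalloc` (the list-building workload here is a hypothetical stand-in for the real third-party call):

```python
import tracemalloc

# Start tracing allocations before the suspect workload runs.
tracemalloc.start()

# Hypothetical stand-in for one "unit" of the real workload.
data = [list(range(1000)) for _ in range(100)]

# Snapshot can be taken at any point, even if the full job never finishes,
# e.g. from a signal handler or between units.
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)  # shows file:line, total size, and allocation count
```

Taking snapshots between units (or on a signal) lets you see which allocation sites are growing before the OOM killer fires, without needing the script to complete.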

1 Answer


`profile` turns out to be a good fit for this: it does not require the script to run to completion, and it can descend into the modules the script uses. I think for this question, the best answer is to use `profile`.
