
Most of my Python program's runtime is spent in a method called _build_userdbs. I'm using the awesome tool SnakeViz, which helps with interpreting the results. There's a screenshot below.

Visualization from SnakeViz

So right now, in that picture, I'm in _build_userdbs. The big green circle just outside of that is a method called _append_record, and as you can see, it takes up almost all of _build_userdbs. I understand this part.

But here's the confusing part. The green ring outside of that inner green circle (which takes up the vast majority of the time) is the cumulative time of _append_record minus the cumulative time of the functions that _append_record calls.

Quantitatively, _append_record's cumulative time is 55970 seconds. That's the inner green circle. Its total time is 54210 seconds. That's the outer green circle.
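For context, the cumulative-versus-total distinction I'm describing can be reproduced on a toy example (parent and child here are made-up names, not my actual code): a function's cumulative time includes its callees, while its total time only counts work in its own frame.

```python
import cProfile
import pstats

def child():
    # Work attributed to child's own frame
    sum(range(200_000))

def parent():
    # Work in parent's own frame counts toward its total time (tottime)
    total = sum(range(100_000))
    child()  # child's time counts toward parent's cumulative time (cumtime) only
    return total

profiler = cProfile.Profile()
profiler.enable()
parent()
profiler.disable()

stats = pstats.Stats(profiler)
# pstats stores entries as (call_count, primitive_calls, tottime, cumtime, callers)
for (filename, lineno, name), (cc, nc, tottime, cumtime, callers) in stats.stats.items():
    if name == "parent":
        # cumtime includes child(); tottime excludes it, so tottime <= cumtime
        print(f"parent tottime={tottime:.4f}s cumtime={cumtime:.4f}s")
```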

_append_record, as you can see if you open that image in a new tab, calls a couple other functions. Those are:

  • json.dumps() (459 seconds cumulative)
  • _userdb_scratch_file_path() (161 seconds cumulative)
  • open (2160 seconds cumulative)
  • print (less than .1% of frame so not displayed)

Alright, this makes sense: because the difference between _append_record's cumulative time and its total time is relatively small, it must be doing a lot of processing in its own stack frame rather than delegating it to the functions it calls. But here's the body of the function:

def _append_record(self, user, record):
    record = json.dumps(record)
    dest_path = self._userdb_scratch_file_path(user)
    with open(dest_path, 'at') as dest:
        print(record, file=dest)

So where is all of this processing going on? Is it function call overhead that accounts for the difference? Profiling overhead? Are these results just inaccurate? And why doesn't close() show up as a call?
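Here's a standalone way to reproduce the shape of what I'm profiling (append_record and the temp path are stand-ins for my real method and scratch file). One thing this surfaces: as far as I can tell, the close triggered by the with block is attributed to a C-level method on the file object (e.g. its __exit__), not to a Python-level close() frame, which may be why no close() appears in the visualization.

```python
import cProfile
import json
import os
import pstats
import tempfile

def append_record(path, record):
    # Same shape as my _append_record: serialize, then open/append/close
    data = json.dumps(record)
    with open(path, "at") as dest:
        print(data, file=dest)

# Hypothetical scratch file standing in for _userdb_scratch_file_path()
path = os.path.join(tempfile.mkdtemp(), "scratch.jsonl")

profiler = cProfile.Profile()
profiler.enable()
for i in range(100):
    append_record(path, {"user": "u", "n": i})
profiler.disable()

stats = pstats.Stats(profiler)
# List every profiled name related to the file handling; the with-block
# cleanup shows up as a C method on the file object rather than close()
names = [name for (_, _, name) in stats.stats.keys()]
print([n for n in names if "open" in n or "exit" in n or "close" in n])
```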

