improving stack trace hook in python

Question

All,

I have a question similar to question 2617120, found here:

how to use traceit to report function input variables

where the asker wanted pointers on how to make Python print out function parameters, via a tracing hook, as the functions were executed.

I'm looking for something very similar, but with a twist. Instead of dumping out all the data, I want to evaluate the code while it's running and print each line with its variables replaced by their values. For example, with the following code:

for modname in modnames:
    if not modname or '.' in modname:
        continue
...

the trace hook would cause the following to be printed out:

for modname in modnames:                | for init in init,., encoding
                                        |
    if not modname or '.' in modname:   |     if not init or '.' in init
        continue                        |         continue
    if not modname or '.' in modname:   |     if not . or '.' in .
...                                     |

where each line of code is interpolated using the values in the running frame. I've done this in Perl, where it's a lifesaver in certain circumstances.

Does anybody have ideas on the best way to go about this in Python? I have my own ideas, but I'd like to hear what people think (and whether there are any ready-made solutions).

Here, by the way, is the reference code:

import sys
import linecache
import random

def traceit(frame, event, arg):
    if event == "line":
        lineno = frame.f_lineno
        filename = frame.f_globals["__file__"]
        if filename == "<stdin>":
            filename = "traceit.py"
        if (filename.endswith(".pyc") or
            filename.endswith(".pyo")):
            filename = filename[:-1]
        name = frame.f_globals["__name__"]
        line = linecache.getline(filename, lineno)
        print "%s:%s:%s: %s" % (name, lineno, frame.f_code.co_name, line.rstrip())
    return traceit


def main():
    print "In main"
    for i in range(5):
        print i, random.randrange(0, 10)
    print "Done."

sys.settrace(traceit)
main()

Please [don't use signatures or taglines](http://stackoverflow.com/faq#signatures) in your posts. — user229044, Dec 08 '10 at 05:10
One problem I see is that for a line like `for i in range(5)`, the variable `i` will have the value from the previous iteration (or no value at all the first time the line is traced), because the trace function is called *before* the line is executed. You'd have to retain the previous line and print that. You could probably handle the final line when you get a `return` event. Very interested to see how this comes out. — kindall, Dec 08 '10 at 05:27
kindall - yes, there are edge cases, and I'm not sure that my (perl) code handles all of them even as it stands after I've used it extensively, but even an imperfect module like this can save hours, even days. — user534463, Dec 08 '10 at 05:49
All you do, when you get a bug, is run in trace mode and save the output to a file. Finding the origin of the bug is often a simple matter of opening a decent editor, searching for the string, tying it to an attribute, and getting the associated class. Often the bug jumps out at you, it's so obvious. It's also a great way of getting used to codebases you have no idea about, much more efficient than a debugger. — user534463, Dec 08 '10 at 05:52
Also `eval` and `print` can easily have side-effects. You might be able to hack something together by subclassing `pdb` though. — Katriel, Dec 08 '10 at 06:09
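The delayed-printing idea from the comments above (remember each line and report it only once the next `line` or `return` event shows it has finished) could be sketched roughly as follows. This is a Python 3 sketch, not the asker's or kindall's code; `make_delayed_tracer` and the `report` callback are hypothetical names introduced for illustration.

```python
import linecache
import sys


def make_delayed_tracer(report):
    """Build a trace function that reports each line after it has run."""
    pending = {}  # frame -> (lineno, source text) not yet reported

    def flush(frame):
        # Report the remembered line using the frame's current locals,
        # which now reflect the effect of that line.
        if frame in pending:
            lineno, text = pending.pop(frame)
            report(frame.f_code.co_name, lineno, text, dict(frame.f_locals))

    def tracer(frame, event, arg):
        if event == "line":
            flush(frame)  # the previous line in this frame is done
            text = linecache.getline(frame.f_code.co_filename, frame.f_lineno)
            pending[frame] = (frame.f_lineno, text.rstrip())
        elif event == "return":
            flush(frame)  # the final line of the function is done
        return tracer

    return tracer
```

To use it, pass a callback that prints or stores its arguments, install the result with `sys.settrace(make_delayed_tracer(callback))`, and call `sys.settrace(None)` when done. The source text will be empty for code that `linecache` cannot find on disk.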

kindall · Answer 1 · 2010-12-08T23:47:55.250

0

Here's a quick hack that might give you something of a start, given a line of text in line and the current stack frame in frame (this is meant to be an inner function of traceit by the way).

import re
from types import *

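def interpolatevar(matchobj):
    # Values of these types are left as names; their repr adds little.
    excludetypes = set((NoneType, TypeType, FunctionType, LambdaType, ClassType,
                    CodeType, InstanceType, MethodType, BuiltinFunctionType,
                    BuiltinMethodType))

    var = matchobj.group(0)        # full dotted name, e.g. "self.foo"
    basevar = var.split(".")[0]    # leading identifier, e.g. "self"
    # Substitute only names that the function's code actually refers to...
    if basevar in frame.f_code.co_names or basevar in frame.f_code.co_varnames:
        # ...and that are currently bound in the frame.
        if basevar in frame.f_globals or basevar in frame.f_locals:
            # Only the dotted name is evaluated, never the whole line.
            val = eval(var, frame.f_globals, frame.f_locals)
            if type(val) not in excludetypes:
                return repr(val)
    return var

# Replace every identifier (optionally dotted) found in the line.
line = re.sub(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*",
              interpolatevar, line)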

Since this uses a regex to find identifiers, it naively matches them even inside string literals. A match is only substituted, however, if the identifier is actually used in the function and defined in either local or global scope.

It does handle object attribute access (e.g. foo.bar will be substituted with that value), which the version I originally posted would not, and it filters out values of various types since substituting foo with <function foo at 0x02793470> doesn't really tell you much you don't already know. (The exclusion list is easily customizable, of course, if you are interested in some values of those types, or you can add others from the types module.)
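The snippet above targets Python 2; several of the names it pulls from `types` (`NoneType`, `TypeType`, `ClassType`, `InstanceType`) do not exist under those names in Python 3. A rough Python 3 adaptation, written as a standalone function rather than an inner function of `traceit`, might look like this. The name `interpolate_line`, the contents of `EXCLUDED_TYPES`, the use of `isinstance` instead of an exact type check, and the `try`/`except` around `eval` are all assumptions of this sketch rather than part of the original answer.

```python
import re
import types

# Types whose repr adds little information. This tuple is an assumption,
# loosely adapted from the Python 2 exclusion list above.
EXCLUDED_TYPES = (type(None), type, types.FunctionType, types.MethodType,
                  types.BuiltinFunctionType, types.CodeType, types.ModuleType)

# An identifier, optionally followed by dotted attribute names.
IDENTIFIER_RE = re.compile(
    r"[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*")


def interpolate_line(line, frame):
    """Return line with known variable names replaced by repr(value)."""
    code = frame.f_code

    def interpolatevar(matchobj):
        var = matchobj.group(0)
        basevar = var.split(".")[0]
        used = basevar in code.co_names or basevar in code.co_varnames
        defined = basevar in frame.f_globals or basevar in frame.f_locals
        if used and defined:
            try:
                # Only the dotted name is evaluated, never the whole line.
                val = eval(var, frame.f_globals, frame.f_locals)
            except Exception:
                return var  # e.g. the attribute does not exist yet
            if not isinstance(val, EXCLUDED_TYPES):
                return repr(val)
        return var

    return IDENTIFIER_RE.sub(interpolatevar, line)
```

Inside a trace function this would be called as `interpolate_line(line, frame)` with the line fetched from `linecache`. Like the original, it still matches identifiers inside string literals.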

Long term, I think it might be profitable to look at the bytecode for the line, as it is easier to identify which tokens are actually identifiers that way. The ast module might also be useful to produce a parse tree of the statement, which would let you figure out what the identifiers are, but this is problematic for conditionals and loops when you're only seeing a line at a time.
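As a rough illustration of the `ast` idea, here is a Python 3 sketch that collects the names and dotted attribute chains used in a single line. The function name `dotted_names` and the trick of appending `pass` to a lone block header are assumptions of this sketch. It works around the one-line-at-a-time problem only for simple `for`/`if`/`while` headers, and it simply gives up on continued lines.

```python
import ast


def dotted_names(source_line):
    """Return the set of names and dotted attribute chains in one line."""
    text = source_line.strip()
    candidates = [text]
    if text.endswith(":"):
        # A lone block header (for/if/while ...) is not a complete
        # statement, so also try it with a dummy body.
        candidates.append(text + " pass")

    tree = None
    for candidate in candidates:
        try:
            tree = ast.parse(candidate)
            break
        except SyntaxError:
            continue
    if tree is None:
        return set()  # continued or otherwise partial line: give up

    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            found.add(node.id)
        elif isinstance(node, ast.Attribute):
            # Rebuild "a.b.c" only when the chain ends in a plain name.
            parts = []
            current = node
            while isinstance(current, ast.Attribute):
                parts.append(current.attr)
                current = current.value
            if isinstance(current, ast.Name):
                parts.append(current.id)
                found.add(".".join(reversed(parts)))
    return found
```

For the line `if not modname or '.' in modname:` this yields only `modname`, since the string literal `'.'` is a constant node rather than a name.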

edited Dec 08 '10 at 23:47

answered Dec 08 '10 at 15:00

kindall


I did play around with the `ast` module and got it to a place that could pretty reliably find the identifiers and attributes used in a given line of code. It fails when a line is continued, though, and this 1) is very common in the Python standard library and 2) would take more work to solve than I really want to put in right now. – kindall Dec 09 '10 at 03:29
kindall - if you post the code, I can perhaps pick it up and run with it and post the completed trace.. BTW I'm not sure if I made myself completely clear - I'm not looking for a complete eval of the resulting string, I'm looking for var interpolation, hence 'a = 12, b = a*2, c = b * 3' becomes '= 12, = 12 * 2, = 24 * 3' since eval has nasty side effects. In fact, with the perl version, I spent a lot of time coding *around* side effects so I couldn't accidentally run a code inside an interpolated string. anyways, yes, please post - I'll take it from where you left off and post results back. – user534463 Dec 09 '10 at 06:32
The only reason I'm using `eval` in what I posted is so I can easily get object attributes (e.g. `self.foo`) -- only identifiers (with or without dots in them) are ever actually evaluated, and these are done individually. The entire line is never evaluated, so side effects will be minimized. Of course, `self.foo` could be a property, and so could have side effects, but that is going to be a problem with or without `eval`. – kindall Dec 09 '10 at 06:56
ok, cool, that makes sense, although I'm still sort of wondering if eval is the right mechanism here. I've been looking at various templating schemes (http://wiki.python.org/moin/Templating) and was wondering which of these are lightweight enough to support arbitrary evaluation of the expressions in the line. For example, as default behavior, arrays could automatically expand, hashes likewise, and objects could stringify themselves. I'd also be interested in how to distinguish between properties and attributes; all it takes is one side effect in a complicated project to totally ruin your day. – user534463 Dec 09 '10 at 20:38
It is very hard to detect which attributes of an instance are properties. Basically, on an instance, you can't; you have to look at the instance's base classes and try to identify it there. The default behavior you want for printing objects is basically already in what I posted (it's the `repr()` call). You could add more in-depth stringification if the `repr()` result is wrapped in `< >` brackets. – kindall Dec 09 '10 at 21:15
Looks like the `tokenize` module would be useful here, perhaps more than the regex. http://docs.python.org/library/tokenize.html – kindall Dec 10 '10 at 17:22
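A rough Python 3 sketch of the `tokenize` approach follows. Because the tokenizer reports string literals and comments as their own token types, identifiers inside them are never touched, which addresses the regex's main weakness. The function name `interpolate_names` and the choice to pass values in as a plain mapping (instead of calling `eval`) are assumptions of this sketch. It substitutes bare names only and leaves attribute names after a dot alone.

```python
import io
import keyword
import tokenize


def interpolate_names(line, values):
    """Replace bare identifiers in line with repr() of values[name]."""
    text = line.rstrip("\n")
    replacements = []  # (start_col, end_col, new_text)
    prev_string = ""
    try:
        readline = io.StringIO(text + "\n").readline
        for tok in tokenize.generate_tokens(readline):
            if (tok.type == tokenize.NAME
                    and not keyword.iskeyword(tok.string)
                    and prev_string != "."  # skip attribute names
                    and tok.string in values):
                replacements.append(
                    (tok.start[1], tok.end[1], repr(values[tok.string])))
            prev_string = tok.string
    except (tokenize.TokenError, SyntaxError):
        pass  # unfinished line: keep replacements collected before the error

    # Apply right to left so earlier column offsets stay valid.
    for start, end, new_text in reversed(replacements):
        text = text[:start] + new_text + text[end:]
    return text
```

In a trace function, `values` could be built from the frame, for example `dict(frame.f_globals, **frame.f_locals)`, though that copies the whole global namespace on every line.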
