4

Given a Python script with print() statements, I'd like to be able to run through the script and insert a comment after each statement that shows the output from each. To demonstrate, take this script named example.py:

a, b = 1, 2

print('a + b:', a + b)

c, d = 3, 4

print('c + d:', c + d)

The desired output would be:

a, b = 1, 2

print('a + b:', a + b)
# a + b: 3

c, d = 3, 4

print('c + d:', c + d)
# c + d: 7

Here's my attempt, which works for simple examples like the one above:

import sys
from io import StringIO

def intercept_stdout(func):
    "redirect stdout from a target function"
    def wrapper(*args, **kwargs):
        "wrapper function for intercepting stdout"
        # save original stdout
        original_stdout = sys.stdout

        # set up StringIO object to temporarily capture stdout
        capture_stdout = StringIO()
        sys.stdout = capture_stdout

        # execute wrapped function
        func(*args, **kwargs)

        # assign captured stdout to value
        func_output = capture_stdout.getvalue()

        # reset stdout
        sys.stdout = original_stdout

        # return captured value
        return func_output

    return wrapper


@intercept_stdout
def exec_target(name):
    "execute a target script"
    with open(name, 'r') as f:    
        exec(f.read())


def read_target(name):
    "read source code from a target script & return it as a list of lines"
    with open(name) as f:
        source = f.readlines()

    # to properly format last comment, ensure source ends in a newline
    if len(source[-1]) >= 1 and source[-1][-1] != '\n':
        source[-1] += '\n'

    return source


def annotate_source(target):
    "given a target script, return the source with comments under each print()"
    target_source = read_target(target)

    # find each line that starts with 'print(' & get indices in reverse order
    print_line_indices = [i for i, j in enumerate(target_source)
                              if len(j) > 6 and j[:6] == 'print(']
    print_line_indices.reverse()

    # execute the target script and get each line output in reverse order
    target_output = exec_target(target)
    printed_lines = target_output.split('\n')
    printed_lines.reverse()

    # iterate over the source and insert commented target output line-by-line
    annotated_source = []
    for i, line in enumerate(target_source):
        annotated_source.append(line)
        if print_line_indices and i == print_line_indices[-1]:
            annotated_source.append('# ' + printed_lines.pop() + '\n')
            print_line_indices.pop()

    # return new annotated source as a string
    return ''.join(annotated_source)


if __name__ == '__main__':
    target_script = 'example.py'
    with open('annotated_example.py', 'w') as f:
        f.write(annotate_source(target_script))

However, it fails for scripts with print() statements that span multiple lines, as well as for print() statements that aren't at the start of a line. In a best-case scenario, it would even work for print() statements inside a function. Take the following example:

print('''print to multiple lines, first line
second line
third line''')

print('print from partial line, first part') if True else 0

1 if False else print('print from partial line, second part')

print('print from compound statement, first part'); pass

pass; print('print from compound statement, second part')

def foo():
    print('bar')

foo()

Ideally, the output would look like this:

print('''print to multiple lines, first line
second line
third line''')
# print to multiple lines, first line
# second line
# third line

print('print from partial line, first part') if True else 0
# print from partial line, first part

1 if False else print('print from partial line, second part')
# print from partial line, second part

print('print from compound statement, first part'); pass
# print from compound statement, first part

pass; print('print from compound statement, second part')
# print from compound statement, second part

def foo():
    print('bar')

foo()
# bar

But the script above mangles it like so:

print('''print to multiple lines, first line
# print to multiple lines, first line
second line
third line''')

print('print from partial line, first part') if True else 0
# second line

1 if False else print('print from partial line, second part')

print('print from compound statement, first part'); pass
# third line

pass; print('print from compound statement, second part')

def foo():
    print('bar')

foo()

What approach would make this process more robust?

Alec
  • 3
    What would you expect it to do in a situation like `def foo(a,b): print(a,b)` where `foo` can be called many times? – Brian Jul 06 '16 at 18:26
  • 1
    How are you trying to display prints where you don't know ahead of time the value? ex `print(randint(0,100))`? – xgord Jul 06 '16 at 18:52
  • @xgord these would still be displayed, but would be different for each run-through. I'm meaning to use it mostly in cases where the outcome is the same every time, but they could still be useful to showcase example output. – Alec Jul 06 '16 at 18:58
  • @Brian That's a great point, I've edited the question for how I'd expect to see it implemented. – Alec Jul 06 '16 at 18:58

5 Answers

7

Have you considered using the inspect module? If you are willing to say that you always want the annotations next to the topmost call, and the file you are annotating is simple enough, you can get reasonable results. The following is my attempt, which overrides the built-in print function and looks at the call stack to determine where print was called:

import inspect
import sys
from io import StringIO

file_changes = {}

def anno_print(old_print, *args, **kwargs):
    (frame, filename, line_number,
     function_name, lines, index) = inspect.getouterframes(inspect.currentframe())[-2]
    if filename not in file_changes:
        file_changes[filename] = {}
    if line_number not in file_changes[filename]:
        file_changes[filename][line_number] = []
    orig_stdout = sys.stdout
    capture_stdout = StringIO()
    sys.stdout = capture_stdout
    old_print(*args, **kwargs)
    output = capture_stdout.getvalue()
    file_changes[filename][line_number].append(output)
    sys.stdout = orig_stdout
    return

def make_annotated_file(old_source, new_source):
    changes = file_changes[old_source]
    # context managers close both files, even if writing fails partway
    with open(old_source) as old_source_F, open(new_source, 'w') as new_source_F:
        content = old_source_F.readlines()
        for i in range(len(content)):
            line_num = i + 1
            new_source_F.write(content[i])
            if content[i][-1] != '\n':
                new_source_F.write('\n')
            if line_num in changes:
                for output in changes[line_num]:
                    output = output[:-1].replace('\n', '\n#') + '\n'
                    new_source_F.write("#" + output)



if __name__=='__main__':
    target_source = "foo.py"
    old_print = __builtins__.print
    __builtins__.print = lambda *args, **kwargs: anno_print(old_print, *args, **kwargs)
    with open(target_source) as f:
        code = compile(f.read(), target_source, 'exec')
        exec(code)
    __builtins__.print = old_print
    make_annotated_file(target_source, "foo_annotated.py")

If I run it on the following file "foo.py":

def foo():
    print("a")
    print("b")

def cool():
    foo()
    print("c")

def doesnt_print():
    a = 2 + 3

print(1+2)
foo()
doesnt_print()
cool()

The output is "foo_annotated.py":

def foo():
    print("a")
    print("b")

def cool():
    foo()
    print("c")

def doesnt_print():
    a = 2 + 3

print(1+2)
#3
foo()
#a
#b
doesnt_print()
cool()
#a
#b
#c
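
A possible variation on the stack lookup, offered as a sketch rather than a drop-in replacement: instead of relying on the fixed index [-2], search the stack for the outermost frame whose filename matches the target. The names outermost_line and trace_prints and the "<target>" filename are made up for illustration, and the stand-in print only handles sep and end (file and flush are ignored).

```python
import inspect


def outermost_line(filename):
    """Line number of the outermost active frame running code from `filename`."""
    # inspect.stack() lists the innermost frame first, so walk it in reverse.
    for frame_info in reversed(inspect.stack(0)):
        if frame_info.filename == filename:
            return frame_info.lineno
    return None


def trace_prints(source, filename="<target>"):
    """Run `source` and return (top-level line, printed text) for each print call."""
    records = []

    def recording_print(*args, sep=" ", end="\n", **ignored):
        # Hypothetical stand-in for print: records the text instead of writing it.
        text = sep.join(str(arg) for arg in args) + end
        records.append((outermost_line(filename), text))

    # A global named `print` shadows the built-in for the executed code only.
    environment = {"print": recording_print, "__name__": "__main__"}
    exec(compile(source, filename, "exec"), environment)
    return records
```

Because the replacement lives in the globals passed to exec(), nothing has to be restored on the real built-in afterwards.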
  • That's awesome! `inspect.getouterframes()` looks like a great approach. I also like your decision to override `print()` directly rather than going after `stdout` alone as I did. Only real edge case I've found so far is when a string inside `print()` spans multiple lines as in the second example in the original question. – Alec Jul 21 '16 at 15:10
  • 1
    Oh yeah, this is an issue of formatting when writing to the annotated file. I edited the original response with this line: `output = output[:-1].replace('\n', '\n#') + '\n'` I think it will work on the multiline print now. – Matthew G Dippel Jul 21 '16 at 15:20
  • That fixes it, thanks! One more small thing: When I run your example now (as well as when I run the others) I get the first comment of the last function printing on the same line (like `cool()#a`). Any idea what's happening? – Alec Jul 21 '16 at 15:30
  • 1
    This was due to the file not ending in a new line, so right after printing cool(), it started printing the comments on the same line. I added the following to the answer: `if content[i][-1] != '\n': new_source_F.write('\n')` – Matthew G Dippel Jul 21 '16 at 15:37
  • Beautiful! This covers every use-case in the examples. One use-case where it still fails is if you add a `print()` function at the end of the script (something like `print('annotation of', target_source ,'complete!')`) and then try to run it on a copy of *itself* (it fails with a `RecursionError`). Not something you're likely to run into but could still be interesting to find a workaround! – Alec Jul 21 '16 at 15:46
  • 1
    Ah interesting case... this essentially happens because it is expecting `old_print` to be the original print function, but it ends up referring to the annotated print. So when it calls old_print, it is just calling itself again. I'm not sure how I would fix that. – Matthew G Dippel Jul 21 '16 at 15:56
  • Maybe by attaching a flag or overriding an unused method on `print()` when it's overridden the first time, and checking for the existence of the flag before resetting it? – Alec Jul 21 '16 at 15:59
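
One way the flag idea might look, as a rough sketch only: mark the wrapper with an attribute and refuse to wrap again when the current print already carries it. The names patched_print, wrapper_factory and _is_annotating_wrapper are hypothetical, and this uses the builtins module rather than `__builtins__`.

```python
import builtins
import contextlib


@contextlib.contextmanager
def patched_print(wrapper_factory):
    """Temporarily replace builtins.print, unless it is already patched."""
    current = builtins.print
    if getattr(current, "_is_annotating_wrapper", False):
        # An outer call already installed a wrapper: reuse it, do not wrap twice.
        yield current
        return
    # The factory receives the real print, so the wrapper keeps it in a closure
    # instead of looking up a global that a nested run could overwrite.
    wrapper = wrapper_factory(current)
    wrapper._is_annotating_wrapper = True
    builtins.print = wrapper
    try:
        yield wrapper
    finally:
        # Restore the original even if the annotated script raises.
        builtins.print = current
```

A nested `with patched_print(...)` then yields the wrapper that is already installed, so the original print is only ever wrapped once.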
1

Thanks to feedback from @Lennart, I've almost got it working... It iterates through line-by-line, clumping lines into longer and longer blocks as long as the current block contains a SyntaxError when fed to exec(). Here it is in case it's of use to anyone else:

import sys
from io import StringIO

def intercept_stdout(func):
    "redirect stdout from a target function"
    def wrapper(*args, **kwargs):
        "wrapper function for intercepting stdout"
        # save original stdout
        original_stdout = sys.stdout

        # set up StringIO object to temporarily capture stdout
        capture_stdout = StringIO()
        sys.stdout = capture_stdout

        # execute wrapped function
        func(*args, **kwargs)

        # assign captured stdout to value
        func_output = capture_stdout.getvalue()

        # reset stdout
        sys.stdout = original_stdout

        # return captured value
        return func_output

    return wrapper

@intercept_stdout
def exec_line(source, block_globals):
    "execute a target block of source code and get output" 
    exec(source, block_globals)

def read_target(name):
    "read source code from a target script & return it as a list of lines"
    with open(name) as f:
        source = f.readlines()

    # to properly format last comment, ensure source ends in a newline
    if len(source[-1]) >= 1 and source[-1][-1] != '\n':
        source[-1] += '\n'

    return source

def get_blocks(target, block_globals):
    "get outputs for each block of code in source"
    outputs = []
    lines = 1

    @intercept_stdout
    def eval_blocks(start_index, end_index, full_source, block_globals):
        "work through a group of lines of source code and exec each block"
        nonlocal lines
        try:    
            exec(''.join(full_source[start_index:end_index]), block_globals)
        except SyntaxError:
            lines += 1
            eval_blocks(start_index, start_index + lines,
                        full_source, block_globals)

    for i, s in enumerate(target):
        if lines > 1:
            lines -= 1
            continue  
        outputs.append((eval_blocks(i, i+1, target, block_globals), i, lines))

    return [(i[1], i[1] + i[2]) for i in outputs]

def annotate_source(target, block_globals={}):
    "given a target script, return the source with comments under each print()"
    target_source = read_target(target)

    # get each block's start and end indices
    outputs = get_blocks(target_source, block_globals)
    code_blocks = [''.join(target_source[i[0]:i[1]]) for i in outputs]

    # iterate through each
    annotated_source = []
    for c in code_blocks:
        annotated_source.append(c)
        printed_lines = exec_line(c, block_globals).split('\n')
        if printed_lines and printed_lines[-1] == '':
            printed_lines.pop()
        for line in printed_lines:
            annotated_source.append('# ' + line + '\n')

    # return new annotated source as a string
    return ''.join(annotated_source)

def main():
    ### script to format goes here
    target_script = 'example.py'

    ### name of formatted script goes here
    new_script = 'annotated_example.py'

    new_code = annotate_source(target_script)
    with open(new_script, 'w') as f:
        f.write(new_code)

if __name__ == '__main__':
    main()

It works for each of the two examples above. However, when trying to execute the following:

def foo():
    print('bar')
    print('baz')

foo()

Instead of giving me the desired output:

def foo():
    print('bar')
    print('baz')

foo()
# bar
# baz

It fails with a very long traceback:

Traceback (most recent call last):
  File "ex.py", line 55, in eval_blocks
    exec(''.join(full_source[start_index:end_index]), block_globals)
  File "<string>", line 1
    print('baz')
    ^
IndentationError: unexpected indent

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ex.py", line 55, in eval_blocks
    exec(''.join(full_source[start_index:end_index]), block_globals)
  File "<string>", line 1
    print('baz')
    ^
IndentationError: unexpected indent

During handling of the above exception, another exception occurred:

...

Traceback (most recent call last):
  File "ex.py", line 55, in eval_blocks
    exec(''.join(full_source[start_index:end_index]), block_globals)
  File "<string>", line 1
    print('baz')
    ^
IndentationError: unexpected indent

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ex.py", line 102, in <module>
    main()
  File "ex.py", line 97, in main
    new_code = annotate_source(target_script)
  File "ex.py", line 74, in annotate_source
    outputs = get_blocks(target_source, block_globals)
  File "ex.py", line 65, in get_blocks
    outputs.append((eval_blocks(i, i+1, target, block_globals), i, lines))
  File "ex.py", line 16, in wrapper
    func(*args, **kwargs)
  File "ex.py", line 59, in eval_blocks
    full_source, block_globals)
  File "ex.py", line 16, in wrapper
    func(*args, **kwargs)   

...

  File "ex.py", line 16, in wrapper
    func(*args, **kwargs)
  File "ex.py", line 55, in eval_blocks
    exec(''.join(full_source[start_index:end_index]), block_globals)
RecursionError: maximum recursion depth exceeded while calling a Python object

Looks like this happens due to def foo(): print('bar') being valid code and so print('baz') isn't being included in the function, causing it to fail with an IndentationError. Any ideas as to how to avoid this issue? I suspect it may require diving into ast as suggested above but would love further input or a usage example.

Alec
1

You can make it a lot easier by using an existing Python parser, such as the ast module in the standard library, to extract top-level statements from your code. However, ast loses some information, like comments.

Libraries built with source code transformations (which you are doing) in mind might be more suited here. redbaron is a nice example.
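
As a rough illustration of the ast route (a sketch, assuming Python 3.8+ for end_lineno), the parser can be used only to find where each top-level statement starts and ends, with the text itself sliced from the original lines so nothing inside a statement is lost. The function name top_level_chunks is made up for this example.

```python
import ast


def top_level_chunks(source):
    """Return (first_line, last_line, text) for each top-level statement."""
    lines = source.splitlines(keepends=True)
    ranges = []
    for node in ast.parse(source).body:
        first = node.lineno
        # Decorators sit above the def/class line, so start from the earliest one.
        for decorator in getattr(node, "decorator_list", []):
            first = min(first, decorator.lineno)
        last = node.end_lineno
        if ranges and first <= ranges[-1][1]:
            # Statements sharing a physical line ("pass; print(...)") form one chunk.
            ranges[-1][1] = max(ranges[-1][1], last)
        else:
            ranges.append([first, last])
    # Line numbers are 1-based and inclusive; slice the untouched source text.
    return [(first, last, "".join(lines[first - 1:last])) for first, last in ranges]
```

Blank lines and comments between statements fall outside every chunk, so a caller writing an annotated file would need to copy those gaps through separately.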

To carry globals to the next exec(), you have to use the second parameter (documentation):

environment = {}
for statement in statements:
    exec(statement, environment)
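
A slightly fuller sketch of the same loop, assuming the statements are already split into complete chunks of source text: each chunk runs against the shared dictionary while contextlib.redirect_stdout collects whatever it prints. The function name annotate_chunks is hypothetical.

```python
import contextlib
import io


def annotate_chunks(chunks):
    """Execute each chunk in one shared environment and append its output as comments."""
    environment = {}  # exec() fills this in, so later chunks see earlier names
    annotated = []
    for chunk in chunks:
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(chunk, environment)
        # Make sure the comment starts on its own line.
        annotated.append(chunk if chunk.endswith("\n") else chunk + "\n")
        for line in buffer.getvalue().splitlines():
            annotated.append("# " + line + "\n")
    return "".join(annotated)
```

Since a function definition and its later call share the environment, the output of the call ends up under the call rather than under the def.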
Lennart
  • Great suggestions with ast and redbaron (I've only utilized `ast.literal_eval()` before, will have to digest some of the more advanced functionality). Is there a way to *extract* the environment from `exec()` so that I can chain them together? – Alec Jul 19 '16 at 01:43
  • 1
    Sure, exec modifies the dictionary you pass it. So when you give an empty dictionary to exec, it will contain the environment afterwards – Lennart Jul 19 '16 at 01:53
  • That's a relief! (I was concerned that I'd have to intercept `exec()`'s default `None` return value.) – Alec Jul 19 '16 at 01:55
  • Haha fortunately not. Good luck implementing! – Lennart Jul 19 '16 at 02:03
  • The `ast` solution will fail in cases where literals are joined inside `print` calls. [The joining of literals is done while the `ast` is created, the original values are only present during the parsing phase.](http://stackoverflow.com/questions/34174539/python-string-literal-concatenation/34174612#34174612) – Dimitris Fasarakis Hilliard Jul 19 '16 at 02:31
1

It looks like except SyntaxError isn't a sufficient check for a full function, as it will finish the block at the first line that doesn't raise a syntax error. What you want is to make sure the whole function is encompassed in the same block. To accomplish this:

  • check if the current block is a function. Check if the first line starts with def.

  • check if the next line in full_source begins with at least as many spaces as the second line of the function (the one which defines the indent). This means eval_blocks checks whether the next line of code has equal or greater indentation and is therefore inside the function.

The code for get_blocks might look something like this:

# function for finding num of spaces at beginning (could be in global scope)
def get_front_whitespace(string):
    spaces = 0
    for char in string:
        # end loop at end of spaces
        if char not in ('\t', ' '): 
            break
        # a tab is equal to 8 spaces
        elif char == '\t':
            spaces += 8
        # otherwise must be a space
        else:
            spaces += 1
    return spaces

...

def get_blocks(target, block_globals):
    "get outputs for each block of code in source"
    outputs = []
    lines = 1
    # variable to check if current block is a function
    block_is_func = False

    @intercept_stdout
    def eval_blocks(start_index, end_index, full_source, block_globals):
        "work through a group of lines of source code and exec each block"
        nonlocal lines
        nonlocal block_is_func
        # check if block is a function
        block_is_func = ( full_source[start_index][:3] == 'def' )
        try:    
            exec(''.join(full_source[start_index:end_index]), block_globals)
        except SyntaxError:
            lines += 1
            eval_blocks(start_index, start_index + lines,
                        full_source, block_globals)
        else:
            # if the block is a function, check for indents
            if block_is_func:
                # get number of spaces in first indent of function
                func_indent= get_front_whitespace( full_source[start_index + 1] )
                # get number of spaces in the next index 
                next_index_spaces = get_front_whitespace( full_source[end_index + 1] )
                # if the next line is indented at least as far as the function body, continue to the next recursion layer
                if next_index_spaces >= func_indent:
                    lines += 1
                    eval_blocks(start_index, start_index + lines,
                               full_source, block_globals)

    for i, s in enumerate(target):
        # reset the function variable for next block
        if block_is_func: block_is_func = False
        if lines > 1:
            lines -= 1
            continue  
        outputs.append((eval_blocks(i, i+1, target, block_globals), i, lines))

    return [(i[1], i[1] + i[2]) for i in outputs]

This might raise an IndexError if the last line of the function is also the last line of the file, though, due to the forward indexing at next_index_spaces = get_front_whitespace( full_source[end_index + 1] ).
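
One way to sidestep that IndexError, sketched with made-up helper names (indent_width and next_line_indent): return None when the requested line is past the end of the file and let the caller treat None as "the block is finished". Note that this version uses str.expandtabs(8), so a tab advances to the next multiple of 8 columns rather than always adding 8 as in get_front_whitespace above.

```python
def indent_width(line):
    """Width of the leading whitespace, with tabs advancing to the next multiple of 8."""
    expanded = line.expandtabs(8)
    return len(expanded) - len(expanded.lstrip(" "))


def next_line_indent(lines, index):
    """Indent of lines[index], or None when index is past the end of the file."""
    if index >= len(lines):
        return None
    return indent_width(lines[index])
```

The caller could then keep extending the block only while the result is not None and is at least the indent of the function body.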

This could also be used for selection statements and loops, which may have the same problem: just check for if, for, and while at the beginning of the start_index line as well as for def. The comment would then come after the indented region, but since output printed inside an indented region depends on the variables in use when it runs, I think placing the output outside the indent is necessary in any case.

Almonso
0

Try https://github.com/eevleevs/hashequal/

I made this as an attempt to replace Mathcad. Does not act on print statements, but on #= comments, e.g.:

a = 1 + 1 #=

becomes

a = 1 + 1 #= 2

Giulio
  • The author of [hashequal](https://github.com/eevleevs/hashequal/) is named Giulio, like you. So, I assume you are the author. While it is OK to self-promote own work, you should clearly annotate that it is your work. Have a look at https://meta.stackexchange.com/q/182212/395109 – Adrian W Nov 12 '18 at 08:12