7

I want code that can analyze a function call like this:

whatever(foo, baz(), 'puppet', 24+2, meow=3, *meowargs, **meowargs)

And return the positions of each and every argument, in this case foo, baz(), 'puppet', 24+2, meow=3, *meowargs, **meowargs.

I tried using the _ast module, and it seems to be just the thing for the job, but unfortunately there were problems. For example, in an argument like baz() which is a function call itself, I couldn't find a simple way to get its length. (And even if I found one, I don't want a bunch of special cases for every different kind of argument.)

I also looked at the tokenize module but couldn't see how to use it to get the arguments.

Any idea how to solve this?

Ram Rachum
  • "And return the positions of each and every argument, in this case `foo`, `baz()`, `'puppet'`, `24+2`, `meow=3`, `*meowargs`, `**meowargs`." What do you want to return? How do you figure out your call would be? For what use? It is quite unclear what you want to do. – kiriloff May 19 '13 at 13:47
  • I'm not sure what you're trying to do either, but I'm pretty sure the correct, best, most robust way to do this is to look at the AST (preferably via the `ast` module, `_ast` is an implementation detail and `ast` adds some useful functionality). You need to get your head around the concept of ASTs and tree traversal, but without that you're bound to produce a slow, complex, limited, fragile solution anyway. –  May 19 '13 at 13:58
  • @antitrust The positions, i.e. the indices of their start and end in the string. The use is for an IDE script. I couldn't figure out your question about the call. – Ram Rachum May 19 '13 at 14:11
  • Still not clear what you want. Do you want what was called (i.e. inside the function) or what can be called (e.g. an IDE attempting to arrange the correct params)? – Phil Cooper May 19 '13 at 14:25
  • See `foo`? I want a tuple where the first item is the position of `f` and the second item is the position of the final `o`. – Ram Rachum May 19 '13 at 20:10

3 Answers

6

This code uses a combination of ast (to find the initial argument offsets) and regular expressions (to identify boundaries of the arguments):

import ast
import re

def collect_offsets(call_string):
    def _abs_offset(lineno, col_offset):
        current_lineno = 0
        total = 0
        # keep line endings (True) so each newline counts toward the absolute offset
        for line in call_string.splitlines(True):
            current_lineno += 1
            if current_lineno == lineno:
                return col_offset + total
            total += len(line)
    # parse call_string with ast
    call = ast.parse(call_string).body[0].value
    # collect offsets provided by ast
    offsets = []
    for arg in call.args:
        a = arg
        while isinstance(a, ast.BinOp):
            a = a.left
        offsets.append(_abs_offset(a.lineno, a.col_offset))
    for kw in call.keywords:
        offsets.append(_abs_offset(kw.value.lineno, kw.value.col_offset))
    if call.starargs:
        offsets.append(_abs_offset(call.starargs.lineno, call.starargs.col_offset))
    if call.kwargs:
        offsets.append(_abs_offset(call.kwargs.lineno, call.kwargs.col_offset))
    offsets.append(len(call_string))
    return offsets

def argpos(call_string):
    def _find_start(prev_end, offset):
        s = call_string[prev_end:offset]
        m = re.search(r'(\(|,)(\s*)(.*?)$', s)
        return prev_end + m.regs[3][0]
    def _find_end(start, next_offset):
        s = call_string[start:next_offset]
        m = re.search(r'(\s*)$', s[:max(s.rfind(','), s.rfind(')'))])
        return start + m.start()

    offsets = collect_offsets(call_string)   

    result = []
    # previous end
    end = 0
    # given offsets = [9, 14, 21, ...],
    # zip(offsets, offsets[1:]) returns [(9, 14), (14, 21), ...]
    for offset, next_offset in zip(offsets, offsets[1:]):
        #print 'I:', offset, next_offset
        start = _find_start(end, offset)
        end = _find_end(start, next_offset)
        #print 'R:', start, end
        result.append((start, end))
    return result

if __name__ == '__main__':
    try:
        while True:
            call_string = raw_input()
            positions = argpos(call_string)
            for p in positions:
                print ' ' * p[0] + '^' + ((' ' * (p[1] - p[0] - 2) + '^') if p[1] - p[0] > 1 else '')
            print positions
    # a tuple is required to catch both; "except A, B" would bind the exception to the name B
    except (EOFError, KeyboardInterrupt):
        pass

Output:

whatever(foo, baz(), 'puppet', 24+2, meow=3, *meowargs, **meowargs)
         ^ ^
              ^   ^
                     ^      ^
                               ^  ^
                                     ^    ^
                                             ^       ^
                                                        ^        ^
[(9, 12), (14, 19), (21, 29), (31, 35), (37, 43), (45, 54), (56, 66)]
f(1, len(document_text) - 1 - position)
  ^
     ^                               ^
[(2, 3), (5, 38)]
utapyngo
  • Impressive hack. I hoped it was possible to create a solution that didn't use regex (as it's generally a bad tool for such tasks) but I accept it might not be possible. – Ram Rachum May 22 '13 at 09:53
  • However, your solution fails for `"Foo(x=y,\n**kwargs)"`. – Ram Rachum May 22 '13 at 09:54
  • The ending number might be better if it was one higher -- then it could be used directly as a string `slice`, or `range` parameters. – Ethan Furman May 22 '13 at 16:55
  • Fails for `"f(1, len(document_text) - 1 - position)"`. – Ram Rachum May 22 '13 at 17:02
  • This is getting to be a real hack. It now works for `f(1, len(document_text) - 1 - position)` though. I have also taken into account the remark of @EthanFurman. – utapyngo May 23 '13 at 02:39
  • It seems that Python 3.5+ has removed `starargs` and `kwargs`, so as far as I can see this code will not run on Python 3.5+. – SRC Jan 08 '20 at 22:01
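
For readers on Python 3.9 or later, where `call.starargs` and `call.kwargs` no longer exist, here is a regex-free sketch of the same idea. It is not utapyngo's code: it relies on the `end_lineno` and `end_col_offset` attributes that newer AST nodes carry, and the names `argpos_ast` and `_line_starts` are made up for illustration. It assumes ASCII source (the column offsets reported by `ast` are UTF-8 byte offsets) and a string that holds exactly one call expression.

```python
import ast


def _line_starts(source):
    """Return the absolute offset at which each line of `source` begins."""
    starts, total = [], 0
    for line in source.splitlines(True):  # True keeps the line endings
        starts.append(total)
        total += len(line)
    return starts


def argpos_ast(call_string):
    """Return sorted (start, end) pairs, usable as slice bounds, for each argument."""
    call = ast.parse(call_string).body[0].value
    if not isinstance(call, ast.Call):
        raise ValueError("expected a single function call")
    starts = _line_starts(call_string)

    def span(node):
        # Convert (line, column) pairs into offsets into the whole string.
        return (starts[node.lineno - 1] + node.col_offset,
                starts[node.end_lineno - 1] + node.end_col_offset)

    # Since Python 3.5, *args is an ast.Starred inside call.args and
    # **kwargs is an ast.keyword whose .arg is None.
    # ast.keyword nodes only carry positions from Python 3.9 onwards.
    spans = [span(arg) for arg in call.args]
    spans += [span(kw) for kw in call.keywords]
    return sorted(spans)


if __name__ == "__main__":
    print(argpos_ast("f(1, len(document_text) - 1 - position)"))
```

One known gap in this sketch: an argument wrapped in redundant parentheses, such as `f((a))`, is reported without its parentheses, because the AST does not record them.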
-1

You may want to get the abstract syntax tree for a function call of your function.

Here is a Python recipe to do so, based on the ast module.

Python's ast module is used to parse the code string and create an ast Node. It then walks through the resultant ast.AST node to find the features using a NodeVisitor subclass.

The function explain does the parsing. Here is how you analyse your function call, and what you get:

>>> explain('mymod.nestmod.func("arg1", "arg2", kw1="kword1", kw2="kword2", *args, **kws)')
    [Call(  args=['arg1', 'arg2'],keywords={'kw1': 'kword1', 'kw2': 'kword2'},
      starargs='args', func='mymod.nestmod.func', kwargs='kws')]
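
The recipe itself is not reproduced here, so the following is only a rough sketch of what a NodeVisitor-based helper along those lines might look like. `CallCollector` and `explain_calls` are hypothetical names, not the recipe's API, and the sketch assumes Python 3.9+ because it uses `ast.unparse`.

```python
import ast


class CallCollector(ast.NodeVisitor):
    """Collect a plain-dict summary of every Call node in a tree."""

    def __init__(self):
        self.calls = []

    def visit_Call(self, node):
        self.calls.append({
            "func": ast.unparse(node.func),
            # *args shows up here as a Starred node, unparsed as "*args"
            "args": [ast.unparse(a) for a in node.args],
            "keywords": {kw.arg: ast.unparse(kw.value)
                         for kw in node.keywords if kw.arg is not None},
            # **kwargs is a keyword whose .arg is None
            "kwargs": [ast.unparse(kw.value)
                       for kw in node.keywords if kw.arg is None],
        })
        self.generic_visit(node)  # keep walking to catch nested calls


def explain_calls(source):
    """Parse `source` and summarise every call found in it."""
    collector = CallCollector()
    collector.visit(ast.parse(source))
    return collector.calls


if __name__ == "__main__":
    print(explain_calls('mymod.nestmod.func("arg1", kw1="kword1", *args, **kws)'))
```

Note that a summary like this says what the arguments are, not where they sit in the string; for positions one would still have to read each node's `lineno` and `col_offset`.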
kiriloff
  • I don't see how that helps. For example, if one of your arguments is a function call itself, how do you know its start position and end position? – Ram Rachum May 19 '13 at 20:08
-1

If I understand correctly, from your example you want something like:

--> arguments("whatever(foo, baz(), 'puppet', 24+2, meow=3, *meowargs, **meowkwds)")
{
  'foo': slice(9, 12),
  'baz()': slice(14, 19),
  "'puppet'": slice(21, 29),
  '24+2': slice(31, 35),
  'meow=3': slice(37, 43),
  '*meowargs': slice(45, 54),
  '**meowkwds': slice(56, 66),
}

Note that I changed the name of your last argument, as you can't have two arguments with the same name.

If this is what you want then you need to have the original string in question (shouldn't be a problem if you're building an IDE), and you need a string parser. A simple state machine should do the trick.
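
As a rough illustration of the state-machine idea (a sketch under simplifying assumptions, not a full Python parser): track whether the scanner is inside a string literal and how deeply brackets are nested, and cut an argument at every comma seen at depth 1. The names `split_call_args` and `_emit` are made up here. The sketch assumes the string starts with a plain name followed by `(`, and it does not handle triple-quoted strings or comments.

```python
def _emit(text, start, end, spans):
    """Append the whitespace-trimmed slice of text[start:end], if non-empty."""
    chunk = text[start:end]
    stripped = chunk.strip()
    if stripped:
        begin = start + (len(chunk) - len(chunk.lstrip()))
        spans.append(slice(begin, begin + len(stripped)))


def split_call_args(call_string):
    """Return a list of slices, one per argument of the outermost call."""
    spans = []
    depth = 0          # bracket nesting; 1 means directly inside the call
    quote = None       # quote character of the string we are inside, if any
    arg_start = None   # index just after the previous "(" or ","
    i = 0
    while i < len(call_string):
        ch = call_string[i]
        if quote is not None:            # state: inside a string literal
            if ch == "\\":
                i += 1                   # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch in "([{":
            depth += 1
            if depth == 1:
                arg_start = i + 1
        elif ch in ")]}":
            depth -= 1
            if depth == 0:               # closing paren of the call itself
                _emit(call_string, arg_start, i, spans)
                break
        elif ch == "," and depth == 1:   # top-level argument separator
            _emit(call_string, arg_start, i, spans)
            arg_start = i + 1
        i += 1
    return spans


if __name__ == "__main__":
    text = "whatever(foo, baz(), 'puppet', 24+2, meow=3, *meowargs, **meowkwds)"
    for s in split_call_args(text):
        print(s, text[s])
```

Returning a list of slices rather than a dict keyed by argument text avoids collisions when the same text appears as two arguments.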

Ethan Furman
  • I appreciate you trying to help me, but when you say "A simple state machine should do the trick", I don't know what you mean and how to build that so it'll actually work and return the results I want. (And yes, I have the string.) – Ram Rachum May 22 '13 at 09:49
  • (I did learn about state machines in high school, but the way from that to a working solution is not clear to me.) – Ram Rachum May 22 '13 at 15:07
  • @RamRachum: They're not too hard -- having said that, I've never actually implemented one. I'll see if I can't make time to get one together. – Ethan Furman May 22 '13 at 16:54
  • @EthanFurman: I did. It is not too hard, but as you will have to parse a considerable part of Python grammar, it can get tedious. I'd use `pyparsing` or `ply` instead. – utapyngo May 23 '13 at 05:26