
I would like a function that, given a name which caused a NameError, can identify Python packages which could be imported to resolve it.

That part is fairly easy, and I've done it, but now I have an additional problem: I'd like to do it without causing side-effects. Here's the code I'm using right now:

def necessaryImportFor(name):
    from pkgutil import walk_packages
    for package in walk_packages():
        if package[1] == name:
            return name
        try:
            if hasattr(__import__(package[1]), name):
                return package[1]
        except Exception as e:
            print("Can't check " + package[1] + " on account of a " + e.__class__.__name__ + ": " + str(e))
    print("No possible import satisfies " + name)

The problem is that this code actually __import__s every module. This means that every side-effect of importing every module occurs. When testing my code I found that side-effects that can be caused by importing all modules include:

  • Launching tkinter applications
  • Requesting passwords with getpass
  • Requesting other input or raw_input
  • Printing messages (import this)
  • Opening websites (import antigravity)

One solution I considered is to find the path to every module and then parse the source for every class, def, and = that isn't itself nested within a class or def. I don't know how to find those paths, though: the only way I can see is to import the module and then use some methods from inspect on it. It also seems like a huge PITA, and I don't think it would work for modules implemented in C/C++ instead of pure Python.

Another possibility is launching a child Python instance which has its output redirected to devnull and performing its checks there, killing it if it takes too long. That would solve the first four bullets, and the fifth one is such a special case that I could just skip antigravity. But having to start up thousands of instances of Python in this single function seems a bit... heavy and inefficient.
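
For reference, the child-process idea can be sketched roughly as follows. This is a minimal Python 3 sketch, not the code from the question: the names `module_defines` and `_CHILD_CODE` and the 5 second default timeout are assumptions chosen for illustration. It starts one child interpreter per module, with stdin, stdout, and stderr all pointed at the null device, and kills the child if it runs too long.

```python
import subprocess
import sys

# Code run inside the child: import the module named in argv[1] and report,
# through the exit status, whether it has the attribute named in argv[2].
_CHILD_CODE = (
    "import importlib, sys\n"
    "module = importlib.import_module(sys.argv[1])\n"
    "sys.exit(0 if hasattr(module, sys.argv[2]) else 1)\n"
)


def module_defines(module_name, name, timeout=5.0):
    """Return True or False, or None if the child had to be killed.

    Hypothetical helper: a failed import also yields False, because an
    uncaught exception makes the child exit with a non-zero status.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD_CODE, module_name, name],
            stdin=subprocess.DEVNULL,   # prompts read EOF instead of blocking
            stdout=subprocess.DEVNULL,  # printed messages are discarded
            stderr=subprocess.DEVNULL,
            timeout=timeout,            # the child is killed on expiry
        )
    except subprocess.TimeoutExpired:
        return None
    return result.returncode == 0
```

Calling `module_defines("json", "dumps")` starts a fresh interpreter for that single check, which is exactly the per-module cost described above.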

Does anyone have a better solution I haven't considered? Is there a simple way of just telling Python to generate an AST or something without actually importing a module, for example?

ArtOfWarfare
  • Why are you trying to do this in the first place? Matching a name against an arbitrary module is miles away from actually providing the intended object. Potentially an infinity of libraries can provide the same name with just as many different uses and signatures. – Martijn Pieters Mar 14 '15 at 13:23
  • You can load an AST from any Python file without importing, just use the [`ast` module](https://docs.python.org/2/library/ast.html) and load the file source by a different means from importing (e.g. resolve the module file yourself and open it). This is of course limited to Python files only; C extensions cannot be parsed like this and would require loading anyway. – Martijn Pieters Mar 14 '15 at 13:24
  • @MartijnPieters: This is for my `sys.excepthook` which I set up in my python profile. In addition to showing my stack trace, when the exception is a `NameError`, I'd like it to let me know what `import`s I could add to resolve the `NameError`. I've actually been using the code as I posted it here for a few days and I feel it has made me more productive on one machine that I use (that machine has a lot fewer packages with side-effects from import). On other machines (with more packages with more side-effects) it's not as helpful. Similar code could be useful in PyLint or other code checkers. – ArtOfWarfare Mar 14 '15 at 13:54
  • Code checkers use AST parsing for such tasks. – Martijn Pieters Mar 14 '15 at 13:56
  • @MartijnPieters: The `ast` module appears to require that I know where the file is. Is there some variant of `walk_packages`, or some function I could use in conjunction with it, that would yield the path to the package instead of just its name? Or do I need to write my own function that checks all the same places that `import` does? – ArtOfWarfare Mar 14 '15 at 13:59
  • Take a look at the [`imp` module](https://docs.python.org/2/library/imp.html); it provides utility functions for searching the module search path. – Martijn Pieters Mar 14 '15 at 14:14
  • @MartijnPieters: Okay, so I loop over the results of `walk_packages`, pass each one to `imp.find_module`, then use that in `ast`, if the file is purely Python. I'll probably rely on my existing implementation as a fall-back if it's not Python, since I think all of my side-effects occur in pure Python files. – ArtOfWarfare Mar 14 '15 at 14:31
  • @MartijnPieters: I put this on hold for 6 weeks since it seemed like too much work for too little purpose. But then the exact same problem came up in another scenario, with a much greater purpose, so I revisited and answered this. You can see the solution I came up with below. Thanks for your pointers. Next up: I'm writing a static source analyzer that should be able to catch `TypeError`s, `NameError`s, and `AttributeError`s, without running the code. Let me know if you know of such a tool already - I couldn't find one. – ArtOfWarfare Apr 26 '15 at 20:03

1 Answer


So I ended up writing a few methods which can list everything from a source file, without importing the source file.

The ast module doesn't seem particularly well documented, so this was a bit of a PITA trying to figure out how to extract everything of interest. Still, after ~6 hours of trial and error today, I was able to get this together and run it on the 3000+ Python source files on my computer without any exceptions being raised.

def listImportablesFromAST(ast_):
    from ast import (Assign, ClassDef, FunctionDef, Import, ImportFrom, Name,
                     For, Tuple, TryExcept, TryFinally, With)

    if isinstance(ast_, (ClassDef, FunctionDef)):
        return [ast_.name]
    elif isinstance(ast_, (Import, ImportFrom)):
        return [name.asname if name.asname else name.name for name in ast_.names]

    ret = []

    if isinstance(ast_, Assign):
        for target in ast_.targets:
            if isinstance(target, Tuple):
                ret.extend([elt.id for elt in target.elts])
            elif isinstance(target, Name):
                ret.append(target.id)
        return ret

    # These two attributes cover everything of interest from If, Module,
    # and While. They also cover parts of For, TryExcept, TryFinally, and With.
    if hasattr(ast_, 'body') and isinstance(ast_.body, list):
        for innerAST in ast_.body:
            ret.extend(listImportablesFromAST(innerAST))
    if hasattr(ast_, 'orelse'):
        for innerAST in ast_.orelse:
            ret.extend(listImportablesFromAST(innerAST))

    if isinstance(ast_, For):
        target = ast_.target
        if isinstance(target, Tuple):
            ret.extend([elt.id for elt in target.elts])
        else:
            ret.append(target.id)
    elif isinstance(ast_, TryExcept):
        for innerAST in ast_.handlers:
            ret.extend(listImportablesFromAST(innerAST))
    elif isinstance(ast_, TryFinally):
        for innerAST in ast_.finalbody:
            ret.extend(listImportablesFromAST(innerAST))
    elif isinstance(ast_, With):
        if ast_.optional_vars:
            ret.append(ast_.optional_vars.id)
    return ret

def listImportablesFromSource(source, filename = '<Unknown>'):
    from ast import parse
    return listImportablesFromAST(parse(source, filename))

def listImportablesFromSourceFile(filename):
    with open(filename) as f:
        source = f.read()
    return listImportablesFromSource(source, filename)
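
The code above targets the Python 2 `ast` module. In Python 3, `ast.TryExcept` and `ast.TryFinally` no longer exist (a single `ast.Try` node covers both), and `With` nodes carry their targets in `items` rather than `optional_vars`. A rough Python 3 sketch of the same idea might look like this; the name `top_level_names` is hypothetical, and the behavior differs slightly from the original (for example, `import a.b` is reported as `a`, the name it actually binds, and attribute or subscript targets are skipped).

```python
import ast


def top_level_names(source, filename="<unknown>"):
    """Return names bound at module level by `source`, without running it."""
    names = []

    def bind_target(target):
        # Collect plain names from assignment-like targets. Attributes and
        # subscripts (e.g. `obj.attr = 1`) bind no new module-level name.
        if isinstance(target, ast.Name):
            names.append(target.id)
        elif isinstance(target, (ast.Tuple, ast.List)):
            for elt in target.elts:
                bind_target(elt)
        elif isinstance(target, ast.Starred):
            bind_target(target.value)

    def visit(node):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.append(node.name)
            return  # names inside a def or class are not module-level
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name != "*":  # star imports cannot be resolved statically
                    names.append(alias.asname or alias.name.split(".")[0])
            return
        if isinstance(node, ast.Assign):
            for target in node.targets:
                bind_target(target)
            return
        if isinstance(node, ast.AnnAssign):
            if node.value is not None:  # a bare `x: int` binds nothing
                bind_target(node.target)
            return
        if isinstance(node, (ast.For, ast.AsyncFor)):
            bind_target(node.target)
        elif isinstance(node, (ast.With, ast.AsyncWith)):
            for item in node.items:
                if item.optional_vars is not None:
                    bind_target(item.optional_vars)
        # Descend into nested statement lists (If, While, For, With, Try,
        # and the except handlers of a Try).
        for field in ("body", "handlers", "orelse", "finalbody"):
            children = getattr(node, field, None)
            if isinstance(children, list):
                for child in children:
                    visit(child)

    visit(ast.parse(source, filename))
    return names
```

As with the original, this only sees names that appear syntactically in the file; anything created dynamically (for example through `globals()` or a star import) is invisible to it.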

The above code covers the titular question: How do I check the contents of a Python package without running it?

But it leaves you with another question: How do I get the path to a Python package from just its name?

Here's what I wrote to handle that:

class PathToSourceFileException(Exception):
    pass

class PackageMissingChildException(PathToSourceFileException):
    pass

class PackageMissingInitException(PathToSourceFileException):
    pass

class NotASourceFileException(PathToSourceFileException):
    pass

def pathToSourceFile(name):
    '''
    Given a name, returns the path to the source file, if possible.
    Otherwise raises an ImportError or subclass of PathToSourceFileException.
    '''

    from os.path import dirname, isdir, isfile, join

    if '.' in name:
        parentSource = pathToSourceFile('.'.join(name.split('.')[:-1]))
        path = join(dirname(parentSource), name.split('.')[-1])
        if isdir(path):
            path = join(path, '__init__.py')
            if isfile(path):
                return path
            raise PackageMissingInitException()
        path += '.py'
        if isfile(path):
            return path
        raise PackageMissingChildException()

    from imp import find_module, PKG_DIRECTORY, PY_SOURCE

    f, path, (suffix, mode, type_) = find_module(name)
    if f:
        f.close()
    if type_ == PY_SOURCE:
        return path
    elif type_ == PKG_DIRECTORY:
        path = join(path, '__init__.py')
        if isfile(path):
            return path
        raise PackageMissingInitException()
    raise NotASourceFileException('Name ' + name + ' refers to the file at path ' + path + ' which is not that of a source file.')
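
The `imp` module used above was removed in recent Python 3 releases. On Python 3, a similar lookup can be sketched with `importlib.util.find_spec`. This is a sketch under assumptions rather than a drop-in replacement: the name `path_to_source_file` is hypothetical, it returns `None` instead of raising the exception classes defined above, and, unlike the manual path walking in `pathToSourceFile`, `find_spec` imports the parent package when given a dotted name, so side effects in a parent's `__init__.py` can still run.

```python
import importlib.util


def path_to_source_file(name):
    """Return the path of the .py file behind `name`, or None.

    Hypothetical helper. Top-level names are located without being
    imported; for dotted names the parent package is imported first.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # ImportError: a parent package is missing or failed to import.
        # ValueError: an already-imported module has no usable __spec__.
        return None
    if spec is None or not spec.has_location:
        return None  # not found, or a built-in with no file on disk
    origin = spec.origin
    if origin is None or not origin.endswith(".py"):
        return None  # namespace package, C extension, bytecode-only, etc.
    return origin
```

For a regular package, the returned path is that of its `__init__.py`, which matches what `pathToSourceFile` returns in the `PKG_DIRECTORY` case.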

Tying the two bits of code together, I have this function:

def listImportablesFromName(name, allowImport = False):
    try:
        return listImportablesFromSourceFile(pathToSourceFile(name))
    except PathToSourceFileException:
        if not allowImport:
            raise
        return dir(__import__(name))

Finally, here's the implementation for the function that I mentioned I wanted in my question:

def necessaryImportFor(name):
    packageNames = []

    def nameHandler(name):
        packageNames.append(name)

    from pkgutil import walk_packages
    for package in walk_packages(onerror=nameHandler):
        nameHandler(package[1])
    # Suggestion: Sort package names by count of '.', so shallower packages are searched first.
    for package in packageNames:
        # Suggestion: just skip any package that starts with 'test.'
        try:
            if name in listImportablesFromName(package):
                return package
        except ImportError:
            pass
        except PathToSourceFileException:
            pass
    return None
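
The two suggestions left as comments in the function above (search shallower packages first, and skip anything that starts with 'test.') could be sketched as a small pre-processing step. The helper name `order_candidates` is hypothetical; it would be applied to `packageNames` before the second loop.

```python
def order_candidates(package_names):
    """Drop names under 'test.' and put shallower package names first."""
    kept = [n for n in package_names if not n.startswith("test.")]
    # Fewer dots means shallower; ties are broken alphabetically so the
    # search order is deterministic.
    return sorted(kept, key=lambda n: (n.count("."), n))
```

Sorting this way means a top-level module is preferred over a deeply nested one when both define the missing name.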

And that's how I spent my Sunday.

ArtOfWarfare
  • Thanks for posting this. A note for people from the future: `TryExcept` and `TryFinally` appear to be deprecated (if you want to use the code immediately, without digging to find an alternate, just remove the relevant lines). I did a quick google search for PEPs involving `ast` but didn't find any that referenced the `Try`s specifically. – Reid Ballard Dec 04 '18 at 02:07