
Overview

I'm running some scientific simulations and I want to process the resulting data in Python. The simulation produces a custom data type that is not used outside the chain of programs written by the simulation's authors, so unfortunately I need to use what they provide.

They want me to install two files:

  • A module called sdds.py that defines a class that provides all user functions and two demos
  • A compiled module called sddsdatamodule.so that only provides helper functions to sdds.py.

(I find it strange that they're offering me two modules that are so inextricably connected; it doesn't seem like good coding practice to me, but using their code is probably better than rewriting things from scratch.) I'd prefer not to install them directly into my path, side by side. They come from the same company and are designed to do one specific task together: access and manipulate SDDS-type files.

So I thought I would put them in a package. I could install that on my path, it would be self-contained, and I could easily find and uninstall or upgrade the modules from one location. Then I could hide their un-Pythonic solution in a more-Pythonic package without significantly rewriting things. Seems elegant.

Details

The package I actually use is found here:

http://www.aps.anl.gov/Accelerator_Systems_Division/Accelerator_Operations_Physics/software.shtml#PythonBinaries

Unfortunately, they only support Windows and Mac OS X right now. Compiling the source code is quite onerous, and apparently they have no significant requests for Linux/Unix. I have a Mac, so thankfully this isn't a problem for me.

So my directory tree looks like this:

SDDSPython/                   My toplevel package
    __init__.py               Designed to only import the SDDS class
    sdds.py                   Defines SDDS class and two demo functions
    sddsdatamodule.so         Defines sddsdata module used by SDDS class.

My __init__.py file literally only contains this:

from sdds import SDDS

The sdds.py file contains the class definition and the two demo definitions. The only other code in the sdds.py file is:

import sddsdata, sys, time

class SDDS:
    (lots of code here)

def demo(output):
    (lots of code here)

def demo2(output):
    (lots of code here)

I can then import SDDSPython and check, using dir:

>>> import SDDSPython
>>> dir(SDDSPython)
['SDDS', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', 'sdds', 'sddsdata']

So I can now access the SDDS class via SDDSPython.SDDS.

Question

How on earth did SDDSPython.sdds and SDDSPython.sddsdata get loaded into the SDDSPython namespace??

>>> SDDSPython.sdds
<module 'SDDSPython.sdds' from 'SDDSPython/sdds.pyc'>
>>> SDDSPython.sddsdata
<module 'SDDSPython.sddsdata' from 'SDDSPython/sddsdatamodule.so'>

I thought that by creating an __init__.py file I was specifically excluding the sdds and sddsdata modules from being loaded into the SDDSPython namespace. What is going on? I can only assume this is happening due to something in the sddsdatamodule.so file? But how can a module affect its parent's namespace like that? I'm rather lost, and I don't know where to start. I've looked at the C code, but I don't see anything suspicious. To be fair, I probably wouldn't know what something suspicious looks like; I'm not very familiar with writing C extensions for Python.

Joel

2 Answers


Curious question--I did some investigation for you using a similar test case.

XML/
    __init__.py       -from indent import XMLIndentGenerator
    indent.py         -contains class XMLIndentGenerator, and Xml
    Sink.py      

It appears that when you import a class from a module, the entire module becomes accessible in the way you described, even though you named only part of it in the import. That is:

>>> import XML
>>> XML.indent
<module 'XML.indent' from 'XML\indent.py'>
>>> XML.indent.Xml   # did not include this in the from
<class 'XML.indent.Xml'>
>>> XML.Sink
Traceback (most recent call last):
AttributeError: yadayada no attribute 'Sink'

This is expected, since I did not import Sink in __init__.py.....BUT!

I added a line to indent.py:

import Sink

class XMLIndentGenerator(XMLGenerator):
    (code)

Now, since indent.py imports a module contained within the XML package, if I do:

>>> import XML
>>> XML.Sink
<module 'XML.Sink' from 'XML\Sink.pyc'>

So, it appears that because your imported sdds module also imports sddsdata, which lives in the same package, you are able to access it from the package. That answers the "how" portion of your question. As for "why" this is the case, I'm sure there's an answer somewhere in the docs :)
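For anyone who wants to repeat the experiment without touching a real package, here is a minimal sketch that builds a throwaway package in a temporary directory. The names xmlpkg_demo and sink are hypothetical stand-ins for XML and Sink. It also assumes Python 3, so it uses explicit relative imports (from .indent import ...) where the code above relies on Python 2's implicit relative imports.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical package layout, written to a temporary directory.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "xmlpkg_demo")
os.mkdir(pkg_dir)

files = {
    # __init__.py names only the class, just like in the question.
    "__init__.py": "from .indent import XMLIndentGenerator\n",
    # indent.py itself imports a sibling module from the same package.
    "indent.py": (
        "from . import sink\n"
        "\n"
        "class XMLIndentGenerator(object):\n"
        "    pass\n"
    ),
    "sink.py": "VALUE = 1\n",
}
for name, source in files.items():
    with open(os.path.join(pkg_dir, name), "w") as handle:
        handle.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()

import xmlpkg_demo

# List the non-underscore names that ended up on the package object.
public_names = sorted(n for n in vars(xmlpkg_demo) if not n.startswith("_"))
print(public_names)  # ['XMLIndentGenerator', 'indent', 'sink']
```

Both indent and sink show up on the package even though __init__.py never mentions either of them by name as something to keep.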

I hope this helps - I was literally doing this as I was typing the answer! A learning experience for me as well.

tenwest
  • Thanks for the grunt work on this! I can see how that happens now, and I really would like to know why too. – Joel Jan 29 '15 at 20:36

This happens because Python imports don't work the way you might think. They work like this:

  • The import machinery looks for a file that should be the module named in the import.
  • A types.ModuleType instance is created, several attributes on it are set to describe the corresponding file (__file__, __name__ and so on), and that object is inserted into sys.modules under its fully qualified module name.
  • The file is "executed" with that module as its global scope; all names defined by that file appear as attributes of the module.
  • If this is a submodule import (i.e., sdds.py, which is a submodule of SDDSPython), the loaded module is attached as an attribute to the module object of the parent package.
  • In the case of a from import, the requested attribute of the module is then handed back to the importing script.
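The steps above can be imitated by hand. The following is only a rough sketch, not the real import machinery: toy_import and the toypkg names are made up for illustration, the "file" is just a source string, and finders, loaders, and caching are ignored. It attaches the submodule to its parent after executing it, which is the order current CPython uses.

```python
import sys
import types


def toy_import(fullname, source, filename="<toy>"):
    """Very rough imitation of what happens when a module is imported."""
    # Create the module object and set a few bookkeeping attributes.
    module = types.ModuleType(fullname)
    module.__file__ = filename
    # Register it under its fully qualified name before running any code.
    sys.modules[fullname] = module
    # "Execute" the whole source with the module as its global scope.
    exec(compile(source, filename, "exec"), module.__dict__)
    # For a submodule, bind it as an attribute on the parent package.
    parent_name, _, child_name = fullname.rpartition(".")
    if parent_name:
        setattr(sys.modules[parent_name], child_name, module)
    return module


# A hypothetical package with one submodule.
pkg = toy_import("toypkg", "")
pkg.__path__ = []  # having __path__ is what marks a module as a package
toy_import("toypkg.sdds", "class SDDS(object):\n    pass\n")

# Roughly what "from toypkg.sdds import SDDS" hands back to the caller.
SDDS = sys.modules["toypkg.sdds"].SDDS

print(pkg.sdds is sys.modules["toypkg.sdds"])  # True
```

Note that the setattr on the parent happens inside the import step itself, so it does not matter whether the caller wrote import toypkg.sdds or from toypkg.sdds import SDDS.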

So that means if I import a module (say, foo.py) that has, as its source only:

import bar

then there is a global in foo, called bar, and I can access it as foo.bar.

There is no capacity in Python for "only execute the part of this Python script I want to use right now." The whole thing runs.
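A small sketch of both points, using two throwaway modules written to a temporary directory. The names foo_demo, bar_demo, WANTED, and UNWANTED are hypothetical and chosen only to avoid clashing with anything real.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
sources = {
    "bar_demo.py": "",
    # foo_demo imports bar_demo and defines two names.
    "foo_demo.py": "import bar_demo\nWANTED = 1\nUNWANTED = 2\n",
}
for name, source in sources.items():
    with open(os.path.join(root, name), "w") as handle:
        handle.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()

# Ask for a single name only.
from foo_demo import WANTED

# The whole file still ran, and the full module sits in sys.modules.
foo_demo = sys.modules["foo_demo"]
print(foo_demo.UNWANTED)                              # 2
print(foo_demo.bar_demo is sys.modules["bar_demo"])   # True
```

The from import only controls which names get bound in the importing script. It has no effect on how much of foo_demo is executed or on what ends up in its namespace.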

SingleNegationElimination
  • Right, but in my case, I did `from sdds import SDDS` in `__init__.py`. If I did, instead, `import sdds`, I would expect to get `dir(SDDSPython)` returning only `sdds`, but `dir(SDDSPython.sdds)` should include `sddsdata`, `sys`, and `time`. `from sdds import SDDS` should, I thought anyways, only import the class from the sdds submodule. But somehow it's acting as if I also did `import sdds, sddsdata`. That `from sdds import SDDS` isn't supposed to do a submodule import, right? – Joel Jan 29 '15 at 20:35
  • `SDDSPython` is a package on your path, and `sdds` is in it. When you import `sdds`, no matter how you've done it, as long as Python finds it as a submodule, that submodule is unconditionally added as an attribute to its containing package. This is done by the import machinery, independent of what *else* your `__init__.py` does. – SingleNegationElimination Jan 29 '15 at 20:43
  • Ohhhhhhh I see. Is there documentation about this somewhere? – Joel Jan 29 '15 at 21:28