I have a Python module that is built around a native extension written in C. This extension includes code generated using the GNU Bison and (not GNU) Flex tools. That means the build process for my C extension involves calling these tools and then including their outputs (C source files) in the extension sources.

To get this to work when calling python setup.py install, I subclassed the setuptools build_ext command (setuptools.command.build_ext.build_ext) so that it calls both Flex and Bison and adds the generated sources to the Extension's source list before calling the superclass's run method.

This means my setup.py looks like:

import os
from setuptools import setup, Extension
from setuptools.command.build_ext import build_ext

c_extension = Extension('_mymod',
               include_dirs = ['inc'],
               sources = [
                          os.path.join('src', 'lib.c'),
                          os.path.join('src', 'etc.c')
                         ])

class MyBuild(build_ext):
    def run(self):
        parser_dir = os.path.join(self.build_temp, 'parser')
        # add the parser directory to include_dirs
        self.include_dirs.append(parser_dir)
        # add the source files to the sources
        self.extensions[0].sources.extend([os.path.join(parser_dir, 'lex.yy.c'), os.path.join(parser_dir, 'parse.tab.c')])
        
        # honor the --dry-run flag
        if not self.dry_run:
            self.mkpath(parser_dir)

            os.system('flex -o ' + os.path.join(parser_dir, 'lex.yy.c') + ' ' + os.path.join('src', 'lex.l'))
            os.system('bison -d -o ' + os.path.join(parser_dir, 'parse.tab.c') + ' ' + os.path.join('src', 'parse.y'))

        # call the super class method
        return build_ext.run(self)

setup (name = 'MyMod',
       version = '0.1',
       description = 'A module that uses external code generation tools',
       author = 'Sean Kauffman',
       packages = ['MyMod'],
       ext_modules = [c_extension],
       cmdclass={'build_ext': MyBuild},
       python_requires='>=3',
       zip_safe=False)

Now, however, I am trying to package this module for distribution and I have a problem. Either users who want to install my package need Bison and Flex installed, or I need to run these tools when I build the source distribution. I see two possible solutions:

  1. I validate that flex and bison are on the system PATH. This keeps the custom builder as-is. I have found no documentation suggesting that setuptools can check for external executables such as bison and flex. The closest thing is the libraries field of the Extension, but it seems I would need some real hackery to search the entire PATH for executables. I haven't tried this yet because my first choice would be option 2.
  2. I move code generation to the point where the source distribution is created. The source distribution would then contain the output files from bison and flex, so people installing the package wouldn't need these tools. This seems like the cleaner option. I have tried extending the sdist command instead of build_ext as above, but it isn't clear how to add the generated files to the MANIFEST so that they are included. Furthermore, I want building with python setup.py install to keep working, and I don't think that command runs sdist before building.
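For reference, the PATH check in option 1 does not need to be hand-rolled: the standard library's shutil.which looks an executable up on PATH. A minimal sketch, where the function name find_missing_tools is made up for illustration:

```python
import shutil


def find_missing_tools(tools=('flex', 'bison')):
    """Return the names in `tools` that cannot be found on PATH."""
    # shutil.which returns the full path of the executable, or None.
    return [tool for tool in tools if shutil.which(tool) is None]
```

A custom build_ext.run could call this first and abort with a readable message (for example via raise SystemExit(...)) if the returned list is not empty.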

It's fine for any solution to only work on Linux and OS X.

seanmk

1 Answer

The usual solution for distributing code requiring (f)lex and bison/yacc is to bundle the generated scanner and parser, but be prepared to generate them if they are not present. The second part makes development a little easier and also gives people the option of using their own flex/bison version if they feel they have a good reason to do so. I suppose this advice would also apply to Python modules.

(IANAL but my understanding is that there is a licence exception for the code generated by bison, making it possible to distribute even in non-GPL projects. Flex is not GPL to start with, and afaik there are no distribution restrictions.)

To conditionally build the scanner and parser in a source distribution, you could use the code you have already provided, after verifying that the generated files don't exist. (Ideally, you would regenerate when the generated files either don't exist or are older than their respective source files. That depends on the file dates not being altered on their voyage through an archive. That will work fine on Linux and OS X but it might not be completely portable.)
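The "missing or out of date" test described above could be sketched as follows. The function name needs_regeneration is hypothetical, and the comparison relies on file modification times surviving the trip through an archive, as noted:

```python
import os


def needs_regeneration(generated, source):
    """Return True if `generated` is missing or older than `source`."""
    if not os.path.exists(generated):
        return True
    # Compare modification times: a source edited after the last
    # generation means the generated file is stale.
    return os.path.getmtime(generated) < os.path.getmtime(source)
```

In the setup.py from the question, each os.system call could be guarded by such a check, e.g. only running bison when needs_regeneration(parse_tab_c_path, parse_y_path) is true (both path names are placeholders).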

The assumption is that the package is built before executing the sdist command. sdist should normally exclude object files built in the source tree, so it shouldn't be necessary to manually clean the source. However, if you wanted to ensure that the generated files were present when you execute sdist, you could override it in your setup.py the same way you override build_ext, invoking bison and flex prior to calling the base sdist command.
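A rough sketch of how both commands could share the generation step, under some assumptions: the directory src/generated and all names below are made up for illustration, the flex and bison command lines are the ones from the question, and the output directory is deliberately outside build, since seanmk reports in the comments that setuptools removes the build directory during sdist.

```python
import os
import subprocess

# Hypothetical output directory, deliberately NOT under build/.
GENERATED_DIR = os.path.join('src', 'generated')

# (command prefix, grammar source, generated output)
GENERATORS = [
    (['flex', '-o'], os.path.join('src', 'lex.l'),
     os.path.join(GENERATED_DIR, 'lex.yy.c')),
    (['bison', '-d', '-o'], os.path.join('src', 'parse.y'),
     os.path.join(GENERATED_DIR, 'parse.tab.c')),
]


def is_stale(output, source):
    """True if `output` is missing or older than `source`."""
    return (not os.path.exists(output)
            or os.path.getmtime(output) < os.path.getmtime(source))


def generate_parser(generators=GENERATORS, run=subprocess.check_call):
    """Run each tool whose output is missing or stale; return those outputs."""
    regenerated = []
    for argv, source, output in generators:
        if is_stale(output, source):
            os.makedirs(os.path.dirname(output), exist_ok=True)
            # check_call raises if the tool is missing or exits non-zero.
            run(argv + [output, source])
            regenerated.append(output)
    return regenerated


class GenerateParserMixin:
    """Mix into build_ext and sdist so both generate the parser first."""

    def run(self):
        # Honor --dry-run where the command defines it.
        if not getattr(self, 'dry_run', False):
            generate_parser()
        super().run()
```

In setup.py this would be combined with the real commands, along the lines of class MyBuildExt(GenerateParserMixin, build_ext): pass and class MySdist(GenerateParserMixin, sdist): pass, registered with cmdclass={'build_ext': MyBuildExt, 'sdist': MySdist}. The generated .c files would also be listed in the Extension sources, and the header produced by bison -d would probably need a MANIFEST.in entry to end up in the source distribution; I have not verified that part.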

rici
  • Thanks for the suggestion. The licenses are definitely compatible. Can you comment on _how_ one would bundle the generated code but still generate it if it isn't present? How should the code to call flex/bison be integrated into the setup.py file? – seanmk Nov 03 '20 at 19:30
  • @seanmk: I'm far from expert on distutils but I tried to address that issue in an edit to the answer. Sorry for skipping over it the first time. – rici Nov 04 '20 at 03:43
  • This seems to do the trick: override both build_ext and sdist and have both generate the files if they are missing or outdated. One important trick: I discovered that the *build directory is automatically removed by setuptools after MANIFEST.in is parsed*, so the generated code needs to go somewhere _other than in build_. – seanmk Nov 05 '20 at 18:14
  • @seanmk: Ah, that's good to know. I'll fix the answer with that information. – rici Nov 05 '20 at 18:33