
I have a project with the following structure:

myproject/
    data/
        <some directories>
    src/
        myproject/
            <python code>
    tests/
        <some tests>
    setup.py
    config.ini
    pyproject.toml
    requirements.txt

I'd like to reference the files in the data/ directory from within my project, without including them with the source distribution. I think that the best way to do that is with a configuration file (via ConfigParser, though I suppose you could use YAML or something instead).

Python's distutils/setuptools seem to offer a solution for this with the data_files option in setuptools.setup(). The way I understand it, the data_files option allows access to arbitrary files outside of the package by copying them to a location on the system (e.g. a directory under sys.prefix). This seems ideal for configuration files, which a user may want to edit before installation.
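As a rough sketch of where such a file typically ends up: assuming a hypothetical `data_files=[("etc/myproject", ["config.ini"])]` entry in setup.py, relative target directories are installed under the install scheme's data directory, which for a default (non-`--user`) install or a virtualenv is normally `sys.prefix`. `sysconfig.get_path("data")` reports that directory for the running interpreter, so the installed copy could be located like this:

```python
import sysconfig
from pathlib import Path

# Hypothetical target directory, matching a setup.py entry such as
# data_files=[("etc/myproject", ["config.ini"])]
CONFIG_SUBDIR = Path("etc") / "myproject"


def installed_config_path(filename: str = "config.ini") -> Path:
    """Return where data_files would place `filename` for this interpreter.

    Assumes the default install scheme; a --user install uses a different
    data directory, which sysconfig.get_path("data") does not report here.
    """
    data_root = Path(sysconfig.get_path("data"))
    return data_root / CONFIG_SUBDIR / filename


if __name__ == "__main__":
    print(installed_config_path())
```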

However, since the file is copied to some unknown location, it breaks any relative paths specified in config.ini. So, how do you deal with that?

I guess I could require absolute paths in the config file, but that's not very user-friendly.
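One common alternative, sketched here as a convention rather than anything setuptools provides, is to treat relative paths in config.ini as relative to the directory containing config.ini itself and resolve them when the file is read. Absolute paths still work, but users are not forced to write them. The section name `paths` and key `data_dir` are hypothetical:

```python
import configparser
from pathlib import Path
from typing import Union


def load_data_dir(config_file: Union[str, Path]) -> Path:
    """Read config_file and return its [paths] data_dir as an absolute path.

    Relative values are resolved against the config file's own directory,
    so the config file and the data directory can be moved together.
    """
    config_file = Path(config_file).resolve()
    parser = configparser.ConfigParser()
    with config_file.open(encoding="utf-8") as fh:
        parser.read_file(fh)

    # expanduser() additionally lets users write ~/... in the config
    raw = Path(parser.get("paths", "data_dir")).expanduser()
    if raw.is_absolute():
        return raw
    return (config_file.parent / raw).resolve()
```

With a config.ini containing `[paths]` and `data_dir = data`, `load_data_dir("myproject/config.ini")` would return the absolute path of `myproject/data/`, regardless of the current working directory.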

Is there a better way to access data files which are specified via a configuration file from within a package?

Chris Hubley

1 Answer


You can use pkg_resources:

import pkg_resources as pk
fiber_file = pk.resource_filename("cytolysis", "example_data/fiber_points.txt")

And in your setuptools setup file:

packages=['cytolysis','cytolysis.example_data'],
package_dir={'cytolysis': 'src', 'cytolysis.example_data': 'example_data'},
package_data={'cytolysis': ['*.md'], 'cytolysis.example_data': ['*.txt', '*.cym']},
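As an aside: `pkg_resources` is deprecated in recent setuptools releases, and the standard library's `importlib.resources` (`files()` and `as_file()`, Python 3.9+) can perform the same lookup. This is a swapped-in technique, not what the answer above uses; the helper names below are hypothetical:

```python
from importlib.resources import as_file, files


def _resource(package: str, *parts: str):
    """Walk from the package root to a nested resource, one segment at a time."""
    resource = files(package)
    for part in parts:
        resource = resource / part
    return resource


def read_packaged_text(package: str, *parts: str) -> str:
    """Read a text resource installed inside `package`."""
    return _resource(package, *parts).read_text(encoding="utf-8")


def packaged_file_path(package: str, *parts: str):
    """Context manager yielding a real filesystem path to the resource.

    This plays the role of resource_filename(); if the package is zipped,
    as_file() extracts the resource to a temporary file for the with-block.
    """
    return as_file(_resource(package, *parts))
```

Usage in the answer's layout would look like `with packaged_file_path("cytolysis", "example_data", "fiber_points.txt") as fiber_file: ...`. Like `pkg_resources`, this still requires the data to be installed inside the package via `package_data`.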
SergeD
Doesn't package_data / pkg_resources require that the data be distributed with the sdist? I was hoping that the configuration file would be separate from the source code. – Chris Hubley, Sep 20 '21 at 16:02