Q: In creating a Python distribution using setup.py and MANIFEST.IN, how can I define the nested data directories that I do and don't want in the final installation directory? (Complex example!)

Background: My program has a set of data directories (not source directories). Within each of these main directories are some subdirectories with user-specific names. In my setup.py, I want to exclude my own data directories, while still including the other subdirectories that all users should have access to.

The file tree AS IT CURRENTLY EXISTS in my PyCharm DEVELOPMENT environment:

  PycharmProjects
     pythonProject
         data_files_directory_1
            subdirectory_to_be_EXcluded
                 data_file_to_be_EXcluded.txt
            subdirectory_to_be_INcluded
                  data_file_to_be_INcluded.txt
             index.html
         data_files_directory_2
            subdirectory_to_be_EXcluded
                 data_file_to_be_EXcluded.txt
            subdirectory_to_be_INcluded
                  data_file_to_be_INcluded.txt
             index.html
         src
             __init__.py
             constants.py
             helper1.py
             helper2.py
             main.py

Expected result:

The file tree I WANT AFTER INSTALLATION on target machine:

  PycharmProjects
     pythonProject
         data_files_directory_1
            subdirectory_to_be_INcluded
                  data_file_to_be_INcluded.txt
             index.html
         data_files_directory_2
            subdirectory_to_be_INcluded
                  data_file_to_be_INcluded.txt
             index.html
         src
             __init__.py
             constants.py
             helper1.py
             helper2.py
             main.py

Actual result:

  PycharmProjects
     pythonProject
         data_files_directory_1
            subdirectory_to_be_EXcluded
                 data_file_to_be_EXcluded.txt
            subdirectory_to_be_INcluded
                  data_file_to_be_INcluded.txt
             index.html
         data_files_directory_2
            subdirectory_to_be_EXcluded
                 data_file_to_be_EXcluded.txt
            subdirectory_to_be_INcluded
                  data_file_to_be_INcluded.txt
             index.html
         src
             __init__.py
             constants.py
             helper1.py
             helper2.py
             main.py

What I tried / Source code:

MANIFEST.IN

...
graft data_files_directory_1
graft data_files_directory_2
...

setup.py

setup(
    ...
    # include everything in MANIFEST.IN:
    include_package_data=True, 
    # ...but exclude just these directories */subdirectory_to_be_EXcluded/* from all packages
    exclude_package_data={"": ["*/subdirectory_to_be_EXcluded/*"]},
    ...
)

PROBLEM: As you can see, the exclusion request is being ignored.

I must confess that, after heavy use of Google, YouTube, and the PyCharm documentation on setup.py and installers, I'm still not clear on the correct way to include and exclude NON-source directories and files. Many of the possible solutions seem to be deprecated!

What is the correct way to do this?

Can someone point me at some good working examples?

mcgregor94086
  • Have you tried [deleting stale files/folders](https://stackoverflow.com/a/26547314/7586861) that don't get rebuilt after each setup run, and then re-running the setup command? Some to try would be any `*.egg-info` folders and maybe even `__pycache__` folders. – jarcobi889 Aug 18 '20 at 22:44
  • Thank you for your answer! Deleting stale files/folders (such as the build and dist folders and any *.egg-info) alone did not solve my problem. But because you shared a link to another SO post, I read all its answers, and by combining some of them, I now have the correct files included and excluded. – mcgregor94086 Aug 20 '20 at 03:00
  • The extra bit I had to add was a one-line change in setup.py: packages=find_packages() => packages=find_packages(exclude=["*/subdirectory_to_be_EXcluded/*"]) – mcgregor94086 Aug 20 '20 at 03:04
  • And in the MANIFEST.IN file I replaced the lines graft data_files_directory_1 and graft data_files_directory_2 with graft data_files_directory_1/subdirectory_to_be_INcluded and graft data_files_directory_2/subdirectory_to_be_INcluded, and then I also added include data_files_directory_1/index.html and include data_files_directory_2/index.html. – mcgregor94086 Aug 20 '20 at 03:15
  • I am going to write this up as a second answer that is formatted better than I can do in the comments, but I want to acknowledge that it was the answer by @jarcobi889 and the link he posted that really helped! – mcgregor94086 Aug 20 '20 at 03:17

1 Answer

Here is the solution that eventually worked.

I did remember to delete the old build and dist directories, and I also made sure to delete all *.egg-info directories as suggested by @jarcobi889. But clearing out the stale files alone was not enough to solve the problem.

What finally worked was to edit setup.py thusly:

setup(
    ...
    packages=find_packages(exclude=["*/subdirectory_to_be_EXcluded/*"]),
    # include everything in MANIFEST.IN:
    include_package_data=True, 
    # ...but exclude just these directories */subdirectory_to_be_EXcluded/* from all packages
    exclude_package_data={"": ["*/subdirectory_to_be_EXcluded/*"]},
    ...
)

and to edit MANIFEST.IN thusly:

...
graft data_files_directory_1/subdirectory_to_be_INcluded
include data_files_directory_1/index.html
graft data_files_directory_2/subdirectory_to_be_INcluded
include data_files_directory_2/index.html
...

And now I am getting the desired file tree.
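
To make the effect of those MANIFEST.IN lines concrete, here is a minimal sketch that models only the `graft` and `include` commands over the file list from the question. It is a toy model written for illustration, not the setuptools implementation, and the `select_files` helper is a hypothetical name. The patterns here are literal paths, so the difference between `fnmatch` wildcards and MANIFEST.in wildcards does not come into play.

```python
import fnmatch

# File list taken from the development tree in the question,
# as paths relative to the project root.
FILES = [
    "data_files_directory_1/subdirectory_to_be_EXcluded/data_file_to_be_EXcluded.txt",
    "data_files_directory_1/subdirectory_to_be_INcluded/data_file_to_be_INcluded.txt",
    "data_files_directory_1/index.html",
    "data_files_directory_2/subdirectory_to_be_EXcluded/data_file_to_be_EXcluded.txt",
    "data_files_directory_2/subdirectory_to_be_INcluded/data_file_to_be_INcluded.txt",
    "data_files_directory_2/index.html",
]

NEW_MANIFEST = """\
graft data_files_directory_1/subdirectory_to_be_INcluded
include data_files_directory_1/index.html
graft data_files_directory_2/subdirectory_to_be_INcluded
include data_files_directory_2/index.html
"""

OLD_MANIFEST = """\
graft data_files_directory_1
graft data_files_directory_2
"""


def select_files(manifest_text, files):
    """Toy model of 'graft' and 'include' (hypothetical helper, not setuptools)."""
    selected = set()
    for line in manifest_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        command, args = parts[0], parts[1:]
        if command == "graft":
            # graft <dir>: take every file anywhere below the directory.
            for directory in args:
                prefix = directory.rstrip("/") + "/"
                selected.update(f for f in files if f.startswith(prefix))
        elif command == "include":
            # include <pattern>: take files whose path matches the pattern.
            for pattern in args:
                selected.update(f for f in files if fnmatch.fnmatchcase(f, pattern))
    return sorted(selected)


selected_new = select_files(NEW_MANIFEST, FILES)
selected_old = select_files(OLD_MANIFEST, FILES)

for path in selected_new:
    print(path)
```

Under this model the original two `graft` lines select all six files, including the excluded subdirectories, while the narrower lines select only the four wanted ones.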

Additional remarks: I'm actually still unclear exactly why these specific changes worked but other solutions I tried did not. But I'm now able to progress with my installation so I guess that is good enough, and I am closing this out as solved.
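
One possible partial explanation, offered as an assumption rather than a verified diagnosis: the `exclude` argument of `find_packages()` is documented as taking package names (dotted, with `*` as a wildcard), not filesystem paths. If the patterns are compared against dotted names, a pattern containing slashes can never match, which would mean the MANIFEST.IN change did the real work. The sketch below imitates that comparison with the standard library's `fnmatchcase`; the `excluded` helper and the dotted candidate name are hypothetical.

```python
from fnmatch import fnmatchcase


def excluded(package_name, patterns):
    """Return True if any exclude pattern matches the dotted package name."""
    return any(fnmatchcase(package_name, pattern) for pattern in patterns)


# The path-style pattern used in setup.py above.
path_style = ["*/subdirectory_to_be_EXcluded/*"]

# Name-style patterns: the form the find_packages() documentation describes.
name_style = ["*.subdirectory_to_be_EXcluded", "*.subdirectory_to_be_EXcluded.*"]

# Hypothetical dotted name: what the subdirectory would be called if it and
# its parent each contained an __init__.py file (in the question they do not).
candidate = "data_files_directory_1.subdirectory_to_be_EXcluded"

path_style_matches = excluded(candidate, path_style)
name_style_matches = excluded(candidate, name_style)

print("path-style pattern matches:", path_style_matches)
print("name-style pattern matches:", name_style_matches)
```

Note also that the data directories in the question contain no `__init__.py`, so `find_packages()` would not report them as packages in the first place.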

A request: I would like to ask those in the community who write documentation and how-to guides, or who create instructional videos, to come out with less ambiguous and confusing explanations, and with many more working cookbook examples.

Areas for improvement: For me, one specific area of constant confusion is where one document says an operator operates on "packages" while another indicates the operator works on "directories".

The confusion is exacerbated because sometimes the word "package" is used to mean "only directories that have an __init__.py file in them".

That word choice of "package" would seem to indicate that those operators would be inappropriate for use on any data directories that do not contain an __init__.py file.

And indeed, in some cases operators do seem limited to only Python package directories. Yet some operators do appear to work on any subdirectory, even those that do not contain an __init__.py file, and some authors still refer to them as operating on "packages" when "directories" would be less misleading.
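
To pin down the narrow meaning of "package" used above (a directory containing an __init__.py file), here is a minimal sketch that walks a tree and labels each directory. It rebuilds a small part of the tree from the question in a temporary directory; the `classify_directories` helper is a hypothetical name, and the sketch deliberately ignores namespace packages, which can exist without an __init__.py.

```python
import os
import tempfile


def classify_directories(root):
    """Map each directory below root to 'package' or 'plain directory'."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            continue
        relative = os.path.relpath(dirpath, root).replace(os.sep, "/")
        # Narrow definition: a package is a directory holding __init__.py.
        result[relative] = "package" if "__init__.py" in filenames else "plain directory"
    return result


with tempfile.TemporaryDirectory() as root:
    # Recreate a few files from the question's development tree.
    for relative in (
        "src/__init__.py",
        "src/main.py",
        "data_files_directory_1/index.html",
        "data_files_directory_1/subdirectory_to_be_INcluded/data_file_to_be_INcluded.txt",
    ):
        path = os.path.join(root, *relative.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    kinds = classify_directories(root)

for name in sorted(kinds):
    print(f"{name}: {kinds[name]}")
```

By this definition only src is a package; both data directories are plain directories, which is why package-oriented options may not apply to them.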

Finally, on top of this, "package" can sometimes just mean the installable tar.gz files created by setup.py sdist, or the .whl files created by setup.py bdist_wheel.
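
For this last meaning, the built archive can simply be opened to see what ended up inside, which is also a quick way to check whether an exclusion took effect. The sketch below lists the members of a .tar.gz with the standard library's `tarfile` module. It builds a tiny stand-in archive so that it runs on its own; the archive name, version, and the `list_sdist` helper are hypothetical, and a real sdist would be found under dist/ instead.

```python
import io
import os
import tarfile
import tempfile


def list_sdist(path):
    """Return the sorted names of regular files inside a .tar.gz archive."""
    with tarfile.open(path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a real sdist (hypothetical name and version).
    archive_path = os.path.join(tmp, "pythonProject-0.1.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in (
            "pythonProject-0.1/src/main.py",
            "pythonProject-0.1/data_files_directory_1/index.html",
        ):
            info = tarfile.TarInfo(name)
            info.size = 0  # empty placeholder file
            archive.addfile(info, io.BytesIO(b""))

    names = list_sdist(archive_path)

# Any hit here would mean an excluded directory leaked into the archive.
unwanted = [n for n in names if "subdirectory_to_be_EXcluded" in n]

for name in names:
    print(name)
print("unwanted entries:", unwanted)
```

A .whl file is a zip archive, so the same check can be done there with the `zipfile` module.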

I am looking for anyone who could create an authoritative explanation of which setuptools or MANIFEST.IN operators work (and which don't work) on any directory, and which only work on directories with __init__.py files.

Dear Reader, are you our hero?

Anyone who attempts such explanations and successfully avoids falling into this confusing thicket of multiple different meanings of "package" would be doing quite a valuable service to the community.

Are you that hero in the wings ready to take up that Herculean Labor?

mcgregor94086