
I'm using TensorFlow inside an x86_64 environment, but the processor is an Intel Atom. This processor lacks the AVX extension, and since the pre-built wheels for TensorFlow are compiled with AVX enabled, TensorFlow does not work and exits. Hence I had to build my own wheel, and I host it on GitHub as a release file.

The problem I have is downloading this pre-built wheel only on Atom-based processors. I was able to achieve this previously with a setup.py file, where this can easily be detected, but I have migrated to pyproject.toml, which is very poor when it comes to customization and scripted installation support.

Is there anything, in addition to platform_machine=='x86_64', which checks for the processor type? Or has the migration to pyproject.toml killed my flexibility here?

The current requirements.txt is:

confluent-kafka @ https://github.com/HandsFreeGadgets/python-wheels/releases/download/v0.1/confluent_kafka-1.9.2-cp38-cp38-linux_aarch64.whl ; platform_machine=='aarch64'
tensorflow @ https://github.com/HandsFreeGadgets/python-wheels/releases/download/v0.1/tensorflow-2.8.4-cp38-cp38-linux_aarch64.whl ; platform_machine=='aarch64'
tensorflow-addons @ https://github.com/HandsFreeGadgets/python-wheels/releases/download/v0.1/tensorflow_addons-0.17.1-cp38-cp38-linux_aarch64.whl ; platform_machine=='aarch64'
tensorflow-text @ https://github.com/HandsFreeGadgets/python-wheels/releases/download/v0.1/tensorflow_text-2.8.2-cp38-cp38-linux_aarch64.whl ; platform_machine=='aarch64'
rasa==3.4.2
SQLAlchemy==1.4.45
phonetics==1.0.5
de-core-news-md @ https://github.com/explosion/spacy-models/releases/download/de_core_news_md-3.4.0/de_core_news_md-3.4.0-py3-none-any.whl

For platform_machine=='aarch64' I need something similar for x86_64, but only applied in Atom processor environments.

The old setup.py was:

import platform
import subprocess
import os

from setuptools import setup


def get_requirements():
    requirements = []

    if platform.machine() == 'x86_64':
        command = "cat /proc/cpuinfo"
        all_info = subprocess.check_output(command, shell=True).strip()
        # AVX extension is the missing important information
        if b'avx' not in all_info or ("NO_AVX" in os.environ and os.environ['NO_AVX']):
            requirements.append(f'tensorflow @ file://localhost/{os.getcwd()}/pip-wheels/amd64/tensorflow-2.3.2-cp38-cp38-linux_x86_64.whl')
    elif platform.machine() == 'aarch64':
        ...
    requirements.append('rasa==3.3.3')
    requirements.append('SQLAlchemy==1.4.45')
    requirements.append('phonetics==1.0.5')
    requirements.append('de-core-news-md @ https://github.com/explosion/spacy-models/releases/download/de_core_news_md-3.4.0/de_core_news_md-3.4.0-py3-none-any.whl')
    return requirements


setup(
    ...
    install_requires=get_requirements(),
    ...
)

The line `if b'avx' not in all_info or ("NO_AVX" in os.environ and os.environ['NO_AVX'])` does the necessary differentiation.
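For clarity, the check boils down to a small standalone function; a minimal sketch (Linux-only, keeping the NO_AVX override from the setup.py above):

import os
import platform


def has_avx():
    # On x86_64 Linux, /proc/cpuinfo lists 'avx' in the flags line if the CPU supports it.
    if platform.machine() != 'x86_64':
        return False
    if os.environ.get('NO_AVX'):
        # Manual override, as in the old setup.py.
        return False
    with open('/proc/cpuinfo') as f:
        return 'avx' in f.read()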

If a pyproject.toml approach does not fit my needs, what is recommended for Python with more installation power that is not marked as legacy? Maybe there is something for Python, similar to what Gradle is for building projects in the Java world (introduced to overcome the XML limitations by providing a complete scripting language), which I'm not aware of?

k_o_
    Maybe show the relevant parts of your `requirements.txt` file, your old `setup.py`, and your new `pyproject.toml`. -- ***1.*** "*Is there anything similar in addition to platform_machine=='x86_64' which checks for the processor type?*" Not as far as I know. -- ***2.*** "*which is very poor when it comes to customization and scripted installation support*" Indeed, there is no support at all for scripting or customization of installation process, on purpose. – sinoroc Feb 06 '23 at 13:41
  • @sinoroc: I have added the requested files. – k_o_ Feb 06 '23 at 18:31
  • If scripts could be provided by packages and then used to decide which package to install, then any package that was _under consideration for installation_ could install malware, even if it wasn't eventually chosen. This would be an Extremely Bad Idea. – Charles Duffy Feb 06 '23 at 18:46
  • Better to install _both_ dependencies and pick which of them to use at runtime (a sketch of this idea follows after these comments). – Charles Duffy Feb 06 '23 at 18:47
  • OK, I see now, it is clearer what the goal is. -- I do not know of a simple way to solve this. -- Maybe use conda, if I understood correctly it seems to be better at this sort of things. -- I think if I were you I would use `pyproject.toml` and declare `tensorflow` as [dependency according to the standard specification](https://packaging.python.org/en/latest/specifications/declaring-project-metadata/#dependencies-optional-dependencies). And then I would create `requirements.txt` files for the details (such as which index and/or which wheels to use), maybe a dedicated `requirements-atom.txt`. – sinoroc Feb 06 '23 at 18:50
  • @CharlesDuffy: How should this work for installing both builds of tensorflow? The dependencies are exactly the same, except for the binary; they would overwrite each other. – k_o_ Feb 06 '23 at 18:51
  • @k_o_, ...with pip? _shrug_. I could do it with [Nix](https://nixos.org/); put them in a different location and then update `sys.path` at runtime after checking CPU details. Or just update the package to be under a different top-level name. – Charles Duffy Feb 06 '23 at 18:55
  • @CharlesDuffy: I'm aware that an installation script could include malware, but the same is true for the actual program. And all installers do this: Debian/Ubuntu apt, Fedora/Red Hat yum, pacman, Homebrew, Windows installers. There have already been Debian packages that by accident executed `rm -rf` in the root folder. And if the binaries and the installer come from the same source, why not trust both instead of only the binary? In fact the complete PyPI, nodejs, Go, etc. environment is very risky. No signatures or identity checks of maintainers are done. – k_o_ Feb 06 '23 at 18:58
  • @sinoroc: Adding a different requirements.txt file sounds interesting. Is there a way to pass it when calling `pip install`? – k_o_ Feb 06 '23 at 19:03
  • Re: apt/yum/pacman/etc, their pre- and post-scripts only happen at install time, **not** at dependency resolution time. None of them allow arbitrary code to be run to detect resolutions. (I used to do porting and packaging professionally; I've built debian rules files, rpm specs, &c as my day job). – Charles Duffy Feb 06 '23 at 19:06
  • Think about it: If you have A->B->(runtime-code)->C, and some A declares a conflict with C, then you can have a situation where the version of B that includes that runtime code will never be chosen, **but** the code that package B includes has to be invoked before you _know_ that B will never be chosen. – Charles Duffy Feb 06 '23 at 19:09
  • @CharlesDuffy: Now I understand the difference you were referring to with dependency resolution time, but the pre- and post-scripts can still fetch additional binaries. Independent of this, what I might have to request is a new qualifier or a set of new qualifiers for the requirements.txt. Unfortunately Python has become quite processor dependent. – k_o_ Feb 06 '23 at 19:10
  • As far as I know, yes it is possible to install from a remote requirements file via its URL: `python -m pip install --requirement https://server.tld/path/requirements-atom.txt`. – sinoroc Feb 06 '23 at 19:58
  • @sinoroc: Thanks. Then I will try this and this could be then the answer in this thread. – k_o_ Feb 06 '23 at 20:13
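A rough sketch of the runtime-selection idea from the comments above, assuming both TensorFlow builds were installed into separate directories beforehand (the vendor/ paths and the --target commands are hypothetical):

import platform
import sys


def cpu_has_avx():
    # Only x86_64 builds depend on AVX.
    if platform.machine() != 'x86_64':
        return True
    with open('/proc/cpuinfo') as f:
        return 'avx' in f.read()


# Hypothetical layout, prepared e.g. with:
#   pip install --target vendor/tf-avx tensorflow
#   pip install --target vendor/tf-noavx <custom non-AVX wheel>
sys.path.insert(0, 'vendor/tf-avx' if cpu_has_avx() else 'vendor/tf-noavx')

import tensorflow as tf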

1 Answer


My recommendation would be to migrate to pyproject.toml as intended. I would declare dependencies such as tensorflow according to the standard specification for dependencies, but I would not use any direct references at all.
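A minimal sketch of such a declaration in pyproject.toml (the project name, version, and version specifiers are placeholders):

[project]
name = "my-project"
version = "0.1.0"
dependencies = [
    "tensorflow >= 2.8",
    "rasa == 3.4.2",
    "SQLAlchemy == 1.4.45",
    "phonetics == 1.0.5",
]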

Then I would create some requirements.txt files in which I would list the dependencies that need special treatment (no need to list all dependencies), for example those that require a direct reference (and/or a pinned version). I would probably create one requirements file per platform, for example I would create a requirements-atom.txt.

As far as I know it should be possible to instruct pip to install from a remote requirements file via its URL. Something like this:

python -m pip install --requirement 'https://server.tld/path/requirements-atom.txt'

If you need to create multiple requirements.txt files with common parts, then probably a tool like pip-tools can help.

Maybe something like the following (untested):

requirements-common.in:

# Application (or main project)
MyApplication @ git+https://github.com/HandsFreeGadgets/MyApplication.git

# Common dependencies
CommonLibrary
AnotherCommonLibrary==1.2.3

requirements-atom.in:

--requirement requirements-common.in

# Atom CPU specific
tensorflow @ https://github.com/HandsFreeGadgets/tensorflow-atom/releases/download/v0.1/tensorflow-2.8.4-cp38-cp38-linux_x86_64.whl ; platform_machine=='x86_64'

Then compile the pinned requirements file with pip-compile:

pip-compile requirements-atom.in > requirements-atom.txt
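The generated requirements-atom.txt can then be installed on the target machine, locally or via its URL as shown above:

python -m pip install --requirement requirements-atom.txt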
sinoroc
  • Is the `pyproject.toml` executed at all if I call `python -m pip install --requirement 'https://server.tld/path/requirements-atom.txt'`? Or are two calls needed, one for the basic installation and one for the tweaks? This would mean that the Atom version has to uninstall some dependencies and replace them with the pinned versions. – k_o_ Feb 06 '23 at 21:26
  • You can list your application (or main project) in the requirements file (possibly with a direct reference, maybe github repository). – sinoroc Feb 06 '23 at 21:38
  • If this is a Python module which is used by another Python module, that other module also has to be aware of the different versions. I.e. it forces the other Python module to also be split into 2 dependency files. Can this be simplified? – k_o_ Feb 06 '23 at 22:30
  • I do not understand your question. – sinoroc Feb 06 '23 at 22:43
  • Then let's rephrase it: library A, which defines the Atom-specific dependencies, will also be used by a different module B. This module B is not aware of the Atom-specific versions necessary and will crash. Only if B is aware that A needs special dependencies will A work. This gets even more difficult if a module C is using B, which is using A. All modules now have to be aware of the Atom-specific versions. – k_o_ Feb 06 '23 at 23:52
  • I am still not following. Maybe this deserves a question of its own, I do not know. -- You can combine requirements files however you want. -- You can have as many wheel builds of the same library as you want and need, but the wheel platform tags are not rich enough to express this particular CPU feature, so you need to use different solution to pick the right wheel, and direct reference is a possibility. I recommend to not use any direct references in `pyproject.toml`, only in `requirements.txt` files. – sinoroc Feb 07 '23 at 10:20
  • Another try to rephrase it: how do I declare a dependency on this Python module from a different Python app or module? I would assume it would declare a dependency on https://server.tld/path/requirements.txt. On an Atom processor it would not fetch https://server.tld/path/requirements-atom.txt automatically and would still crash. How can I avoid that other Python modules have to be aware of the internals? I'm not building an end application which can be installed by the user in a documented way, but a module which will also be included in other modules. – k_o_ Feb 07 '23 at 22:26
  • The one doing the installation (of the application) is the one who should write the `requirements.txt`. If your project is a library, then you should only document how to install this library, maybe with an example of `requirements.txt`. But in the end the choice of how to install things exactly will be made by the one installing the application, and they can choose to do it with a `requirements.txt` or some other way. -- Specifying the `requirements.txt` of a library in the dependencies of an application is not really a thing that exists (not really possible). – sinoroc Feb 07 '23 at 23:06