
First, I should say that I am not a developer and have never had the chance to work on a software project as part of a team, so it is quite hard to pick up good coding practices (I try to learn from what I see on the web, but it is all quite messy/confused in my head).

Trying to do my best, here is my old code structure (I will explain later the problems I am encountering when working with this code again):

.
├── .env
├── .git
├── .gitignore
├── README.md
├── config.py
├── data
│   ├── input
│   ├── interim_input
│   └── output
├── requirements.txt
├── tests
│   ├── batch.py
│   ├── data
│   ├── mytest.py
│   └── test_upload_twb.py
├── mypackage
│   ├── __init__.py
│   ├── generate_trad_file_2.py
│   ├── generate_twb_file_4.py
│   ├── parse_twb.py
│   ├── twb_mysql.py
│   ├── upload_trad_file_3.py
│   └── upload_twb_1.py
└── venv

Some details:

  • All sensitive information (MySQL login/password, for example) is loaded as environment variables from a .env file with the help of the python-dotenv package
  • The config.py file lets me load the needed configuration (mainly from environment variables)
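For reference, a config.py of this kind is often just a thin module that reads the environment after python-dotenv has populated it. A minimal sketch, assuming hypothetical variable names (DB_HOST, DB_PASSWORD) rather than the actual project's:

```python
import os

# In the real project, python-dotenv would be called first to populate
# os.environ from the .env file:
#     from dotenv import load_dotenv
#     load_dotenv()


def get_config():
    """Collect the settings the app needs from environment variables."""
    return {
        # non-sensitive setting with a safe default
        "db_host": os.getenv("DB_HOST", "localhost"),
        # sensitive setting: no default, raises KeyError if missing
        "db_password": os.environ["DB_PASSWORD"],
    }
```

Failing loudly on a missing secret (rather than silently defaulting) makes misconfigured environments obvious at startup.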

Here are my current problems:

  1. Trying to run my old code, I noticed that after creating a virtual environment (with python -m venv venv) I failed to pip install the dependencies (pip install -r requirements.txt) with my current Python version, 3.8.3. The Python version is not recorded anywhere in the project, and I was wondering how you handle this. Do you specify the Python version in the README.md, or is there another trick (we can't specify the Python version in venv)?

  2. Is it safe to keep a .env file with sensitive information? (I do not commit it, but I was wondering how companies prevent all developers who have access to the code from seeing all the sensitive information.)

  3. I am trying to run my code and will have to test the different functions (which will break, since I have to recreate the database, etc.). I was thinking of running them separately from the command line, but it seems a better idea to persist those commands for next time. Since I want to run the different functions (one per .py file, as they do really different things but correspond to ordered steps in a pipeline), I was thinking of writing different test files. I can feel that I do not have the right approach again, so I would appreciate your advice.

Thanks a lot in advance for your kindness.

curious
  • Have you considered using pipenv? That would solve issues 1 and 2. I'm having a hard time understanding 3: are you asking how to write unit tests that use the database? – pygeek Sep 15 '20 at 09:44
  • I do not know about pipenv, so I will give it a try, thx! For 3, I should perhaps write another topic since it is a bit confusing... I was just asking: how would you most simply test that a program is working? I think unittest is really specific, so I do not know if it is the best approach. – curious Sep 15 '20 at 11:19
  • Ok, I'll write an answer to address 1 and 2 then. Consider modifying your question to omit 3 if you're going to break it out into another question – pygeek Sep 15 '20 at 11:21
  • Thx! Meanwhile, I do not understand how it solves 2? – curious Sep 15 '20 at 12:08
  • I misunderstood 2; I can add that to my answer. – pygeek Sep 15 '20 at 12:11
  • Added an answer for 2; the tests should really be a separate question, as that would allow for a dedicated discussion and an answer elaborating on the topic. – pygeek Sep 15 '20 at 12:23
  • 1. You can make your package a library by writing a setup.py file, and there you can specify your Python version requirement – geckos Sep 15 '20 at 12:56
  • 2. Sensitive information should not go in git, so if there is sensitive information in .env you should not commit it. Some packages distinguish between configuration and local configuration, where the first is saved in git and the last isn't – geckos Sep 15 '20 at 12:58
  • 3. I didn't understand your goal here. If it is for pure testing, you should recreate the database every time. But if you're building a command-line tool, I recommend the click library – geckos Sep 15 '20 at 13:00

2 Answers


Problem

Managing Python versions and dependencies for a Python project between team members.

Solution

Consider using pipenv and dotenv with a setup script.

Pipenv is an abstraction on top of venv that is more in line with what you would expect from a package manager if you have experience with Ruby's Bundler or Node's npm or yarn.

It does many wonderful things, but mainly it creates a Pipfile and Pipfile.lock for you.

Pipfile

Pipfile is where the dependencies, package registry, and Python version are defined (replacing requirements.txt).
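For illustration, a Pipfile pinning the asker's Python 3.8 might look like the following sketch (the dev dependency on pytest is an assumption, not from the original project):

```toml
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[packages]
python-dotenv = "*"

[dev-packages]
pytest = "*"

[requires]
python_version = "3.8"
```

The [requires] section is what answers question 1: pipenv refuses to build the environment with a different interpreter version.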

Pipfile.lock

Pipfile.lock explicitly pins the packages noted in the Pipfile, with their versions and SHA hashes "locked", to prevent inadvertently upgrading a package automatically or installing a malicious package in production.

Env files

Assuming you're already using dotenv, create an env.example file listing all the environment variables for your app, with non-sensitive defaults. Write a script in your project's bin folder named setup that new engineers can run; it simply copies env.example to .env, along with any other niceties to automate project setup (e.g. pipenv install).
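A minimal bin/setup along these lines might look like the sketch below. The exact steps are assumptions about the project; the demo creates its own scratch directory and env.example so it is runnable on its own:

```shell
#!/bin/sh
# Illustrative bin/setup sketch for new engineers.
# The demo runs in a scratch directory so it is self-contained.
set -e
cd "$(mktemp -d)"
printf 'DB_HOST=localhost\nDB_PASSWORD=changeme\n' > env.example

# Give each engineer a local env file, but never clobber an existing one.
if [ ! -f .env ]; then
    cp env.example .env
    echo "Created .env from env.example -- fill in your local values."
fi

# Install the dependencies declared in the Pipfile (skipped when pipenv
# is not available, so the sketch stays runnable on its own).
if command -v pipenv >/dev/null 2>&1; then
    pipenv install --dev || true
else
    echo "pipenv not found -- skipping dependency install." >&2
fi
```

Guarding the copy with the existence check matters: re-running setup must never overwrite the secrets an engineer has already filled in.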

References

Pipenv: https://pipenv-fork.readthedocs.io/en/latest/

Setuptools: https://setuptools.readthedocs.io/en/latest/

Dotenv: https://saurabh-kumar.com/python-dotenv/

pygeek

1

Ideally, make your project compatible with multiple versions of Python, especially if you plan to distribute your code.

For a project that will not be distributed, or that will only be deployed to a very controlled environment, it is acceptable to "force" a Python version, but this version has to be documented somewhere.

If you package your code with setuptools for distribution, you can indicate the supported Python versions (see this SO question).

If you plan to support multiple Python versions, Tox offers the possibility to run tasks (like running tests) over multiple versions of Python and/or dependencies.
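With setuptools this is the python_requires argument; a minimal setup.py sketch (the package name and version are placeholders):

```python
# setup.py -- minimal sketch; name and version are placeholders.
from setuptools import find_packages, setup

setup(
    name="mypackage",
    version="0.1.0",
    packages=find_packages(),
    # pip will refuse to install the package on unsupported interpreters
    python_requires=">=3.8",
)
```

With this in place, pip on an older interpreter fails with an explicit version error instead of installing a package that breaks at runtime.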

2

You should definitely not share any sensitive information in any config file. There are multiple ways to address the sensitive-information problem, but the solution is usually based on the notion of environments:

  • a set of config (a .env file or anything else) with very sensitive information is created especially for the production environment, and very few people have access to it
  • a set of config for a potential staging environment (less sensitive)
  • a set of config on each developer's machine, at that developer's convenience

In any case, none of these sets of config should be shared. You could actually provide an example set of config with fictitious information and, preferably, explanations of the config options.

In your case you could provide and share a .env.example with this kind of content:

db_url="the database connection string"
database="the database name"
password="the database password"

config_option_1="an arbitrary config option, used to illustrate"
...

3

Tests should ideally be independent from any environment and runnable without heavy setup.

That means your tests should not be bound to existing content in a particular database. The best approach is to provide a database config to the tests (DB URL, name, password, etc.) and let the tests create the data they need in a setup step. There are existing tools to ease this process, like:

  • pytest and its fixture concept, which eases the setup/teardown around each test
  • factory_boy for populating the database with stub data

Ideally, the tests should also clean the database after they have run. This way you will be able to run your tests against any DB you want and not worry about (re)creating the DB. The counterpart is that it represents a non-negligible amount of work.

The great benefit is that anybody will be able to run the tests quite easily (the way to run the tests should also be documented).
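As a sketch of this setup/teardown idea, here an in-memory SQLite database stands in for the real MySQL instance, so the test needs no external setup at all. The table and function names are purely illustrative, not from the asker's project; with pytest you would typically move the setup into a @pytest.fixture:

```python
# Sketch of environment-independent DB tests: SQLite in-memory stands in
# for the real MySQL instance, so no external setup is needed.
import sqlite3


def make_test_db():
    """Setup step: create the schema and stub data the tests need."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE translations (key TEXT, value TEXT)")
    conn.execute("INSERT INTO translations VALUES ('hello', 'bonjour')")
    conn.commit()
    return conn


def lookup(conn, key):
    """Example function under test (illustrative)."""
    row = conn.execute(
        "SELECT value FROM translations WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None


def test_lookup():
    conn = make_test_db()  # setup: each test gets a fresh database
    try:
        assert lookup(conn, "hello") == "bonjour"
        assert lookup(conn, "missing") is None
    finally:
        conn.close()  # teardown: nothing persists between tests
```

Because every test builds and discards its own data, the suite runs on any machine with no database server to recreate.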

Tryph
  • I didn't know about factory_boy for Python, only for Ruby, thanks! – geckos Sep 15 '20 at 13:02
  • Really clear explanation! For 2, I do not quite understand. Let's imagine a team with 1 dev and 1 data architect. When the dev is coding he needs a database, so I suppose he works in a dev environment and can access sensitive information from this dev environment (like the DB password he will need to build his app). Now the code is ready and it is pushed to a prod environment. How do you prevent the dev from gaining access to the sensitive information stored in env variables in the prod environment? Do we restrict access to this server? – curious Sep 15 '20 at 13:03
  • @curious See the section on env files in my answer for 2. You really should have a separate env file on your prod environment, not committed to the repository (via Puppet or other means). Furthermore, permissions should be user-based and therefore different from what users have on their local machines. – pygeek Sep 15 '20 at 13:12
  • I don't think his code is being distributed. If @curious is working within a team that runs production code, it's important to lock the Python version down so that whatever is being developed runs on the same Python version that will run in production, in order to prevent bugs related to deprecation or compatibility. This is why it's important to use a tool such as Pipenv or Poetry. – pygeek Sep 15 '20 at 13:19
  • @curious Ideally the dev environment should contain a dev database with dev data, decoupled from the production data. The two environments should share the DB structure (db name, tables, triggers and stored procedures, roles, etc.) but the db accounts, the application users (if any) and the data should be different. Put simply, each dev should have a local database with local data for dev purposes only. When the code is ready, the dev shares the code and any changes to the DB structure, but the local data remains local and the dev never gets the prod data. – Tryph Sep 15 '20 at 13:24
  • @pygeek The title explicitly states that the OP wants to distribute the code. There are other ways than locking to avoid problems when deploying to a production env; running tests at every push and containerizing is one of them. I personally never lock an env; I just let the CI warn me when a dependency causes a problem and react appropriately (by locking that particular dependency until the problem is fixed, or otherwise). It also prevents situations where migrating to a new version is painful because it has never been done for years. – Tryph Sep 16 '20 at 13:04