I'm experimenting with PyBuilder because I'm looking for a more organised and production-oriented way of developing data science projects.
So far, I've created a PyBuilder project with the following structure (folders are uppercased for readability):
PROJECT
|   build.py
|   setup.py
+-- .ENV
|   +-- ...
+-- SRC
|   +-- MAIN
|   |   +-- FIXTURES
|   |   |   +-- data.csv
|   |   +-- PYTHON
|   |   |   +-- code.py
|   |   +-- SCRIPTS
|   |       +-- run.py
|   +-- TEST
|       +-- FIXTURES
|       |   +-- values.csv
|       +-- PYTHON
|           +-- test_code.py
...
build.py and setup.py are files generated by PyBuilder.
.env contains the virtual environment (Python 3.7).
src\main and src\test have the usual structure, except that each contains an additional fixtures folder (much like resources in Java). In case you're wondering, src\test looks as stated because of these settings in build.py:
project.set_property("dir_source_unittest_python", "src/test/python")
project.set_property("unittest_module_glob", "test_*")
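As a rough illustration of the second property: assuming PyBuilder matches unittest_module_glob shell-style against module names, the way Python's fnmatch does (I haven't checked PyBuilder's internals), test_* picks up test_code but not code. The module names other than code and test_code are hypothetical:

```python
from fnmatch import fnmatch

# Pattern copied from build.py; test_helpers and run are hypothetical module names.
UNITTEST_MODULE_GLOB = "test_*"

candidates = ["test_code", "code", "run", "test_helpers"]
selected = [name for name in candidates if fnmatch(name, UNITTEST_MODULE_GLOB)]
print(selected)  # ['test_code', 'test_helpers']
```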
My intent is as follows:
- run.py contains a script that calls the code in code.py to, for instance, predict tomorrow's weather.
- code.py contains the code that loads the dataset in data.csv and builds a model that predicts the weather for a given day.
- data.csv contains the historical data that code.py needs to train the weather forecasting model.
- test_code.py contains the unit tests that check that the model and the utility functions in code.py work as expected.
- values.csv contains the input values and expected results that test_code.py uses to test code.py.
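For illustration only, a data-driven test in the spirit of test_code.py could read (input, expected) pairs from values.csv and check a function from code.py against each row. The function celsius_to_fahrenheit, the column names, and the inline CSV text (standing in for src/test/fixtures/values.csv) are all hypothetical:

```python
import csv
import io
import unittest


# Hypothetical stand-in for a utility function in code.py.
def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32


# Hypothetical contents of values.csv: one input and its expected result per row.
VALUES_CSV = """input,expected
0,32
100,212
-40,-40
"""


class TestCode(unittest.TestCase):
    def test_conversion_against_fixture(self):
        reader = csv.DictReader(io.StringIO(VALUES_CSV))
        for row in reader:
            # subTest reports each failing row separately instead of stopping at the first.
            with self.subTest(row=row):
                self.assertAlmostEqual(
                    celsius_to_fahrenheit(float(row["input"])), float(row["expected"])
                )
```

In the real project the CSV text would come from the fixtures folder on disk rather than a string, and the module would be collected by the unittest plugin (or `python -m unittest`).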
My code in code.py accesses data.csv by defining the FIXTURES folder as follows:
import os

FIXTURES = os.path.join(os.path.dirname(__file__), '..', 'fixtures')
...
with open(os.path.join(FIXTURES, 'data.csv'), 'r') as file:
    ...
With this setup, I can successfully run run.py from within my IDE to generate predictions.
When I try to build a package to share the predictor with my colleagues, however, the src\main\fixtures folder is not copied over. After some research (see this question), I managed to work around this by:
1. moving the fixtures folder into src\main\python;
2. adding the following line to build.py:
project.include_file("lib/python3.7/site-packages/fixtures", "fixtures/*.csv")
However, I would prefer to keep fixtures where it originally was. Moreover, I've noticed that run.py fails to execute even though the installation (pyb install) completes successfully, because data.csv can't be located:
...
FileNotFoundError: [Errno 2] File b'/Users/stefano/Workspace/project/.env/lib/python3.7/site-packages/../fixtures/data.csv' does not exist: b'/Users/stefano/Workspace/project/.env/lib/python3.7/site-packages/../fixtures/data.csv'
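The failing path follows from how FIXTURES is built: it is resolved relative to wherever code.py lives at run time. Judging from the error message, the installed code.py sits directly in site-packages, so '..' climbs out of it. A small sketch of that resolution (fixtures_dir is a helper name I made up; posixpath is used so the paths stay POSIX-style on any OS):

```python
import posixpath


def fixtures_dir(module_file: str) -> str:
    """Rebuild the FIXTURES path from code.py for a given location of that module."""
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(module_file), "..", "fixtures")
    )


# code.py inside the source tree, as run from the IDE
source_tree = fixtures_dir("/Users/stefano/Workspace/project/src/main/python/code.py")

# code.py after installation, sitting directly in site-packages (inferred from the error)
installed = fixtures_dir(
    "/Users/stefano/Workspace/project/.env/lib/python3.7/site-packages/code.py"
)

print(source_tree)  # /Users/stefano/Workspace/project/src/main/fixtures
print(installed)    # /Users/stefano/Workspace/project/.env/lib/python3.7/fixtures
```

The second directory is the one in the FileNotFoundError above, once the '..' is normalised away.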
Does anyone know how to keep the fixtures folder in src\main (rather than in src\main\python)?
Also, does anyone know how to make files like data.csv discoverable after the package is installed?
Thanks in advance for any help!
Note: please be aware that this structure might not be the most convenient one if data.csv is quite big.