I use git to track changes to a Python script (script.py) and a set of tests. The tests use some text input data files, with this directory structure:
script.py
tests/
    test_01.py
    test_02.py
data/
    data_file01
    data_file02
    ...
However, some of the input data files are becoming very large (> 1 MB).
What is a good practice for managing test input data with git?
... maybe keep the files in online storage? But then how do I preserve and check changes to the input data files? (Suggestions?)
... or maybe use a library like setuptools to check whether the test input data exists and download it if it doesn't? But again, how do I preserve and check changes to the input data files?
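To make the "check the changes" part concrete, here is a minimal sketch of one possible approach, not something I have in place: keep a small manifest of SHA-256 checksums under version control and compare the data files against it before running the tests. The function names and the manifest file are hypothetical, and the download step is left out because it depends on where the files are stored.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file under data_dir (relative path) to its digest."""
    data_dir = Path(data_dir)
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def write_manifest(data_dir, manifest_path):
    """Write the manifest as JSON; this small file is what git would track."""
    manifest = build_manifest(data_dir)
    text = json.dumps(manifest, indent=2, sort_keys=True) + "\n"
    Path(manifest_path).write_text(text)
    return manifest


def check_data(data_dir, manifest_path):
    """Return (missing, changed) file names compared with the manifest."""
    expected = json.loads(Path(manifest_path).read_text())
    actual = build_manifest(data_dir)
    missing = sorted(name for name in expected if name not in actual)
    changed = sorted(
        name
        for name in expected
        if name in actual and actual[name] != expected[name]
    )
    return missing, changed
```

With something like this, the large files could live outside the repository while the manifest's git history records when each one changed; a test setup step could call check_data and fetch whatever is reported as missing.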
EDIT
For now, I back up the test data in a compressed file named after the corresponding commit, on a cloud disk (Dropbox, Google Drive, etc.), with these lines in the post-commit hook:
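# Hash of the commit that was just created
commit_name=$(git rev-parse HEAD)
# Today's date as YYYYMMDD ("fecha" means "date")
fecha=$(date +%Y%m%d)
# Archive the data directory recursively into the cloud disk folder
7z a "$CLOUD_DISK/data_test/${fecha}_${commit_name}.7z" data/* -r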
(I prefer 7z over zip because it produces a smaller compressed file.)
The $CLOUD_DISK variable is defined in .bashrc.
EDIT 2
I have started working on a more complete solution to my problem.