
I want to implement a machine learning algorithm that operates on homomorphically encrypted data using the PySEAL library. PySEAL is released as a Docker container with an 'examples.py' file that shows some homomorphic encryption examples. I want to edit 'examples.py' to implement the ML algorithm. I am trying to import a CSV file like this:

dataset = pd.read_csv('Dataset.csv')

I have imported the pandas library successfully. I have tried many approaches to import the CSV file, but all of them failed. How can I import it?

I am new to Docker, so a detailed procedure would be really helpful.

  • What error are you getting? Does `Dataset.csv` open without issues in pandas when not using Docker (in a small test script)? Can you provide a small snippet of `Dataset.csv` that other users can use to replicate the error? – user2653663 Jul 11 '19 at 15:24
  • "File b'Dataset.csv' does not exist: b'Dataset.csv'" - this error. –  Jul 11 '19 at 16:14

1 Answer

You can either do it during the Docker build (assuming you are the one creating the image) or through a volume mapping that the container accesses at runtime.

Building the image with Dataset.csv included

For access through the build, use a COPY instruction in the Dockerfile to place the file inside the container's workspace:

FROM python:3.7

COPY /Dataset.csv /app/Dataset.csv
...

After rebuilding the image, you can read the file directly from /app/Dataset.csv inside the container using pandas' read_csv() function:

dataset = pd.read_csv('/app/Dataset.csv')
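
If the file still cannot be found, it helps to see where Python is actually looking inside the container. The following is only a minimal sketch: the helper name `resolve_dataset_path` is hypothetical, and the default path assumes the COPY destination used above.

```python
import os


def resolve_dataset_path(path="/app/Dataset.csv"):
    """Return the absolute form of `path`, or raise with diagnostic details.

    The default assumes the file was copied to /app/Dataset.csv (an assumption
    taken from the Dockerfile snippet above).
    """
    absolute = os.path.abspath(path)
    if not os.path.isfile(absolute):
        cwd = os.getcwd()
        # Show the working directory and its entries so it is clear what the
        # container can actually see.
        raise FileNotFoundError(
            f"{absolute} does not exist inside this container. "
            f"Working directory: {cwd}; entries there: {sorted(os.listdir(cwd))}"
        )
    return absolute
```

The returned path can then be passed to pandas, for example `dataset = pd.read_csv(resolve_dataset_path())`.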

Mapping a volume share for Dataset.csv

If you don't have direct control over how the source image is built, or do not want the dataset packaged with the container (which may be the best practice depending on the use case), you can share it through a volume mapping when starting the container and read it from the mounted path:

dataset = pd.read_csv('/app/Dataset.csv')

Assuming your Dataset.csv is in /my/user/dir/Dataset.csv on the host (both sides of the mapping must be absolute paths)

From the CLI:

docker run -v /my/user/dir:/app my-python-container

The benefit of the latter solution is that you can keep editing 'Dataset.csv' on your host and the container sees those changes; likewise, any changes the Python process makes to the file show up on the host.
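
Because a relative path on either side of the mapping causes errors, it can be convenient to build the command from a script. This is a rough sketch, not part of Docker or PySEAL: the helper `docker_run_command` is hypothetical, and the image name `my-python-container` and mount point `/app` are the placeholders used above.

```python
import os
import shlex


def docker_run_command(host_dir, image, container_dir="/app"):
    """Build a `docker run` argument list that mounts host_dir at container_dir."""
    if not container_dir.startswith("/"):
        # Docker rejects a relative mount path inside the container.
        raise ValueError("the container mount path must be absolute")
    # Convert the host directory to an absolute path before mapping it.
    host_abs = os.path.abspath(host_dir)
    return ["docker", "run", "-v", f"{host_abs}:{container_dir}", image]


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(docker_run_command("my/user/dir", "my-python-container")))
```

The list form can also be passed to `subprocess.run` directly if you prefer to start the container from Python.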

pypalms
  • I have tried both solutions. The first solution still shows the file-does-not-exist error, and the second gives the following error - "Error response from daemon: invalid volume specification: 'home/user/Codes/Python/PySEAL:app': invalid mount config for type "volume": invalid mount path: 'app' mount path must be absolute." –  Jul 12 '19 at 11:20
  • @AshikurRahman You may need to use an absolute path then; something like `:/app` should be enough, but you could also point it at the absolute path where the Python code lives. Check this answer for more info on pathing: https://stackoverflow.com/questions/51312276/docker-how-to-pass-a-relative-path-as-an-argument – pypalms Jul 12 '19 at 14:11
  • The first solution worked after putting the "COPY /Dataset.csv /app/Dataset.csv" command in the Dockerfile and rebuilding the image. Previously I hadn't used the leading slashes (/) at the beginning of the paths. The path in the read_csv() call was '/app/Dataset.csv'. –  Jul 12 '19 at 15:23
  • Relative vs. absolute paths are very important in Docker, so I would assume that it could also be something like `COPY /myuser/Documents/project/src/Dataset.csv /app/Dataset.csv`. Glad I could help! – pypalms Jul 12 '19 at 19:48