There is a folder named "data-persistent" in the running container that the code reads from and writes to, and I want to persist the changes made in that folder. When I mount a persistent volume at that path, it removes/hides the data that was already in the folder and the code throws an error. What should my approach be? (See the sketch after the deployment manifest below.)
My Dockerfile:
FROM python:latest
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
#RUN mkdir data-persistent
ADD linkedin_scrape.py .
COPY requirements.txt ./requirements.txt
COPY final_links.csv ./final_links.csv
COPY credentials.txt ./credentials.txt
COPY vectorizer.pk ./vectorizer.pk
COPY model_IvE ./model_IvE
COPY model_JvP ./model_JvP
COPY model_NvS ./model_NvS
COPY model_TvF ./model_TvF
COPY nocopy.xlsx ./nocopy.xlsx
COPY data.db /data-persistent/
COPY textdata.txt /data-persistent/
RUN ls -la /data-persistent/*
RUN pip install -r requirements.txt
CMD python linkedin_scrape.py --bind 0.0.0.0:8080 --timeout 90
And here is my deployment file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-first-cluster1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: scrape
  template:
    metadata:
      labels:
        app: scrape
    spec:
      containers:
      - name: scraper
        image: image-name
        #
        ports:
        - containerPort: 8080
        env:
        - name: PORT
          value: "8080"
        volumeMounts:
        - mountPath: "/dev/shm"
          name: dshm
        - mountPath: "/data-persistent/"
          name: tester
      volumes:
      - name: dshm
        emptyDir:
          medium: Memory
      - name: tester
        persistentVolumeClaim:
          claimName: my-pvc-claim-1
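The reason the files disappear is that a volume mounted at /data-persistent shadows whatever the image placed at that path, so the data.db and textdata.txt copied in by the Dockerfile are no longer visible once the PVC is attached. One direction I am considering (a minimal sketch under my own assumptions; /seed-data and seed_if_empty are hypothetical names, not existing code) is to bake the seed files into a separate path such as /seed-data in the image and copy them into the mounted folder at startup only when the volume is still empty:

import shutil
from pathlib import Path

SEED_DIR = Path("/seed-data")        # hypothetical: baked into the image, never mounted over
DATA_DIR = Path("/data-persistent")  # backed by the PVC at runtime


def seed_if_empty() -> None:
    # Copy the initial data.db and textdata.txt into the volume on the first run only,
    # so later writes by the scraper are kept and never overwritten by the image copies.
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    for name in ("data.db", "textdata.txt"):
        target = DATA_DIR / name
        if not target.exists():
            shutil.copy2(SEED_DIR / name, target)


if __name__ == "__main__":
    seed_if_empty()

With this, the Dockerfile would COPY data.db and textdata.txt into /seed-data instead of /data-persistent, and the container would run this step (or an equivalent initContainer) before linkedin_scrape.py, so the PVC gets populated once and subsequent changes persist across restarts.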
Let me explain the workflow of the code. The code reads from the textdata.txt file, which contains the range of link indices to scrape (e.g. from 100 to 150), scrapes those profiles, inserts them into the data.db file, and then writes to textdata.txt the range to be scraped on the next run (e.g. 150 to 200).
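For reference, this is roughly the cycle I mean; a minimal sketch under my own assumptions (the file format, table name, and scrape_profile helper are placeholders, not the real linkedin_scrape.py):

import sqlite3
from pathlib import Path

DATA_DIR = Path("/data-persistent")
BATCH_SIZE = 50  # assumed batch size, e.g. 100-150 then 150-200


def scrape_profile(index: int) -> str:
    # Placeholder for the actual scraping of the profile at this link index.
    return f"profile-{index}"


def run_once() -> None:
    # Read the index range for this run, assumed stored as e.g. "100,150".
    start, end = map(int, (DATA_DIR / "textdata.txt").read_text().split(","))

    # Scrape each profile and insert it into data.db (table name is hypothetical).
    conn = sqlite3.connect(DATA_DIR / "data.db")
    conn.execute("CREATE TABLE IF NOT EXISTS profiles (idx INTEGER, data TEXT)")
    for index in range(start, end):
        conn.execute("INSERT INTO profiles VALUES (?, ?)", (index, scrape_profile(index)))
    conn.commit()
    conn.close()

    # Record the range for the next run, e.g. "150,200".
    (DATA_DIR / "textdata.txt").write_text(f"{end},{end + BATCH_SIZE}")


if __name__ == "__main__":
    run_once()

Because both files live under /data-persistent, whatever the volume mount does to that folder affects every run.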