I have a docker image felipeogutierrez/tpch-dbgen
that I build using docker-compose
and I push it to docker-hub registry using travis-CI
.
version: "3.7"
services:
other-images: ....
tpch-dbgen:
build: ../docker/tpch-dbgen
image: felipeogutierrez/tpch-dbgen
volumes:
- tpch-dbgen-data:/opt/tpch-dbgen/data/
- datarate:/tmp/
stdin_open: true
and this is the Dockerfile
to build this image:
FROM gcc AS builder
RUN mkdir -p /opt
COPY ./generate-tpch-dbgen.sh /opt/generate-tpch-dbgen.sh
WORKDIR /opt
RUN chmod +x generate-tpch-dbgen.sh && ./generate-tpch-dbgen.sh
In the end, this scripts creates a directory /opt/tpch-dbgen/data/
with some files that I would like to read from another docker image that I am running on Kubernetes. Then I have a Flink image that I create to run into Kubernetes. This image starts 3 Flink Task Managers and one stream application that reads files from the image tpch-dbgen-data
. I think that the right approach is to create a PersistentVolumeClaim
so I can share the directory /opt/tpch-dbgen/data/
from image felipeogutierrez/tpch-dbgen
to my flink image in Kubernetes. So, first I have this file to create the PersistentVolumeClaim
:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: tpch-dbgen-data-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 200Mi
Then, I am creating an initContainers
to launch the image felipeogutierrez/tpch-dbgen
and after that launch my image felipeogutierrez/explore-flink:1.11.1-scala_2.12
:
apiVersion: apps/v1
kind: Deployment
metadata:
name: flink-taskmanager
spec:
replicas: 3
selector:
matchLabels:
app: flink
component: taskmanager
template:
metadata:
labels:
app: flink
component: taskmanager
spec:
initContainers:
- name: tpch-dbgen
image: felipeogutierrez/tpch-dbgen
#imagePullPolicy: Always
env:
command: ["ls"]
# command: ['sh', '-c', 'for i in 1 2 3; do echo "job-1 `date`" && sleep 5s; done;', 'ls']
volumeMounts:
- name: tpch-dbgen-data
mountPath: /opt/tpch-dbgen/data
containers:
- name: taskmanager
image: felipeogutierrez/explore-flink:1.11.1-scala_2.12
#imagePullPolicy: Always
env:
args: ["taskmanager"]
ports:
- containerPort: 6122
name: rpc
- containerPort: 6125
name: query-state
livenessProbe:
tcpSocket:
port: 6122
initialDelaySeconds: 30
periodSeconds: 60
volumeMounts:
- name: flink-config-volume
mountPath: /opt/flink/conf/
- name: tpch-dbgen-data
mountPath: /opt/tpch-dbgen/data
securityContext:
runAsUser: 9999 # refers to user _flink_ from official flink image, change if necessary
volumes:
- name: flink-config-volume
configMap:
name: flink-config
items:
- key: flink-conf.yaml
path: flink-conf.yaml
- key: log4j-console.properties
path: log4j-console.properties
- name: tpch-dbgen-data
persistentVolumeClaim:
claimName: tpch-dbgen-data-pvc
The Flink stream application is starting but it cannot read the files on the directory /opt/tpch-dbgen/data
of the image felipeogutierrez/tpch-dbgen
. I am getting the error: java.io.FileNotFoundException: /opt/tpch-dbgen/data/orders.tbl (No such file or directory)
. It is strange because when I try to go into the container felipeogutierrez/tpch-dbgen
I can list the files. So I suppose there is something wrong on my Kubernetes configuration. Does anyone know to point what I am missing on the Kubernetes configuration files?
$ docker run -i -t felipeogutierrez/tpch-dbgen /bin/bash
root@10c0944a95f8:/opt# pwd
/opt
root@10c0944a95f8:/opt# ls tpch-dbgen/data/
customer.tbl dbgen dists.dss lineitem.tbl nation.tbl orders.tbl part.tbl partsupp.tbl region.tbl supplier.tbl
Also, when I list the logs of the container tpch-dbgen
I can see the directory tpch-dbgen
that I want to read. Although I cannot execute the command command: ["ls tpch-dbgen"]
inside my Kubernetes config file.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
flink-jobmanager-n9nws 1/1 Running 2 17m
flink-taskmanager-777cb5bf77-ncdl4 1/1 Running 0 4m54s
flink-taskmanager-777cb5bf77-npmrx 1/1 Running 0 4m54s
flink-taskmanager-777cb5bf77-zc2nw 1/1 Running 0 4m54s
$ kubectl logs flink-taskmanager-777cb5bf77-ncdl4 tpch-dbgen
generate-tpch-dbgen.sh
tpch-dbgen