
I'm using the Cloudera Hive ODBC driver in my code, and I'm trying to containerize the app. Below is my Dockerfile:

FROM ubuntu:18.04
FROM continuumio/anaconda3
FROM node:10
 

RUN conda update -n base -c defaults conda

RUN conda create -n env python=3.7
RUN echo "conda activate env" > ~/.bashrc
ENV PATH /opt/conda/envs/env/bin:$PATH
RUN apt-get update && apt-get install -y \
      curl apt-utils apt-transport-https debconf-utils gcc build-essential \
      && rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y \
      python-pip python-dev python-setuptools \
      --no-install-recommends \
      && rm -rf /var/lib/apt/lists/*

RUN pip install --upgrade pip
RUN pip install pyyaml pandas numpy pymysql sqlalchemy schedule tornado
RUN apt-get update && apt-get install -y --no-install-recommends git unzip unixodbc unixodbc-dev
RUN conda install -c conda-forge turbodbc=3.1.1
RUN apt-get update && apt-get install -y gettext nano vim -y
RUN yarn install --modules-folder ./static
WORKDIR /app
COPY entry.sh /usr/local/bin/
COPY . /app/
ENV SSH_PASSWD "root:Docker!"
RUN apt-get update \
      && apt-get install -y --no-install-recommends dialog \
      && apt-get update \
      && apt-get install -y --no-install-recommends openssh-server \
      && echo "$SSH_PASSWD" | chpasswd 
COPY sshd_config /etc/ssh/
COPY entry.sh /usr/local/bin/
RUN chmod u+x /usr/local/bin/entry.sh
EXPOSE 5000 2222 22 80 8000
CMD ["entry.sh"]

The image builds successfully, but when I run it I get the error below:

Traceback (most recent call last):
  File "app.py", line 14, in <module>
    from abc_scheduler import scheduler_main
  File "/app/abc_scheduler.py", line 5, in <module>
    from methods import Methods
  File "/app/methods.py", line 10, in <module>
    from utils import *
  File "/app/utils.py", line 2, in <module>
    from turbodbc import connect, make_options
ModuleNotFoundError: No module named 'turbodbc'

I have tried several other ODBC drivers in my Dockerfile, but with no luck. Any help would be great.

Ajay A

2 Answers


As suggested by @DavidMaze, I managed to create a working Dockerfile, shown below:

FROM ubuntu:latest
FROM continuumio/anaconda3
FROM node:10

RUN conda update -n base -c defaults conda

RUN conda create -n env python=3.7
RUN echo 'conda init bash' >/.bashrc
RUN echo "conda activate env" > ~/.bashrc

ENV PATH /opt/conda/envs/env/bin:$PATH
RUN apt-get update && apt-get install -y \
      curl apt-utils apt-transport-https debconf-utils gcc build-essential \
      && rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y \
      python-pip python-dev python-setuptools \
      --no-install-recommends \
      && rm -rf /var/lib/apt/lists/*

RUN pip install --upgrade pip
# ==================TURBODBC========================
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get dist-upgrade -y

RUN apt-get install -y alien # optional
COPY ClouderaHiveODBC-2.6.1.1001-1.x86_64.rpm /opt/cloudera/
RUN alien /opt/cloudera/ClouderaHiveODBC-2.6.1.1001-1.x86_64.rpm
RUN dpkg -i clouderahiveodbc_2.6.1.1001-2_amd64.deb
# ==================END=============================
RUN conda install --name env -c conda-forge turbodbc=4.1.1 tornado=6.0.4 pyyaml pymysql schedule sqlalchemy pyarrow numpy=1.19.3 \
    pandas=1.1.4 pybind11
COPY odbc.ini /etc/
RUN apt-get update && apt-get install -y gettext nano vim
RUN yarn install --modules-folder ./static
WORKDIR /app
COPY . /app/
ENV SSH_PASSWD "root:Docker!"
RUN apt-get update \
      && apt-get install -y --no-install-recommends dialog \
      && apt-get update \
      && apt-get install -y --no-install-recommends openssh-server \
      && echo "$SSH_PASSWD" | chpasswd 
COPY sshd_config /etc/ssh/
COPY entry.sh /usr/local/bin/
RUN chmod u+x /usr/local/bin/entry.sh
EXPOSE 9988 2222 22 80 8000
CMD ["entry.sh"]

Keep a copy of ClouderaHiveODBC-2.6.1.1001-1.x86_64.rpm in the current directory (the build context), along with the following files:

odbc.ini - contains the DB connection info

entry.sh - a shell script that runs the command python app.py

sshd_config - a file without any extension, with the contents shown below

Port                    2222
ListenAddress           0.0.0.0
LoginGraceTime          180
X11Forwarding           yes
Ciphers                 aes128-cbc,3des-cbc,aes256-cbc
MACs                    hmac-sha1,hmac-sha1-96
StrictModes             yes
SyslogFacility          DAEMON
PrintMotd               no
IgnoreRhosts            no
#deprecated option
#RhostsAuthentication   no
RhostsRSAAuthentication yes
RSAAuthentication       no
PasswordAuthentication  yes
PermitEmptyPasswords    no
PermitRootLogin         yes
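
The original ModuleNotFoundError means the interpreter that runs app.py could not find turbodbc, which can happen when the package is installed into a different environment than the one first on PATH. As a minimal sketch (not part of my actual setup), a small check like the one below could be run inside the container to confirm which interpreter is in use and whether the required packages are visible to it. The helper name missing_modules and the required list are hypothetical, chosen only for illustration.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot find.

    Only top-level names (no dots) are expected here; find_spec looks the
    module up on the import path without importing it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list: the package the traceback reports as missing.
    required = ["turbodbc"]

    # Shows which Python binary is running, e.g. the conda env's interpreter.
    print("interpreter:", sys.executable)

    missing = missing_modules(required)
    if missing:
        print("missing modules:", ", ".join(missing))
    else:
        print("all required modules found")
```

If the printed interpreter is not the one under /opt/conda/envs/env/bin, the packages were probably installed into a different environment than the one the container runs.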
Ajay A

I want to expand on the answer above with an approach that does not require conda: a pip-only, minimum viable Docker setup for installing turbodbc. I've fully documented the solution in this GitHub comment in the official turbodbc repo.
