
I'm using Gitpod to build a container image with Docker (since I'm on Gitpod, I don't have access to the Docker command line). My goal is to install Python and PostgreSQL. This is my current Dockerfile:

# Base image is one of Python official distributions.
FROM python:3.8.13-slim-buster

# Update libraries and install sudo.
RUN apt update
RUN apt -y install sudo

# Install curl.
RUN sudo apt install -y curl

# Install git.
RUN sudo apt install install-info
RUN sudo apt install -y git-all

# Install nodejs.
RUN curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
RUN sudo apt install -y nodejs

# Download Google Cloud CLI installation script.
RUN mkdir -p /tmp/google-cloud-download
RUN curl -sSL https://sdk.cloud.google.com > /tmp/google-cloud-download/install.sh

# Install Google Cloud CLI.
RUN mkdir -p /gcloud
RUN bash /tmp/google-cloud-download/install.sh --install-dir=/gcloud --disable-prompts

# Copy local code to the container image.
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./

# Install production dependencies.
RUN pip install --no-cache-dir -r requirements.txt

# Set some variables and create gitpod user.
ENV PGWORKSPACE="/workspace/.pgsql"
ENV PGDATA="$PGWORKSPACE/data"
RUN sudo mkdir -p $PGDATA
RUN useradd -l -u 33333 -G sudo -md /home/gitpod -s /bin/bash -p gitpod gitpod
RUN sudo chown gitpod $PGWORKSPACE -R

# Declare Django env variables.
ENV DJANGO_DEBUG=True
ENV DJANGO_DB_ENGINE=django.db.backends.postgresql_psycopg2

# Declare Postgres env variables. Note that these variables
# cannot be renamed since they are used by Postgres.
# https://www.postgresql.org/docs/current/libpq-envars.html
ENV PGDATABASE=postgres
ENV PGUSER=gitpod
ENV PGPASSWORD=gitpod
ENV PGHOST=localhost
ENV PGPORT=5432

# Install PostgreSQL 14. Note that this block needs to be located
# after the env variables are specified, since it uses POSTGRES_DB,
# POSTGRES_USER and POSTGRES_PASSWORD to create the first user.
RUN curl -fsSL https://www.postgresql.org/media/keys/ACCC4CF8.asc|sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/postgresql.gpg
RUN sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
RUN sudo apt -y update
RUN sudo apt -y install postgresql-14

USER gitpod

# Set some more variables and init the db.
ENV PATH="/usr/lib/postgresql/14/bin:$PATH"
RUN mkdir -p ~/.pg_ctl/bin ~/.pg_ctl/sockets
RUN initdb -D $PGDATA
RUN printf '#!/bin/bash\npg_ctl -D $PGDATA -l ~/.pg_ctl/log -o "-k ~/.pg_ctl/sockets" start\n' > ~/.pg_ctl/bin/pg_start
RUN printf '#!/bin/bash\npg_ctl -D $PGDATA -l ~/.pg_ctl/log -o "-k ~/.pg_ctl/sockets" stop\n' > ~/.pg_ctl/bin/pg_stop
RUN chmod +x ~/.pg_ctl/bin/*
ENV PATH="$HOME/.pg_ctl/bin:$PATH"
ENV DATABASE_URL="postgresql://gitpod@localhost"
ENV PGHOSTADDR="127.0.0.1"

At this point, I would expect to have a database called postgres with a default user called gitpod, which is also the name of my default user in bash. However, when I run psql -h localhost I get this error:

psql: error: connection to server at "127.0.0.1", port 5432 failed: FATAL:  password authentication failed for user "gitpod"
connection to server at "127.0.0.1", port 5432 failed: FATAL:  password authentication failed for user "gitpod"

Despite the message, I believe no user was actually created. I get the same message if I try to log in with a random string as the username.

I also tried:

echo "host all all 127.0.0.1/32 trust" | sudo tee -a /etc/postgresql/14/main/pg_hba.conf

It doesn't change anything.

echo "host all all localhost trust" | sudo tee -a /etc/postgresql/14/main/pg_hba.conf

It doesn't change anything.

sudo -u gitpod psql postgres

It returns the error: psql: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: FATAL: role "gitpod" does not exist

Edgar Derby
  • I don't see anything here that would have created a role named gitpod. In dockerized PostgreSQL, POSTGRES_USER usually does that, but here it never gets set. – jjanes Sep 14 '22 at 02:08
  • The `PGUSER` environment variable causes `initdb` to create the `gitpod` user. – larsks Sep 14 '22 at 02:32
  • Unrelated to your question, but the fact that you can successfully `RUN apt -y install sudo` means that you **don't** need `sudo` at all for any of your other `apt` invocations (or anything else, up until the `USER gitpod` directive). – larsks Sep 14 '22 at 02:33
  • If I build an image from this Dockerfile (`docker build -t pgtest .`) and then start a shell from that image (`docker run -it --rm pgtest bash`) and then start postgres (`~/.pg_ctl/bin/pg_start`), I can run `psql` and it connects to postgres without any errors. – larsks Sep 14 '22 at 02:36
  • @larsks Interesting. So I guess my error has something to do with the way Gitpod dockerizes this container...? – Edgar Derby Sep 14 '22 at 05:55
  • @larsks, no. PGUSER is for connecting, not creating. It would have to be given explicitly to initdb, like `-U $PGUSER`, to create a role. – jjanes Sep 14 '22 at 12:20
  • @jjanes I don't know what to tell you; when you run this Dockerfile, the `gitpod` user exists in Postgres. Among the problems that are here, this doesn't appear to be one of them. Oh, it's probably the fact that `initdb` is being run *as* the gitpod user? – larsks Sep 14 '22 at 12:21
  • @larsks ok yes, running initdb as the Linux user 'gitpod' would cause it to create that role. So maybe the problem is that he has more than one server running (e.g. from some other container) and is connecting to the wrong one. – jjanes Sep 14 '22 at 12:57
  • @jjanes, you were also right. Apparently I had two problems. One is highlighted by @larsks in his answer; the other is that, for some reason, starting the server with `sudo service postgresql start` connected me to a different server than `pg_start`. – Edgar Derby Sep 15 '22 at 05:51

1 Answer


If I create a Gitpod workspace that uses your Dockerfile to build a custom image (by starting from https://gitpod.io#https://github.com/larsks/so-example-73710360-gitpod-test/tree/main), what I find when I open the terminal is that Gitpod mounts your workspace on /workspace, so anything you place there in your Dockerfile (such as the Postgres data directory) isn't available at runtime.
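For example, with the PGDATA value from your Dockerfile, a quick check in the workspace terminal shows that the data directory created at build time is no longer there (an illustrative check; the exact error text may differ):

$ echo $PGDATA
/workspace/.pgsql/data
$ ls $PGDATA
ls: cannot access '/workspace/.pgsql/data': No such file or directory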

If in the terminal I run:

$ mkdir /workspace/data
$ initdb
$ ~/.pg_ctl/bin/pg_start

Then postgres runs correctly, and I can connect to it using psql without problems:

$ psql
psql (14.5 (Debian 14.5-1.pgdg100+1))
Type "help" for help.

postgres=#

We can fix this in the Dockerfile by installing the database outside of /workspace. For example, we can:

  • Use /data/pgdata for the database
  • Use /data/sockets for the sockets directory
  • Use /usr/local/bin for our pg_start/pg_stop scripts

I've made a Dockerfile with these changes (and a few others); you can try it out at https://gitpod.io#https://github.com/larsks/so-example-73710360-gitpod-test/tree/fixed.
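The core of the change looks roughly like this (a sketch of the idea only; the paths, log location, and script contents are illustrative and may differ from the fixed branch):

# Keep the Postgres data directory and sockets outside /workspace,
# which Gitpod replaces with the workspace contents at runtime.
ENV PGDATA=/data/pgdata
RUN mkdir -p /data/pgdata /data/sockets && chown -R gitpod /data

# Put the start/stop helpers in a directory that is already on PATH.
RUN printf '#!/bin/bash\npg_ctl -l /data/pg.log -o "-k /data/sockets" start\n' > /usr/local/bin/pg_start && \
    printf '#!/bin/bash\npg_ctl -l /data/pg.log -o "-k /data/sockets" stop\n' > /usr/local/bin/pg_stop && \
    chmod +x /usr/local/bin/pg_start /usr/local/bin/pg_stop

USER gitpod
ENV PATH="/usr/lib/postgresql/14/bin:$PATH"

# initdb runs as the gitpod user, so the bootstrap superuser role is
# named gitpod, matching the PGUSER set earlier.
RUN initdb -D "$PGDATA"

Because /data is baked into the image rather than mounted over at runtime, the cluster initialized during the build is still there when the workspace starts.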

With these changes, pg_start runs without a problem, and psql connects successfully:

gitpod@larsks-soexample7371036-htkkrp9bsl7:/workspace/so-example-73710360-gitpod-test$ pg_start
waiting for server to start.... done
server started
gitpod@larsks-soexample7371036-htkkrp9bsl7:/workspace/so-example-73710360-gitpod-test$ psql
psql (14.5 (Debian 14.5-1.pgdg100+1))
Type "help" for help.

postgres=#

I've made a few other changes to the Dockerfile because I couldn't help myself. In particular:

  • I've removed the unnecessary use of sudo throughout the file

  • We can speed up the build process by passing multiple package names to apt install, rather than installing each one individually.

  • I've improved local image build times by re-arranging things to be more cache-efficient (see the sketch below).

    In particular, when you COPY . ./, any change to a file in your working directory will invalidate the cache, so any build steps after that need to be re-executed. By doing as much as possible before that COPY statement, we can significantly decrease the time it takes to rebuild the image locally.
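As a rough sketch of both points (package names are taken from your original Dockerfile, and the repository setup steps for nodejs/postgres are omitted for brevity):

# One layer and one dependency-resolution pass instead of one RUN per package.
RUN apt-get update && \
    apt-get install -y curl git-all nodejs && \
    rm -rf /var/lib/apt/lists/*

# Install Python dependencies before copying the rest of the source, so that
# editing application code doesn't invalidate the pip install layer.
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# The full COPY is the step most likely to change between builds, so it goes last.
COPY . ./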

larsks
  • I've been working on this issue for weeks; I owe you a big one, @larsks! Thank you very much for the support, much appreciated. One question, out of curiosity: why does installing packages in a single RUN improve performance? – Edgar Derby Sep 15 '22 at 05:48
  • Because this way `apt` only needs to calculate dependencies once. This also has an impact on image size (every `RUN` command generates a new image layer), but I didn't attempt to optimize for size in my changes. – larsks Sep 15 '22 at 12:54
  • Gotcha, thanks! @larsks – Edgar Derby Sep 16 '22 at 02:47