I'm setting up a Dataflow job whose workers need access to a private Bitbucket repository in order to install a library that processes the data. To grant the Dataflow workers access, I have set up an SSH key pair (public and private), and I managed to get the private key onto the Dataflow worker. However, when I try to pip install the package via git+ssh, I get the error Host key verification failed.

I have tried to look for the .ssh/known_hosts file on the Dataflow worker, but this is not as straightforward as on a regular VM.

Alternatively, I tried to set it up myself with the following commands, but this did not work either:

mkdir -p ~/.ssh
chmod 0700 ~/.ssh
ssh-keyscan bitbucket.org > ~/.ssh/known_hosts

I still get the Host key verification failed error.

Another suggested fix for this problem is to run ssh-keygen -R bitbucket.org, but then I get the following error: mkstemp: No such file or directory

With the Dataflow Python SDK, you need to package your code with a setup.py. All the commands to be executed on worker start-up are run with subprocess.Popen. The list of commands is as follows:

CUSTOM_COMMANDS = [
    # decrypt the encrypted key stored in the repository via gcloud kms
    ['gcloud', '-v'],
    ['gcloud', 'kms', 'decrypt', '--location', 'global', '--keyring',
     'bitbucketpackages', '--key', 'package', '--plaintext-file',
     'bb_package_key_decrypted', '--ciphertext-file', 'bb_package_key'],
    ['chmod', '700', 'bb_package_key_decrypted'],
    # install git & ssh
    ['apt-get', 'update'],
    ['apt-get', 'install', '-y', 'openssh-server'],
    ['apt-get', 'install', '-y', 'git'],
    # add bitbucket.org as known host
    ['mkdir', '-p', '~/.ssh'],
    ['chmod', '0700', '~/.ssh'],
    ['ssh-keyscan', 'bitbucket.org', '>', '~/.ssh/known_hosts'],
    # other attempts to fix it
    # ['ssh-keygen', '-R', 'bitbucket.org']
    # pip install
    ['sh', '-c', 'GIT_SSH_COMMAND="ssh -i ./bb_package_key_decrypted" pip install git+ssh://git@bitbucket.org/team/repo.git'],
] 
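
For context, here is a minimal sketch of what happens when an entry like the ones above is passed to subprocess.Popen as an argument list. It assumes that setup.py runs each entry without a shell (no shell=True), which the question does not show explicitly. The helper name run_without_shell and the echo commands are only illustrative stand-ins for ssh-keyscan. Under that assumption, '>' is handed to the program as a plain argument rather than treated as a redirection, and '~' would likewise not be expanded to the home directory.

```python
import os
import subprocess
import tempfile


def run_without_shell(command_list, cwd):
    """Run one command given as an argument list, with no shell involved."""
    p = subprocess.Popen(
        command_list, cwd=cwd,
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    stdout_data, _ = p.communicate()
    return p.returncode, stdout_data.decode()


workdir = tempfile.mkdtemp()

# '>' is passed to echo as an ordinary argument, so nothing is redirected
# and out.txt is not created.
_, list_output = run_without_shell(['echo', 'hello', '>', 'out.txt'], workdir)
list_created_file = os.path.exists(os.path.join(workdir, 'out.txt'))

# Wrapping the same command in `sh -c` lets the shell perform the redirection.
run_without_shell(['sh', '-c', 'echo hello > out.txt'], workdir)
shell_created_file = os.path.exists(os.path.join(workdir, 'out.txt'))

print(repr(list_output), list_created_file, shell_created_file)
```

If the commands are indeed run this way, the ssh-keyscan entry would never write a known_hosts file, which would be consistent with the Host key verification failed error.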
Sven.DG
  • For the first solution that you tried, did you do this within the container that is running your code or only within the VM? – Lukasz Cwik May 24 '19 at 18:40
  • As a temporary workaround does passing the `-o "StrictHostKeyChecking=no"` option within GIT_SSH_COMMAND make the pip install work? – Lukasz Cwik May 24 '19 at 18:53

1 Answer

Try updating ssh-keyscan to write to some temp path and then passing the known hosts file location as a part of the GIT_SSH_COMMAND. For example, I would update your script to be:

CUSTOM_COMMANDS = [
    # decrypt the encrypted key stored in the repository via gcloud kms
    ['gcloud', '-v'],
    ['gcloud', 'kms', 'decrypt', '--location', 'global', '--keyring',
     'bitbucketpackages', '--key', 'package', '--plaintext-file',
     'bb_package_key_decrypted', '--ciphertext-file', 'bb_package_key'],
    ['chmod', '700', 'bb_package_key_decrypted'],
    # install git & ssh
    ['apt-get', 'update'],
    ['apt-get', 'install', '-y', 'openssh-server'],
    ['apt-get', 'install', '-y', 'git'],
    # add bitbucket.org as known host
    ['mkdir', '-p', '~/.ssh'],
    ['chmod', '0700', '~/.ssh'],
    # run through a shell so that '>' is treated as output redirection
    ['sh', '-c', 'ssh-keyscan bitbucket.org > /tmp/bit_bucket_known_hosts'],
    # other attempts to fix it
    # ['ssh-keygen', '-R', 'bitbucket.org']
    # pip install
    ['sh', '-c', 'GIT_SSH_COMMAND="ssh -o UserKnownHostsFile=/tmp/bit_bucket_known_hosts -i ./bb_package_key_decrypted" pip install git+ssh://git@bitbucket.org/team/repo.git'],
] 
Lukasz Cwik
  • This solution indeed works, thanks! I also found that by chaining the commands it's possible to write to the root directory as well, but if you separate the commands (as they are separated in the solution above, which writes to a temp dir), it does not work. The command that also worked: ```['sh', '-c', 'mkdir -p /root/.ssh && chmod 0700 /root/.ssh && ssh-keyscan bitbucket.org > /root/.ssh/known_hosts && GIT_SSH_COMMAND="ssh -i ./bb_package_key_decrypted" pip install git+ssh://git@bitbucket.org/team/repo.git']```. It looks like the behavior in this root directory is similar to building a Dockerfile. – Sven.DG May 25 '19 at 07:32
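
The chained command from the comment above can also be assembled programmatically, which keeps the quoting manageable. The following is only a sketch: the helper name build_pip_install_command and its parameters are hypothetical, and it combines the temp-path known hosts file from the answer with the `sh -c` chaining from the comment. It uses shlex.quote from the standard library to quote each value for the shell.

```python
import shlex


def build_pip_install_command(key_path, known_hosts_path, repo_url,
                              host='bitbucket.org'):
    """Build one ['sh', '-c', ...] entry that scans the host key and runs pip.

    Hypothetical helper: all steps run in a single shell, so redirection
    and the GIT_SSH_COMMAND environment assignment are interpreted by sh.
    """
    ssh_command = 'ssh -o UserKnownHostsFile={} -i {}'.format(
        shlex.quote(known_hosts_path), shlex.quote(key_path))
    script = ' && '.join([
        # write the scanned host key to the chosen known hosts file
        'ssh-keyscan {} > {}'.format(
            shlex.quote(host), shlex.quote(known_hosts_path)),
        # point git's ssh at that file and at the decrypted private key
        'GIT_SSH_COMMAND={} pip install {}'.format(
            shlex.quote(ssh_command), shlex.quote(repo_url)),
    ])
    return ['sh', '-c', script]


command = build_pip_install_command(
    key_path='./bb_package_key_decrypted',
    known_hosts_path='/tmp/bit_bucket_known_hosts',
    repo_url='git+ssh://git@bitbucket.org/team/repo.git')
print(command[2])
```

The returned list could then take the place of the separate ssh-keyscan and pip install entries in CUSTOM_COMMANDS.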