I'm setting up a Dataflow job whose workers need access to a private Bitbucket repository in order to install a library used to process the data. To grant the workers access, I created an SSH key pair (public and private) and managed to get the private key onto the Dataflow worker. However, when I try to pip install the package via git+ssh, I get the error: Host key verification failed.
I have tried to look for the .ssh/known_hosts file on the Dataflow worker, but this is not as straightforward as on a regular VM.
Alternatively, I tried setting it up myself with the following commands, but this did not work either:
mkdir -p ~/.ssh
chmod 0700 ~/.ssh
ssh-keyscan bitbucket.org > ~/.ssh/known_hosts
I still get the Host key verification failed error.
Another suggested fix for this problem is to run ssh-keygen -R bitbucket.org, but then I get the following error:
mkstemp: No such file or directory
With the Dataflow Python SDK, you need to package your code with a setup.py. All the commands to be executed upon worker start-up are run with subprocess.Popen. The list of commands is as follows:
CUSTOM_COMMANDS = [
    # decrypt the encrypted key in the repository via gcloud kms
    ['gcloud', '-v'],
    ['gcloud', 'kms', 'decrypt', '--location', 'global', '--keyring',
     'bitbucketpackages', '--key', 'package', '--plaintext-file',
     'bb_package_key_decrypted', '--ciphertext-file', 'bb_package_key'],
    ['chmod', '700', 'bb_package_key_decrypted'],
    # install git & ssh
    ['apt-get', 'update'],
    ['apt-get', 'install', '-y', 'openssh-server'],
    ['apt-get', 'install', '-y', 'git'],
    # add bitbucket.org as known host
    ['mkdir', '-p', '~/.ssh'],
    ['chmod', '0700', '~/.ssh'],
    ['ssh-keyscan', 'bitbucket.org', '>', '~/.ssh/known_hosts'],
    # other attempts to fix it
    # ['ssh-keygen', '-R', 'bitbucket.org']
    # pip install
    ['sh', '-c', 'GIT_SSH_COMMAND="ssh -i ./bb_package_key_decrypted" pip install git+ssh://git@bitbucket.org/team/repo.git'],
]
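
For context, here is a minimal sketch of how I understand these commands get executed. It is modeled on the custom-commands pattern used in the Apache Beam example setup.py, not copied from my actual file; the helper name run_custom_command and the ECHO_ARGS stand-in are hypothetical and only there for illustration. The stand-in replaces ssh-keyscan with a child process that prints the arguments it received, which suggests that, since no shell is involved, `>` and `~` reach the program as literal arguments rather than being interpreted as a redirect and a home directory.

```python
import subprocess
import sys


def run_custom_command(command_list):
    """Run one command WITHOUT a shell; return (exit code, combined output).

    Hypothetical helper, assumed to resemble what the setup.py does per command.
    """
    process = subprocess.Popen(
        command_list,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    stdout_data, _ = process.communicate()
    return process.returncode, stdout_data.decode("utf-8", errors="replace")


# Stand-in for ssh-keyscan (assumption for illustration only): a child Python
# process that prints the argument list it was started with.
ECHO_ARGS = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]

if __name__ == "__main__":
    # Same argument shape as the ssh-keyscan entry in CUSTOM_COMMANDS.
    code, output = run_custom_command(
        ECHO_ARGS + ["bitbucket.org", ">", "~/.ssh/known_hosts"]
    )
    print(code, output)
```

If that reading is right, the mkdir, chmod and ssh-keyscan entries would not behave like the shell commands I ran by hand, whereas the last entry goes through sh -c and so does get shell handling.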