I have issues connecting a KafkaIO source to brokers available only through a Cloud VPN tunnel.
The tunnel is set up to allow traffic from a specific subnetwork (secure), and routes are set up and working for Compute Engine instances in that subnetwork.
When executing the pipeline with the DirectRunner, KafkaIO is able to connect to the brokers, whether through the VPN on a standard Compute Engine instance in the secure subnetwork or from a local machine with SSH tunnels set up by sshuttle.
When running the pipeline with the DataflowRunner, connections to the brokers fail with:
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
The pipeline is executed within the secure subnetwork.
After connecting to the Compute Engine instance spawned by the job, the following routes are visible:
jgrabber@REDACTED-harness-REDACTED ~ $ ip r
default via 10.74.252.1 dev eth0 proto dhcp src 10.74.252.3 metric 1024
default via 10.74.252.1 dev eth0 proto dhcp metric 1024
10.74.252.1 dev eth0 proto dhcp scope link src 10.74.252.3 metric 1024
10.74.252.1 dev eth0 proto dhcp metric 1024
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
The IPv4 addresses of the brokers are within a (remote) 172.17.0.0/16 network. The VPN is configured with a remote network range of 172.16.0.0/12.
Could the remote 172.17.0.0/16 network be shadowed by the virtual network that Docker sets up and uses on the worker?
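To make the suspected overlap concrete, here is a minimal sketch using Python's standard ipaddress module. The two networks are taken from the routing table and the VPN configuration above. The broker address 172.17.0.10 is a made-up placeholder, not one of the real brokers, and the route table is a simplified model of the ip r output. The sketch ignores the linkdown flag on docker0, so it only shows which route would win a longest-prefix match, not what the worker's kernel actually does.

```python
import ipaddress

# Networks taken from the question: docker0's local bridge and the VPN's remote range.
docker_bridge = ipaddress.ip_network("172.17.0.0/16")
vpn_remote = ipaddress.ip_network("172.16.0.0/12")

# Hypothetical broker address inside the remote 172.17.0.0/16 network (placeholder).
broker = ipaddress.ip_address("172.17.0.10")

# Simplified model of the worker's routes as (destination, device) pairs.
routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),  # default route via 10.74.252.1
    (docker_bridge, "docker0"),                   # Docker bridge route
]


def pick_route(address, table):
    """Return the matching route with the longest prefix (most specific match)."""
    matches = [(net, dev) for net, dev in table if address in net]
    return max(matches, key=lambda item: item[0].prefixlen)


chosen_net, chosen_dev = pick_route(broker, routes)

# Is the Docker bridge range contained in the VPN's remote range?
print(docker_bridge.subnet_of(vpn_remote))  # True
# Which device would a longest-prefix match select for the broker address?
print(chosen_dev)  # docker0
```

If this model matches the worker, traffic to any broker in 172.17.0.0/16 would be matched by the docker0 route before the default route that leads towards the VPN, which would be consistent with the metadata timeout.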