I'm trying to write to a GCS bucket via Beam (and TF Transform), but I keep getting the following error:
ValueError: Unable to get the Filesystem for path [...]
The answer here and some other sources suggest that I need to pip install apache-beam[gcp] to get a variant of Apache Beam that works with GCP.
So I tried changing the setup.py of my training package to:
REQUIRED_PACKAGES = ['apache_beam[gcp]==2.14.0', 'tensorflow-ranking', 'tensorflow_transform==0.14.0']
which didn't help. I also tried adding the following to the beginning of my code:
subprocess.check_call('pip uninstall apache-beam'.split())
subprocess.check_call('pip install apache-beam[gcp]'.split())
which didn't work either.
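One way to confirm whether the GCP extras are actually present in a given environment is to try importing Beam's GCS filesystem module directly. The following is a minimal sketch, assuming the gs:// handler lives in apache_beam.io.gcp.gcsfilesystem (as it does in the Beam Python SDK) and that the import raises ImportError when the [gcp] dependencies are missing. The helper name gcs_filesystem_available is made up for illustration.

```python
import importlib


def gcs_filesystem_available():
    """Return True if Beam's GCS filesystem module can be imported.

    Hypothetical helper: the module path is an assumption about the
    installed Beam SDK, not something taken from the failing job.
    """
    try:
        # Expected to fail with ImportError when apache-beam is missing or
        # was installed without the [gcp] extra.
        importlib.import_module("apache_beam.io.gcp.gcsfilesystem")
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("GCS filesystem importable:", gcs_filesystem_available())
```

Running this only describes the interpreter it runs in, so it would have to be logged from the same environment that raises the ValueError to be informative.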
The logs of the failed GCP job are here. The traceback and the error message appear on row 276.
I should mention that the same code runs fine with Beam's DirectRunner when writing the outputs to local disk. But I'm now trying to switch to DataflowRunner.
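For context, the switch to DataflowRunner is typically expressed through pipeline arguments. Below is a rough sketch of how such arguments might be assembled; it is not the configuration of the failing job. The function name build_dataflow_args, the project ID, the bucket name and the setup file path are all hypothetical placeholders. The flags themselves (--runner, --project, --temp_location, --staging_location, --setup_file) are standard Beam pipeline options.

```python
def build_dataflow_args(project, bucket, setup_file="./setup.py"):
    """Build a list of Beam pipeline flags for a Dataflow run.

    All argument values are placeholders supplied by the caller.
    """
    return [
        "--runner=DataflowRunner",
        "--project={}".format(project),
        "--temp_location=gs://{}/tmp".format(bucket),
        "--staging_location=gs://{}/staging".format(bucket),
        # --setup_file asks Beam to build and install this package
        # (and its install_requires) on the Dataflow workers.
        "--setup_file={}".format(setup_file),
    ]


if __name__ == "__main__":
    # With Beam installed, the list could be passed on as:
    #   from apache_beam.options.pipeline_options import PipelineOptions
    #   options = PipelineOptions(flags=build_dataflow_args(...))
    for flag in build_dataflow_args("my-project", "my-bucket"):
        print(flag)
```

The sketch keeps the flags as plain strings so it can be inspected without Beam installed.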
Thanks.