I've developed an Apache Beam pipeline locally that runs predictions on a sample file.
On my local machine I can load the model like this:
import cPickle

with open('gs://newbucket322/my_dumped_classifier.pkl', 'rb') as fid:
    gnb_loaded = cPickle.load(fid)
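For context, the pickle was produced more or less like this (a sketch; the scikit-learn GaussianNB and the toy iris data are stand-ins for my actual training code):

import cPickle

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

# stand-in training step so the dump below has something to pickle
iris = datasets.load_iris()
gnb = GaussianNB().fit(iris.data, iris.target)

# write the fitted classifier out the same way my real code does
with open('my_dumped_classifier.pkl', 'wb') as fid:
    cPickle.dump(gnb, fid)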
but when running on Google Dataflow that obviously doesn't work. I tried changing the path to GS://, but that obviously doesn't work either.
I also tried this code snippet (from here), which was used to load files:
import apache_beam as beam


class ReadGcsBlobs(beam.DoFn):
    def process(self, element, *args, **kwargs):
        from apache_beam.io.gcp import gcsio
        gcs = gcsio.GcsIO()
        # emit the blob path together with its raw bytes
        yield (element, gcs.open(element).read())


model = (p
         | "Initialize" >> beam.Create(["gs://bucket/file.pkl"])
         | "Read blobs" >> beam.ParDo(ReadGcsBlobs())
         )
but that doesn't work for loading my model, or at least I cannot use this model variable to call the predict method.
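To make the problem concrete, model here is a PCollection of (path, bytes) tuples rather than a classifier, so what I'm ultimately after is something like the sketch below (reusing ReadGcsBlobs from above; my assumptions are that cPickle.loads can rebuild the classifier from the blob bytes, that the classifier is a scikit-learn estimator with a standard predict method, and sample_rows is a placeholder for my actual input):

import cPickle

import apache_beam as beam


def deserialize(path_and_blob):
    # ReadGcsBlobs emits (path, raw_bytes); rebuild the classifier
    # object from the pickle bytes
    path, blob = path_and_blob
    return cPickle.loads(blob)


with beam.Pipeline() as p:
    classifier = (p
                  | "Initialize" >> beam.Create(["gs://bucket/file.pkl"])
                  | "Read blobs" >> beam.ParDo(ReadGcsBlobs())
                  | "Deserialize" >> beam.Map(deserialize))
    # hand the single deserialized model to every element as a side input
    predictions = (p
                   | "Rows" >> beam.Create(sample_rows)  # placeholder input
                   | "Predict" >> beam.Map(
                         lambda row, m: m.predict([row])[0],
                         m=beam.pvalue.AsSingleton(classifier)))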
This should be a pretty straightforward task, but I can't seem to find a straightforward answer.