Good morning. I'm working on a project where I need to create a pipeline that reads a CSV file from the internet. Does Apache Beam have a method to do that? Something like this:
beam.io.ReadFromText("https://www.samples.com/file.csv")
My current pipeline is:
import apache_beam as beam

data = DownloadFileCSV()  # load the file into memory
fileCSV = data
# Create the pipeline
with beam.Pipeline(options=pipeline_options) as p:
    # Load the CSV file downloaded from the site
    elements = p | 'Load File CSV' >> beam.Create(fileCSV)
    # Save the file to the GCP bucket
    elements | 'Save new file to GCS' >> beam.ParDo(WriteBatchesToGCS(nmBucket))
I would like to replace this line:
elements = p | 'Load File CSV' >> beam.Create(fileCSV)
I don't want to keep part of the process outside the pipeline. Can Apache Beam do this?
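To make the goal concrete, here is a rough sketch of the shape I have in mind. It is not something I know to be the recommended approach. The pipeline would start from the URL itself, and the download would happen inside a step of the pipeline. `fetch_csv_lines` and `run` are hypothetical names of my own, not Beam APIs. The download uses `urllib.request` from the standard library, and I am assuming the file is UTF-8 and small enough to be read by a single worker. The `print` step is only a placeholder for my GCS write step.

```python
import urllib.request


def fetch_csv_lines(url):
    """Download `url` and yield its content one text line at a time.

    Hypothetical helper, not part of Beam. Assumes a UTF-8 file that
    fits in the memory of a single worker.
    """
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    for line in text.splitlines():
        yield line


def run(url, pipeline_options=None):
    # Imported here so the helper above can be tried without Beam installed.
    import apache_beam as beam

    with beam.Pipeline(options=pipeline_options) as p:
        (
            p
            # The pipeline starts from the URL, not from the downloaded data.
            | 'CSV URL' >> beam.Create([url])
            # The download runs inside the pipeline, one output element per line.
            | 'Download CSV' >> beam.FlatMap(fetch_csv_lines)
            # Placeholder: the real pipeline would write to GCS here.
            | 'Print lines' >> beam.Map(print)
        )
```

With something like this, `run("https://www.samples.com/file.csv")` would keep the download inside the pipeline, although the fetch itself would not be parallelized because there is only one input element.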
Thank you, Juliano