
Good morning. I'm working on a project where I need to create a pipeline that reads a CSV file from the internet. Does Apache Beam have a method for that? Something like this:

    beam.io.ReadFromText("https://www.samples.com/file.csv")

My current pipeline is:

    import apache_beam as beam

    data = DownloadFileCSV()  # load the file into memory
    fileCSV = data

    # Create the pipeline
    with beam.Pipeline(options=pipeline_options) as p:

        # Load the CSV file from the website
        elements = p | 'Load File CSV' >> beam.Create(fileCSV)

        # Save the file to the GCP bucket
        elements | 'Save new file to GCS' >> beam.ParDo(WriteBatchesToGCS(nmBucket))

I would like to replace this line:

    elements = p | 'Load File CSV' >> beam.Create(fileCSV)

I don't want any part of the process to run outside the pipeline. Can Apache Beam do this?
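As far as I know, `ReadFromText` resolves paths through Beam's `FileSystems` layer (local paths, `gs://`, `s3://`, `hdfs://`, and so on), which does not cover plain `http(s)://` URLs. One way to keep the download inside the pipeline is to start from a one-element `PCollection` that holds only the URL and let a `FlatMap` fetch and stream the file line by line on a worker. This is a minimal sketch, assuming the file is UTF-8, that each CSV record fits on a single line, and that plain text output via `beam.io.WriteToText` is acceptable in place of the existing `WriteBatchesToGCS` step. `read_lines_from_url`, `run`, and `output_prefix` are hypothetical names.

```python
import urllib.request
from typing import Iterator


def read_lines_from_url(
    url: str, skip_header_lines: int = 0, encoding: str = "utf-8", timeout: float = 60.0
) -> Iterator[str]:
    """Hypothetical helper: stream a remote text/CSV file line by line.

    Lines are read lazily from the response, so the whole file is never held
    in memory at once. Assumes one CSV record per line.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        for index, raw_line in enumerate(response):
            if index < skip_header_lines:
                continue  # drop header line(s), similar to ReadFromText(skip_header_lines=...)
            yield raw_line.decode(encoding).rstrip("\r\n")


def run(url: str, output_prefix: str, pipeline_options=None) -> None:
    """Hypothetical pipeline: the download happens inside a Beam step."""
    # Imported here only so read_lines_from_url can be tried without Beam installed.
    import apache_beam as beam

    with beam.Pipeline(options=pipeline_options) as p:
        (
            p
            | "Seed URL" >> beam.Create([url])  # one-element PCollection: just the URL
            | "Download CSV lines" >> beam.FlatMap(read_lines_from_url, skip_header_lines=1)
            | "Redistribute lines" >> beam.Reshuffle()  # let later steps run in parallel
            # Stand-in for WriteBatchesToGCS; WriteToText(header=...) can re-add a header row.
            | "Write to GCS" >> beam.io.WriteToText(output_prefix, file_name_suffix=".csv")
        )
```

It could be called as `run("https://www.samples.com/file.csv", "gs://YOUR_BUCKET/output/file", pipeline_options)`, where `YOUR_BUCKET` is a placeholder. Because there is only one URL, a single worker performs the whole download; `Reshuffle` only spreads the resulting lines out for whatever steps follow. Quoted CSV fields that contain line breaks would be split incorrectly by this line-based approach, which is the same limitation `ReadFromText` has.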

Thank you, Juliano
