I am trying to run a very simple program in Apache Beam to see how it works.

import apache_beam as beam


class Split(beam.DoFn):
    def process(self, element):
        return element


with beam.Pipeline() as p:
    rows = (p | beam.io.ReadAllFromText(
        "input.csv") | beam.ParDo(Split()))

When I run this, I get the following error:

.... some more stack....
 File "/home/raheel/code/beam-practice/lib/python2.7/site-packages/apache_beam/transforms/util.py", line 565, in expand
    windowing_saved = pcoll.windowing
  File "/home/raheel/code/beam-practice/lib/python2.7/site-packages/apache_beam/pvalue.py", line 137, in windowing
    self.producer.inputs)
  File "/home/raheel/code/beam-practice/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 464, in get_windowing
    return inputs[0].windowing
  File "/home/raheel/code/beam-practice/lib/python2.7/site-packages/apache_beam/pvalue.py", line 137, in windowing
    self.producer.inputs)
  File "/home/raheel/code/beam-practice/lib/python2.7/site-packages/apache_beam/transforms/ptransform.py", line 464, in get_windowing
    return inputs[0].windowing
AttributeError: 'PBegin' object has no attribute 'windowing'

Any idea what is wrong here?

Thanks

Pablo
Raheel

1 Answer

ReadAllFromText reads from a PCollection of file names (or file patterns) rather than taking the file as an argument, so it cannot be applied directly to the pipeline object. In your case, it should be:

(p
 | beam.Create(["input.csv"])
 | beam.io.ReadAllFromText())
Guillem Xercavins