I want to read a CSV file into a list in an Apache Beam application, where each element of the list is a tuple or a list (it doesn't really matter which), so that I would have the CSV
1,2,3
4,5,6
become
[(1,2,3) , (4,5,6)]
or
[ [1,2,3], [4,5,6] ]
I tried following the instructions in "How to convert csv into a dictionary in apache beam dataflow", but when I try to use
from beam_utils.sources import CsvFileSource
I get
from beam_utils.sources import CsvFileSource
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/beam_utils/sources.py", line 9, in <module>
    from apache_beam.io import fileio
ImportError: cannot import name fileio
If I try to import it directly with
from apache_beam.io import fileio
I get the same error; however, I can use both of
import apache_beam.io
import beam_utils
without any issues. Does anyone have an idea of what the problem might be, or of how I could do this in a different way?
I currently have
with beam.Pipeline(options=pipeline_options) as p:
    csvfile = p | ReadFromText(known_args.input)
so if I can turn csvfile into the desired format some other way, that works well too.
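One direction that might work without beam_utils is to parse each line that ReadFromText produces using Python's standard csv module. The following is only a minimal sketch: parse_csv_line is a hypothetical helper name (not part of Beam or beam_utils), and it assumes every field is an integer and that no record spans more than one line.

```python
import csv


def parse_csv_line(line):
    """Parse one line of CSV text into a tuple of ints.

    Hypothetical helper: assumes every field is an integer and that
    a record never spans multiple lines.
    """
    # csv.reader accepts any iterable of strings, so wrap the single line in a list.
    fields = next(csv.reader([line]))
    return tuple(int(field) for field in fields)


# Plain-Python check of the helper on the two example rows.
rows = [parse_csv_line(line) for line in ["1,2,3", "4,5,6"]]
print(rows)  # [(1, 2, 3), (4, 5, 6)]
```

In the pipeline this could presumably be applied as csvfile | beam.Map(parse_csv_line), which would give a PCollection of tuples rather than an in-memory Python list.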