I am using CSVRecordSource to read a CSV file in an Apache Beam pipeline; its read_records function uses open_file.
With Python 2 everything worked fine, but after migrating to Python 3 it fails with the following error:
next(csv_reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
By default, the open_file method opens the file in binary mode, so I changed the code to use the built-in open in text mode:
with open(filename, "rt") as f:
but this fails when I run the pipeline on Dataflow in Google Cloud, because it cannot find the file:
FileNotFoundError: [Errno 2] No such file or directory
Below is my code:
with self.open_file(filename) as f:
    csv_reader = csv.reader(f, delimiter=self.delimiter, quotechar=self.quote_character)
    header = next(csv_reader)
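The binary-mode error can be reproduced outside Beam. The following is a minimal sketch, assuming an io.BytesIO buffer stands in for the binary handle returned by open_file; the sample data and variable names are made up for illustration. It also shows that wrapping a binary stream in io.TextIOWrapper gives csv.reader the strings it expects. Whether the object returned by open_file can be wrapped the same way on Dataflow is an assumption that is not confirmed here.

```python
import csv
import io

# Stand-in for the binary handle returned by open_file (an assumption;
# the real handle comes from Beam's filesystem layer).
raw = io.BytesIO(b"name,age\nalice,30\n")

# Python 3's csv.reader requires str lines, so a binary stream fails.
try:
    next(csv.reader(raw))
    binary_error = None
except csv.Error as exc:
    binary_error = str(exc)

# Rewind, then wrap the binary stream in a text layer that decodes to str.
raw.seek(0)
text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
csv_reader = csv.reader(text)
header = next(csv_reader)
first_row = next(csv_reader)
```

The encoding is set to UTF-8 here only for the sample data; a real file may need a different one.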
How can I use CSVRecordSource with Python 3?