I am reading a CSV file with Apache Beam, and some of the fields contain newlines within the text. My current code is as follows:
# Standard library imports (csv is needed by parse_file below)
import csv
import re
import sys

# Beam and interactive Beam imports
import apache_beam as beam
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
import apache_beam.runners.interactive.interactive_beam as ib

p = beam.Pipeline(InteractiveRunner())

def print_row(element):
    print(element)

def parse_file(element):
    # Each element is a single line of text, so the reader yields one row
    for line in csv.reader([element], quotechar='"', delimiter=',', lineterminator='\n', quoting=csv.QUOTE_ALL, skipinitialspace=True):
        return line

parsed_csv = (
    p
    | 'Read input file' >> beam.io.ReadFromText("gs://ny-data/AB_NYC_2019.csv")
    | 'Parse file' >> beam.Map(parse_file)
)
# Print the first column of every parsed row
split = parsed_csv | beam.Map(lambda x: x[0]) | beam.Map(print)
p.run()
The parsing breaks because some field values span more than one line in the file, for example:
The BLUE OWL:
VEGETARIAN WBURG W PATIO & BACKYARD!
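Here is a minimal reproduction without Beam. It assumes the value above is wrapped in double quotes in the raw file; the `id` column and the sample row are made up for illustration. As far as I can tell, `ReadFromText` hands my function one physical line at a time, so the first function mimics that, while the second parses the whole text as a single stream:

```python
import csv
import io

# Hypothetical two-column sample; the quoted second field contains a newline.
SAMPLE = '"id","name"\n"1","The BLUE OWL:\nVEGETARIAN WBURG W PATIO & BACKYARD!"\n'

def parse_per_line(text):
    """Mimic line-based reading: each physical line is parsed in isolation."""
    return [next(csv.reader([line])) for line in text.splitlines()]

def parse_whole(text):
    """Parse the full text as one stream so quoted newlines stay in the field."""
    return list(csv.reader(io.StringIO(text, newline='')))

per_line_rows = parse_per_line(SAMPLE)
whole_rows = parse_whole(SAMPLE)

print(len(per_line_rows))  # the record is split across two separate rows
print(whole_rows[1])       # the record stays intact, newline included
```

So the `csv` module itself seems to cope with the embedded newline, provided it sees the whole record rather than one line at a time.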
Any thoughts on how to proceed?