I am trying to process a CSV file into a dict using a Dataflow template and Python.
Because it is a template, I have to use ReadFromText from the textio module so that the path can be provided at runtime.
| beam.io.ReadFromText(contact_options.path)
All I need is to extract the first line of this text/CSV file so that I can pass it to DictReader as the fieldnames.
If I use splitlines, it brings back each element of the text file in a list:
return element.splitlines()
or
csv_data = []
split_element = element.split('\n')
for row in split_element:
    csv_data.append(row)
return csv_data
The output is:
['phone_number', 'cid', 'first_name', 'last_name']
[' ', '101XXXXX', 'MurXXX', 'LevXXXX']
['3052XXXXX', '109XXXXX', 'MerXXXX', 'CoXXXX']
['954XXXXX', '10XXXXXX', 'RoXXXX', 'MaXXXXX']
However, if I then use, say, element[0], it just brings everything back without the list brackets. I have also tried splitting on '\n' and then using a for loop to build a list object, but that produces almost the same result.
I cannot rely on predetermined fieldnames, because the CSV files to be processed will all have different fieldnames, and DictReader will not work effectively without fieldnames being given.
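My best guess at why this happens, assuming ReadFromText hands my function one line of the file per element rather than the whole file, is sketched below in plain Python. The `elements` list and the `first_item` function are hypothetical stand-ins for the PCollection and my own function, not Beam code.

```python
# Hypothetical stand-in for the PCollection: one string per line of the file.
elements = [
    'phone_number,cid,first_name,last_name',
    ' ,101XXXXX,MurXXX,LevXXXX',
]


def first_item(element):
    # Each element is already a single line, so splitlines() returns a
    # one-item list and [0] simply gives the whole line back.
    return element.splitlines()[0]


results = [first_item(e) for e in elements]
print(results)
```

If that assumption holds, indexing inside the function can never isolate the header, because the function only ever sees one line at a time.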
EDIT:
The expected output is:
[{'phone_Number': '561XXXXX', 'first_Name': '', 'last_Name': 'BeXXXX', 'cid': '745XXXXX'}, {'phone_Number': '561XXXXX', 'first_Name': 'A', 'last_Name': 'BXXXX', 'cid': '61XXXXX'}]
EDIT:
Element contents:
"phone_Number","cid","first_Name","last_Name"
"5616XXXXX","745XXXX","","BeXXXXX"
"561XXXXXX","61XXXXX","A","BXXXXXXt"
"95XXXXXXX","6XXXXXX","A","BXXXXXX"
"727XXXXXX","98XXXXXX","A","CaXXXXXX"
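For reference, this is roughly what I am trying to achieve, written as plain Python outside of Dataflow. It is only a sketch that assumes the whole file contents are available as a single string, which is not what I get from ReadFromText; the `rows_to_dicts` name is made up for illustration.

```python
import csv
import io


def rows_to_dicts(text):
    """Parse CSV text, using its first line as the DictReader fieldnames."""
    lines = io.StringIO(text)
    # Read the header row on its own to obtain the field names.
    fieldnames = next(csv.reader(lines))
    # Parse the remaining lines against those field names.
    return [dict(row) for row in csv.DictReader(lines, fieldnames=fieldnames)]


# First rows of the element contents shown above.
sample = (
    '"phone_Number","cid","first_Name","last_Name"\n'
    '"5616XXXXX","745XXXX","","BeXXXXX"\n'
    '"561XXXXXX","61XXXXX","A","BXXXXXXt"\n'
)
records = rows_to_dicts(sample)
print(records)
```

What I cannot work out is how to get hold of that first line inside the pipeline when the path is only known at runtime.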