Python ast.literal_eval returns Malformed String Error

Question

I'm currently trying to convert a string representation of a list of lists to a list of lists using the ast.literal_eval method. I've tried looking at the following questions on this community:

Malformed String ValueError ast.literal_eval() with String representation of Tuple
python ast.literal_eval throwing malformed string error given “datetime.datetime.now()”

but the solutions and answers offered there don't seem to apply to my situation.

I currently have a Pandas DataFrame of the form (example):

industry     index     entities
cars         0         [ ['car1', 'it'], ['them', 'car2', 'car3'] ]
cars         1         [ ['car4', 'its'], ['car5', 'car6'] ]

When I load in the CSV file using pandas.read_csv, the entries in column entities are string representations of lists. I attempted to use ast.literal_eval to convert them into lists but the following happens:

df['entities'] = ast.literal_eval(df['entities'])


ValueError: malformed node or string: 0      [['car1', 'it'], ['them', 'car2', 'car3']]
1      [['car4', 'its'], ['car5', 'car6']]

I'm aware that the argument passed to ast.literal_eval must be a Python literal structure, but everything I'm passing appears to be a valid Python literal, so that doesn't seem to be the problem.

To provide some additional background information, I used this same method to perform an identical operation before and it worked fine. However, I recently modified the original DataFrame to remove instances of the word "the."

What might be causing this error? Any tips would be appreciated. Thank you.

Edit

df.head(2).to_dict() returns the following. Note that this is different from the example I provided because this is the original DataFrame that I'm working with:

{'industry': {0: 'automotiveEngineering', 1: 'automotiveEngineering'},
 'index': {0: 0, 1: 1},
 'entities': {0: "[['Norway', 'it'], ['EQC—and', 'it', 'EQC', 'EQC'], ['Mercedes-Benz EQC Edition 1886 electric SUV', 'it', 'it', 'EQC400 4Matic crossover']]",
  1: '[[\'Ford Fusion\', \'Fusion\', \'Fusion\', \'Fusion\'], ["2013–2016 Ford Fusion sedans.automaker \'s", \'automaker\'], [\'Ford\', \'Ford\'], [\'faulty shifter cables that can cause rollaways\', \'these shifter cables , which can break off transmission due to a bad bushing at connection point\'], [\'these bushings\', \'them\']]'}}

I've also tried looping through each row and modifying each entity separately, but it still gives me the same error.

I'd also like to add that when I run ast.literal_eval on a single row, it returns the appropriate value without any problem.

Edit 2

I managed to achieve what I was trying to do by running:

df['column'] = df['column'].apply(ast.literal_eval)

but unfortunately that doesn't answer my initial question of what may be causing the malformed string/node error.
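For anyone hitting the same thing, here is a minimal sketch of what appears to be going on, using toy data modeled on the example above (not my real DataFrame). As far as I can tell, ast.literal_eval accepts only a string or an AST node, so passing the whole column hands it a pandas Series, which it rejects as a "malformed node"; the strings themselves are fine. The variable names below are only for illustration.

```python
import ast

import pandas as pd

# Toy data modeled on the example in the question (an assumption, not the real data).
df = pd.DataFrame({
    "industry": ["cars", "cars"],
    "entities": [
        "[['car1', 'it'], ['them', 'car2', 'car3']]",
        "[['car4', 'its'], ['car5', 'car6']]",
    ],
})

# Whole column: literal_eval receives a pandas Series, not a str, and raises.
whole_column_error = None
try:
    ast.literal_eval(df["entities"])
except ValueError as exc:
    whole_column_error = str(exc)

# Single cell: a plain str, which parses without any problem.
first_row = ast.literal_eval(df["entities"].iloc[0])

# Element-wise: apply calls literal_eval once per cell, each time with a str.
parsed = df["entities"].apply(ast.literal_eval)
```

This would also explain why running it on a single row works while the whole-column call fails.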

I'm not sure you can apply `ast.literal_eval` to the whole column. Why not loop over the rows, apply the function to each one, and collect the results in a list? Then you can assign that list as the value of the `entities` column. - Kenry Sanchez, Jul 16 '19 at 04:43
Thanks for the extra pointers guys, I'll edit in the details. - Sean, Jul 16 '19 at 04:52
