I have a column of dtype object whose values look like lists but are actually strings:
import pandas as pd
import numpy as np
raw = '/******/*******/******/data.txt'
df = pd.read_csv(raw, sep='\t')
df.head()
id val_0 val_1 val_2 feat_0 feat_1 feat_2 \
0 a 2 0 2 2 2 0
1 b 1 -1 1 1 1 -2
2 c 0 -2 -2 0 2 1
3 d -1 1 -1 -1 1 -2
4 e -2 2 0 -2 0 2
objs_0 objs_1 \
0 [u'word_0', u'word_1', u'word_2'] [u'word_0', u'word_1', u'word_2']
1 [u'word_0', u'word_1', u'word_2'] [u'word_0', u'word_1', u'word_2']
2 [u'word_0', u'word_1', u'word_2'] [u'word_0', u'word_1', u'word_2']
3 [u'word_0', u'word_1', u'word_2'] [u'word_0', u'word_1', u'word_2']
4 [u'word_0', u'word_1', u'word_2'] [u'word_0', u'word_1', u'word_2']
objs_2
0 [u'word_0', u'word_1', u'word_2']
1 [u'word_0', u'word_1', u'word_2']
2 [u'word_0', u'word_1', u'word_2']
3 [u'word_0', u'word_1', u'word_2']
4 [u'word_0', u'word_1', u'word_2']
df['objs_0'].values
array(["[u'word_0', u'word_1', u'word_2']",
"[u'word_0', u'word_1', u'word_2']",
"[u'word_0', u'word_1', u'word_2']",
"[u'word_0', u'word_1', u'word_2']",
"[u'word_0', u'word_1', u'word_2']"], dtype=object)
Ultimately, I need to convert this df to "long" format, and I want to do that using the code from this question: pandas: When cell contents are lists, create a row for each element in the list
But the problem is that I cannot convert these strings to lists.
I have already tried:
df['objs_0'] = df['objs_0'].apply(lambda row: list(row))
df['objs_0']
But this just breaks the entire string up by character. Also, my "string lists" are of unpredictable length, so I cannot rely on the str.partition() method. Any help on this would be greatly appreciated!
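For reference, here is a minimal sketch of one approach that might work, assuming every cell is a valid Python list literal like the ones shown above. It uses ast.literal_eval from the standard library to parse each string, then DataFrame.explode (available in pandas 0.25 and later) to get one row per element. The small DataFrame is a hypothetical stand-in for the real file, not the actual data:

```python
import ast

import pandas as pd

# Hypothetical stand-in for data.txt: each cell holds the *string* form of a list,
# and the lists have different lengths.
df = pd.DataFrame({
    "id": ["a", "b"],
    "objs_0": ["[u'word_0', u'word_1', u'word_2']", "[u'word_0']"],
})

# literal_eval parses a Python literal (list, str, number, ...) without
# executing arbitrary code, so each string becomes a real list.
df["objs_0"] = df["objs_0"].apply(ast.literal_eval)

# explode repeats the other columns once per list element ("long" format).
long_df = df.explode("objs_0").reset_index(drop=True)
print(long_df)
```

Unlike list(row), which iterates over the characters of the string, literal_eval interprets the string as a whole, so the length of each list does not matter. It raises an error on cells that are not valid literals (empty strings or NaN, for example), so those would need to be handled separately.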