I'm trying to convert a dataframe with this data:
test1 test2 test3
test [test1, test2] [testbelongsto1, testbelongsto2]
into something like this:
test1 test2 test3
test test1 testbelongsto1
test test2 testbelongsto2
I found this answer: https://stackoverflow.com/a/38652414. It looks like exactly what I need, and there are a lot of other questions that cover the same thing.
However, whatever I try, I'm stuck with this error:
TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'
It is raised by this function (taken from the linked answer):
def explode(self, df, columns):
    idx = np.repeat(df.index, df[columns[0]].str.len())
    a = df.T.reindex_axis(columns).values
    concat = np.concatenate([np.concatenate(a[i]) for i in range(a.shape[0])])
    p = pd.DataFrame(concat.reshape(a.shape[0], -1).T, idx, columns)
    return pd.concat([df.drop(columns, axis=1), p], axis=1).reset_index(drop=True)
Important note: the data comes from the read_csv function. The columns I need to explode are read in as strings, so I wrote this piece of code to convert them to lists:
df['users'] = df['users'].apply(literal_eval)
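For context, here is a minimal standalone sketch of that conversion step, using a plain Python list in place of the dataframe column. The sample strings are copied from the 'users' values in the update further down; the names `raw_users` and `parsed_users` are only for illustration.

```python
from ast import literal_eval

# String cells as they come out of read_csv (values taken from the sample rows).
raw_users = ['[1, 1, 28, 28, 68]', '[1, 1, 16]']

# literal_eval turns each string into a real Python list of ints.
parsed_users = [literal_eval(cell) for cell in raw_users]

print(parsed_users)
```

After this step each cell is a list, so its length can be used to decide how often the row has to be repeated.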
I have tried everything from converting the dtype to saving the data in other formats, but nothing solves the issue.
Please help
UPDATE: A few rows of the real dataset are shown below. 'test2' corresponds to 'users' and 'test3' to 'interests'; the lists in both columns are the same size.
{'index': [0, 1, 2, 3, 4], 'Unnamed: 0': [0, 1, 4, 5, 6], 'users': ['[1, 1, 28, 28, 68]', '[1, 1, 16]', '[32, 37, 66, 67, 54, 117]', '[31, 37, 66, 67, 100, 113, 117]', '[32, 37, 66, 67, 54, 117]'], 'interests': ['[set(), set(), set(), set(), set()]', '[set(), set(), set()]', '[set(), set(), set(), set(), {1535, 1542, 1527}, set()]', '[set(), set(), set(), set(), set(), set(), set()]', '[set(), set(), set(), set(), {1535, 1542, 1527}, set()]']}
UPDATE 2: OK, this is exactly what I am trying to achieve. This is the data I currently get:
```
index  lift  confidence  interests   users
0                        {333, 333}  1
0                        set()       22
0                        set()       77
0      0     0.75        set()       88
4                        set()       33
4      3     0.50        set()       44
```
So it seems like only the last of each iteration gets added. This is what I want:
```
index  lift  confidence  interests  users
0      88    0.33        344,       1
0      88    0.33        333        1
0      88    0.33        set()      22
0      88    0.33        set()      77
0      88    0.33        set()      88
4      38    0.50        set()      33
4      38    0.50        set()      44
```
So what I want is for each data row (Series) to be repeated once per user, with the interests for that user repeated alongside it as well.
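To pin down the intended behaviour, here is a plain-Python sketch of the row repetition, independent of pandas. The helper name `explode_rows` and the sample values are hypothetical, chosen only to mirror the tables above. It assumes the listed columns hold lists of equal length within each row, and it does not split a multi-element set of interests into further rows.

```python
def explode_rows(rows, columns):
    """Repeat each row once per list element in `columns` (hypothetical helper)."""
    exploded = []
    for row in rows:
        # zip pairs up the i-th element of every list column in this row;
        # it stops at the shortest list, so equal lengths are assumed.
        for values in zip(*(row[col] for col in columns)):
            new_row = dict(row)                   # copy the scalar fields (lift, confidence, ...)
            new_row.update(zip(columns, values))  # replace each list with a single element
            exploded.append(new_row)
    return exploded


# Made-up rows shaped like the desired output above.
rows = [
    {"index": 0, "lift": 88, "confidence": 0.33,
     "interests": [set(), set()], "users": [22, 77]},
    {"index": 4, "lift": 38, "confidence": 0.50,
     "interests": [set(), set()], "users": [33, 44]},
]

result = explode_rows(rows, ["users", "interests"])
for r in result:
    print(r)
```

As far as I can tell from the documentation, `DataFrame.explode` accepts a list of columns from pandas 1.3 on, which may do the same thing directly on the dataframe.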