Pandas: convert df from "wide" to "long" format with col containing dtype object which should be an array

Question

I have column of dtype objects which look ostensibly like lists:

import pandas as pd
import numpy as np

raw = '/******/*******/******/data.txt'
df = pd.read_csv(raw, sep='\t')
df.head()

id  val_0  val_1  val_2  feat_0  feat_1  feat_2  \
0  a      2      0      2       2       2       0   
1  b      1     -1      1       1       1      -2   
2  c      0     -2     -2       0       2       1   
3  d     -1      1     -1      -1       1      -2   
4  e     -2      2      0      -2       0       2       

                              objs_0                             objs_1  \
0  [u'word_0', u'word_1', u'word_2']  [u'word_0', u'word_1', u'word_2']   
1  [u'word_0', u'word_1', u'word_2']  [u'word_0', u'word_1', u'word_2']   
2  [u'word_0', u'word_1', u'word_2']  [u'word_0', u'word_1', u'word_2']   
3  [u'word_0', u'word_1', u'word_2']  [u'word_0', u'word_1', u'word_2']   
4  [u'word_0', u'word_1', u'word_2']  [u'word_0', u'word_1', u'word_2']       

                              objs_2  
0  [u'word_0', u'word_1', u'word_2']  
1  [u'word_0', u'word_1', u'word_2']  
2  [u'word_0', u'word_1', u'word_2']  
3  [u'word_0', u'word_1', u'word_2']  
4  [u'word_0', u'word_1', u'word_2']  

df['objs_0'].values

array(["[u'word_0', u'word_1', u'word_2']",
   "[u'word_0', u'word_1', u'word_2']",
   "[u'word_0', u'word_1', u'word_2']",
   "[u'word_0', u'word_1', u'word_2']",
   "[u'word_0', u'word_1', u'word_2']"], dtype=object)

Ultimately, I need to convert this df to "long" format, and I want to run to run this using the code here: pandas: When cell contents are lists, create a row for each element in the list

But the problem is that I cannot convert these strings to lists.

I have already tried:

df['objs_0'] = df['objs_0'].apply(lambda row: list(row))
df['objs_0']

But this just breaks the entire string up by character. Also, my "string lists" are of unpredictable length, so I cannot rely on the str.partition() method. Any help on this would be greatly appreciated!

What do you want the output to look like? – ALollz Aug 17 '18 at 18:56 — ALollz, Aug 17 '18 at 18:56

Scott Boston · Answer 1 · 2018-08-17T20:14:06.003

You can use a pandas function, pd.wide_to_long:

df = pd.DataFrame({'id':[0,1],
                  'val_0':[2,1],
                  'val_1':[0,-1],
                  'feat_0':[2,1],
                  'feat_1':[2,1],
                  'objs_0':[['word_0','word_1'],['aword_0','aword_1']],
                  'objs_1':[['word_2','word_3'],['bword_0','bword_1']]})

pd.wide_to_long(df,['val','feat','objs'],'id','varcount','_','\w+').reset_index()

Output:

   id  varcount  val  feat                objs
0   0         0    2     2    [word_0, word_1]
1   1         0    1     1  [aword_0, aword_1]
2   0         1    0     2    [word_2, word_3]
3   1         1   -1     1  [bword_0, bword_1]

And, continuing on break make objs in the long format:

pd.DataFrame(df_out['objs'].values.tolist()).stack().to_frame(name='obj')\
  .reset_index(level=1, drop=True)\
  .join(df_out)

Output:

       obj  id  varcount  val  feat                objs
0   word_0   0         0    2     2    [word_0, word_1]
0   word_1   0         0    2     2    [word_0, word_1]
1  aword_0   1         0    1     1  [aword_0, aword_1]
1  aword_1   1         0    1     1  [aword_0, aword_1]
2   word_2   0         1    0     2    [word_2, word_3]
2   word_3   0         1    0     2    [word_2, word_3]
3  bword_0   1         1   -1     1  [bword_0, bword_1]
3  bword_1   1         1   -1     1  [bword_0, bword_1]

Thank you for your response; however, it was not able to work with my data. I did some more digging around and discovered that I needed to convert my "string lists" into actual lists. To do this, I used AST import ast def obj_to_list(row): string = str(row) list_ = ast.literal_eval(string) return list_ df['objs_0_cln'] = df[objs_0'].apply(lambda x: obj_to_list(x)) — datFrugalBoi, Aug 20 '18 at 17:42
I think you might can achieve this using yaml also. import yaml. df = df.applymap(yaml.load) or try do just that column. df['objs'] = df['objs'].apply(yaml.load) — Scott Boston, Aug 20 '18 at 18:05

score 0 · Answer 2 · answered Aug 17 '18 at 18:53

You can use the following code for this condition

df[["new1","new2","new3"]]=pd.DataFrame(df.objs_0.values.tolist(),index=df.index)

so u can split the columns that are a list into new columns named new1,new2,new3 here

do realise that the number of columns that you are assigning should be exactly equal to the number of values the lists have in the column that you are splitting into new columns.

The code above will keep the original columns intact just add new columns in the dataframe, you can choose to drop the original columns if you like.

Pandas: convert df from "wide" to "long" format with col containing dtype object which should be an array

2 Answers2