Is there an equivalent of SFrame's stack for pandas DataFrames? Pandas' own stack works only on index levels, whereas I want to expand a single column that contains lists into one row per list element, with the other columns kept at the same level.

Input DataFrame (the actual DataFrame has more columns like user):

+-------+------------------+
| user  |     friends      |
+-------+------------------+
|  1    |     [2, 3, 4]    |
|  2    |      [5, 6]      |
|  3    | [4, 5, 10, None] |
+-------+------------------+

Output DataFrame (the additional columns like user in the actual DataFrame should be repeated in the same way):

+------+--------+
| user | friend |
+------+--------+
|  1   |  2     |
|  1   |  3     |
|  1   |  4     |
|  2   |  5     |
|  2   |  6     |
|  3   |  4     |
|  3   |  5     |
|  3   |  10    |
|  3   |  None  |
+------+--------+
Ankit Goel
  • If you can show your input along with your expected output, we can help. As of yet, it is not clear what you want. – cs95 Aug 19 '17 at 03:58
  • You may want to give a practical example of your pandas frame and what you want to do with it (and perhaps what you've already tried). Not everyone will know what an SFrame stack is. –  Aug 19 '17 at 04:13
  • Yup, just realised. I've added an example of the input df and the required output df. – Ankit Goel Aug 19 '17 at 04:16
  • Can you please accept the answer if it works for you? – Gayatri Aug 20 '17 at 05:01

2 Answers

1

You could do this (data is the input DataFrame, and the list column is named friends as in the question):

data['friends'].apply(pd.Series).stack().reset_index(level=1, drop=True).to_frame('friend').join(data[['user']], how='left')

This also works if you have more than one column like "user", say one named "other column". In that case, just add it to the list of columns passed to join:

data['friends'].apply(pd.Series).stack().reset_index(level=1, drop=True).to_frame('friend').join(data[['user', 'other column']], how='left')
Gayatri
  • Thanks! This works very neatly. However, it drops the 'None' value, which I need to retain. Otherwise I would have accepted this as the right answer, since it extends easily to multiple columns and also uses the pandas 'stack' function. – Ankit Goel Aug 20 '17 at 18:32
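
For readers on newer pandas versions (0.25 or later), DataFrame.explode may cover the same need while keeping the None entry and repeating every other column. This is only a sketch and not part of the answer above: the sample frame is rebuilt from the question, and the column named other is a hypothetical stand-in for the extra columns the asker mentions.

```python
import pandas as pd

# Sample data rebuilt from the question; "other" is a hypothetical extra column
df = pd.DataFrame({
    "user": [1, 2, 3],
    "other": ["a", "b", "c"],
    "friends": [[2, 3, 4], [5, 6], [4, 5, 10, None]],
})

# explode emits one row per list element and repeats the remaining columns
out = (
    df.explode("friends")
      .rename(columns={"friends": "friend"})
      .reset_index(drop=True)
)
print(out)
```

The exploded column has object dtype, so it may need an explicit cast afterwards if numeric values are required.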
1
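import numpy as np
import pandas as pd

# DataFrame.from_items was removed in pandas 1.0; a dict preserves column order
pd.DataFrame({
    'user': df.user.values.repeat(df.friends.str.len()),
    'friends': np.concatenate(df.friends)
})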

   user friends
0     1       2
1     1       3
2     1       4
3     2       5
4     2       6
5     3       4
6     3       5
7     3      10
8     3    None
piRSquared
  • Thanks! This works perfectly. I have several columns in my df. Should I just use a 'for loop' to create those items and then pass them to the DataFrame constructor? – Ankit Goel Aug 20 '17 at 18:30
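
One possible way to handle several columns, sketched here rather than taken from the answer: build the repeated columns with a dict comprehension over every column except the list column. The column other and the variable names lengths, keep and columns are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; "other" is a hypothetical extra column
df = pd.DataFrame({
    "user": [1, 2, 3],
    "other": ["a", "b", "c"],
    "friends": [[2, 3, 4], [5, 6], [4, 5, 10, None]],
})

# Number of elements in each list, used as the repeat count per row
lengths = df["friends"].str.len().to_numpy()

# Repeat every column except the list column
keep = df.columns.drop("friends")
columns = {col: np.repeat(df[col].to_numpy(), lengths) for col in keep}

# Flatten the lists; the result is an object array, so None is kept
columns["friend"] = np.concatenate(df["friends"].to_numpy())

out = pd.DataFrame(columns)
print(out)
```

The comprehension plays the role of the for loop, so adding more columns to the input needs no further code changes.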