Change table to tall format using pandas (UNPIVOT)

Question

I have a table like this

   user         company company2 company3 company4
    1           Mac     Lenovo    Hp      null              
    2           Mac       MSI     Sony

And using pandas I would like it to be

     user    company
     1          Mac
     1          Lenovo
     1          Hp
     2         Mac

and so on. I tried pandas pivot, but it didn't work:

dataframe = pd.read_csv('data.csv')
dataframe.fillna(value='', inplace=True)
#dataframe.pivot(index='user', columns='company')

The code above doesn't work and raises an error.

"It doesn't work" is not particularly specific. Please include the exact error message you see. – Metropolis Apr 14 '17 at 19:22
@Metropolis I thought the error was pretty stupid, so I didn't want to include it. Sorry, I will do my best next time. – Aurora Apr 14 '17 at 19:32
You can edit your question and include the error message. The exact error is usually very helpful for debugging a problem. – Metropolis Apr 14 '17 at 19:35

MaxU - stand with Ukraine · Accepted Answer · 2017-04-14T19:46:35.797

8

You can use the pd.melt method:

In [211]: pd.melt(df, id_vars='user', value_vars=df.columns.drop('user').tolist())
Out[211]:
   user  variable   value
0     1   company     Mac
1     2   company     Mac
2     1  company2  Lenovo
3     2  company2     MSI
4     1  company3      Hp
5     2  company3    Sony
6     1  company4    null
7     2  company4     NaN

or

In [213]: pd.melt(df,
                  id_vars='user', value_vars=df.columns.drop('user').tolist(),
                  value_name='Company') \
            .drop('variable', axis=1)
Out[213]:
   user Company
0     1     Mac
1     2     Mac
2     1  Lenovo
3     2     MSI
4     1      Hp
5     2    Sony
6     1    null
7     2     NaN

UPDATE: dropping NaNs and sorting the resulting DataFrame by user:

In [218]: pd.melt(df,
     ...:         id_vars='user', value_vars=df.columns.drop('user').tolist(),
     ...:         value_name='Company') \
     ...:   .drop('variable', axis=1) \
     ...:   .dropna() \
     ...:   .sort_values('user')
     ...:
Out[218]:
   user Company
0     1     Mac
2     1  Lenovo
4     1      Hp
6     1    null
1     2     Mac
3     2     MSI
5     2    Sony

PS: if you also want to get rid of the literal 'null' strings, pass df.replace('null', np.nan) instead of df:

In [219]: pd.melt(df.replace('null', np.nan),
     ...:         id_vars='user', value_vars=df.columns.drop('user').tolist(),
     ...:         value_name='Company') \
     ...:   .drop('variable', axis=1) \
     ...:   .dropna() \
     ...:   .sort_values('user')
     ...:
Out[219]:
   user Company
0     1     Mac
2     1  Lenovo
4     1      Hp
1     2     Mac
3     2     MSI
5     2    Sony
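
For reference, the steps above can be collected into one self-contained script. This is only a sketch under a few assumptions: the frame is rebuilt inline from the table in the question (the real input is data.csv, which is not shown), the literal string 'null' is treated as missing, and a stable sort (kind="mergesort") is requested so that each user's companies keep their original column order. value_vars is left out because melt uses every column not listed in id_vars by default.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: the real input is data.csv).
df = pd.DataFrame({
    "user": [1, 2],
    "company": ["Mac", "Mac"],
    "company2": ["Lenovo", "MSI"],
    "company3": ["Hp", "Sony"],
    "company4": ["null", np.nan],
})

tall = (
    df.replace("null", np.nan)                     # treat the string 'null' as missing
      .melt(id_vars="user", value_name="Company")  # all other columns become rows
      .drop("variable", axis=1)                    # original column names are not needed
      .dropna()                                    # remove the missing entries
      .sort_values("user", kind="mergesort")       # stable: keeps column order per user
      .reset_index(drop=True)                      # renumber rows 0..n-1
)

print(tall)
```

Note that value_name is "Company" rather than "company": melt refuses a value_name that matches an existing column of the input frame.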

edited Apr 14 '17 at 19:46

answered Apr 14 '17 at 19:19


@Aurora, could you be more specific, please? – MaxU - stand with Ukraine Apr 14 '17 at 19:28
Sorry, nothing changed; the tables are the same. – Aurora Apr 14 '17 at 19:29
I am trying to use your second example; the first one works. – Aurora Apr 14 '17 at 19:30
@Aurora, try this: `result = pd.melt(...)` – MaxU - stand with Ukraine Apr 14 '17 at 19:30
Oh yeah, sorry! I totally forgot to assign the output, haha. Thanks a lot, Max! – Aurora Apr 14 '17 at 19:32
Sorry, I have a problem again. I only looked at the first few rows when I accepted the answer, but the first row of the result is user 1, company Mac, and the second row is for user 2; it didn't continue with user 1 :( – Aurora Apr 14 '17 at 19:36
@Aurora, so do you want to sort the result data set by `user`? You can add `.sort_values('user')` to the solution – MaxU - stand with Ukraine Apr 14 '17 at 19:37
Yes! That would be great! – Aurora Apr 14 '17 at 19:38
I added sort_values('user'), but now some of the values end up in different rows rather than one after another. Could you please edit the answer? – Aurora Apr 14 '17 at 19:40
@Aurora, please check UPDATE - is that what you want? – MaxU - stand with Ukraine Apr 14 '17 at 19:40
Yes, exactly like that! Now it's fine! Sorry for the trouble, but you helped me learn something new ^_^ – Aurora Apr 14 '17 at 19:44

tmrlvi · Answer 2 · 2017-04-14T19:43:13.010

4

It is also possible to use stack for this (I don't know whether it is more efficient than melt):

dataframe.set_index("user").stack().reset_index(-1, drop=True)

user
1       Mac
1    Lenovo
1        Hp
2       MSI
2       Mac
2      Sony

stack essentially moves the columns into the index (creating a MultiIndex), so every row-column combination becomes one row of the result. That is, the DataFrame

   C1 C2
0  A  B
1  a  b

after stack() becomes the Series

0  C1 A
0  C2 B
1  C1 a
1  C2 b
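
Applied to the data from the question, the same idea can be sketched as below. The frame is rebuilt inline as an assumption (the real input is data.csv), the literal string 'null' is replaced by NaN first, and dropna() is called explicitly because whether stack() discards missing values on its own depends on the pandas version. The final rename and reset_index, which turn the Series back into a two-column DataFrame, are an addition for illustration and not part of the one-liner above.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: the real input is data.csv).
dataframe = pd.DataFrame({
    "user": [1, 2],
    "company": ["Mac", "Mac"],
    "company2": ["Lenovo", "MSI"],
    "company3": ["Hp", "Sony"],
    "company4": ["null", np.nan],
})

tall = (
    dataframe.replace("null", np.nan)   # treat the string 'null' as missing
             .set_index("user")         # keep user out of the stacked values
             .stack()                   # Series indexed by (user, column name)
             .dropna()                  # explicit, in case stack() keeps NaN
             .reset_index(level=-1, drop=True)  # discard the column-name level
             .rename("company")         # name the values
             .reset_index()             # back to a DataFrame: user, company
)

print(tall)
```

Because stack() walks the frame row by row, the rows for each user already come out together and no extra sort is needed.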

edited Apr 14 '17 at 19:43

answered Apr 14 '17 at 19:22


I tried it, but it didn't actually work properly. Could you please edit the answer to follow my example? That would be great! – Aurora Apr 14 '17 at 19:38
I assumed `user` was your index. Try the above version (I added `set_index("user")`) – tmrlvi Apr 14 '17 at 19:43
