2

I'm close to achieve what I want thanks to Python Pandas Groupby/Append columns but still not quite there.

DF:

City Plan Problem 1 Problem 2 Problem 3
Genoa Service 1 aaa bbb ccc
Genoa Service 2 ddd zzz yyy
Genoa Service 3 ggg ppp jjj
Venice Service 2 lll vvv
Venice Service 3 eee fff mmm

Expected Output:

City Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 Problem 7 Problem 8 Problem 9
Genoa aaa bbb ccc ddd zzz yyy ggg ppp jjj
Venice lll vvv eee fff mmm

Basically I want to:

  1. Group by City
  2. Discard Plan Column (if possible)
  3. Append all the other parameters (They still need to be always in order, so if a service is missing the cells would be empty.

After playing a while with unstack and cumcount from the linked solution, I'm still missing something to respect the order of the Plan column and fill with empty cells if a service is missing.

This is the code I'm using:

import pandas as pd

df = pd.read_csv('input.csv')

df1 = df.set_index('City').stack().reset_index(name='vals')
df1['g'] = 'Param' + df1.groupby('City').cumcount().add(1).astype(str)
df1 = df1.pivot(index='City', columns='g', values='vals')

df1.to_csv('output.csv')

In my tests I've removed the Plan column from the input, but the problem is that after ordering the parameters in the output, if, for example, a city has only Service 3, they are still aligned under Service 1.

  • Could you please add an executable code sample which is also showing where you are stuck? – JE_Muc Dec 18 '20 at 10:05
  • 1
    Got it, just added an example I'm using for test, although I can't figure out how to modify it to consider the Plan column too. – lsgbusiness Dec 18 '20 at 10:16

1 Answers1

4

This is a pivot problem, but you can also do this by stacking and unstacking:

s = df.set_index(['City', 'Plan']).stack().unstack([1, 2])
s.columns = 'Problem ' + pd.RangeIndex(1, s.shape[1]+1).astype(str)

print (s)

       Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 Problem 7 Problem 8 Problem 9
City                                                                                            
Genoa        aaa       bbb       ccc       ddd       zzz       yyy       ggg       ppp       jjj
Venice       NaN       NaN       NaN       lll       vvv       NaN       eee       fff       mmm

Another way using melt:

s = df.melt(['City', 'Plan']).pivot('City', ['Plan', 'variable'], 'value')
s.columns = 'Problem ' + pd.RangeIndex(1, s.shape[1]+1).astype(str)

print (s)
       Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 Problem 7 Problem 8 Problem 9
City                                                                                            
Genoa        aaa       ddd       ggg       bbb       zzz       ppp       ccc       yyy       jjj
Venice       NaN       lll       eee       NaN       vvv       fff       NaN       NaN       mmm

The ordering is a bit different, but the relative ordering between Services is preserved.

cs95
  • 379,657
  • 97
  • 704
  • 746
  • 1
    really like the first solution, nice to see your around – Umar.H Dec 18 '20 at 10:19
  • @Manakin Hello! Not sure we've met or if I knew you from before as a different username, sorry. Good to see you! – cs95 Dec 18 '20 at 10:20
  • 1
    @cs95 this this the trick, thank you, and easy to understand. Still trying getting around Pandas potential. – lsgbusiness Dec 18 '20 at 10:26
  • my old nick was Datanovice, i was forced to change as I've been told I'm no longer a novice ;) @cs95 – Umar.H Dec 18 '20 at 10:39
  • 1
    @Manakin Ah that rings a bell, I remember you used to comment on some of my older answers with follow up questions (which, of course, I was happy to oblige most of the time ;- ) – cs95 Dec 18 '20 at 10:48