Python Pandas groupby multiple columns and append

Question

I'm close to achieve what I want thanks to Python Pandas Groupby/Append columns but still not quite there.

DF:

City	Plan	Problem 1	Problem 2	Problem 3
Genoa	Service 1	aaa	bbb	ccc
Genoa	Service 2	ddd	zzz	yyy
Genoa	Service 3	ggg	ppp	jjj
Venice	Service 2	lll	vvv
Venice	Service 3	eee	fff	mmm

Expected Output:

City	Problem 1	Problem 2	Problem 3	Problem 4	Problem 5	Problem 6	Problem 7	Problem 8	Problem 9
Genoa	aaa	bbb	ccc	ddd	zzz	yyy	ggg	ppp	jjj
Venice				lll	vvv		eee	fff	mmm

Basically I want to:

Group by City
Discard Plan Column (if possible)
Append all the other parameters (They still need to be always in order, so if a service is missing the cells would be empty.

After playing a while with unstack and cumcount from the linked solution, I'm still missing something to respect the order of the Plan column and fill with empty cells if a service is missing.

This is the code I'm using:

import pandas as pd

df = pd.read_csv('input.csv')

df1 = df.set_index('City').stack().reset_index(name='vals')
df1['g'] = 'Param' + df1.groupby('City').cumcount().add(1).astype(str)
df1 = df1.pivot(index='City', columns='g', values='vals')

df1.to_csv('output.csv')

In my tests I've removed the Plan column from the input, but the problem is that after ordering the parameters in the output, if, for example, a city has only Service 3, they are still aligned under Service 1.

Could you please add an executable code sample which is also showing where you are stuck? — JE_Muc, Dec 18 '20 at 10:05
Got it, just added an example I'm using for test, although I can't figure out how to modify it to consider the Plan column too. — lsgbusiness, Dec 18 '20 at 10:16

score 4 · Accepted Answer · answered Dec 18 '20 at 10:12

This is a pivot problem, but you can also do this by stacking and unstacking:

s = df.set_index(['City', 'Plan']).stack().unstack([1, 2])
s.columns = 'Problem ' + pd.RangeIndex(1, s.shape[1]+1).astype(str)

print (s)

       Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 Problem 7 Problem 8 Problem 9
City                                                                                            
Genoa        aaa       bbb       ccc       ddd       zzz       yyy       ggg       ppp       jjj
Venice       NaN       NaN       NaN       lll       vvv       NaN       eee       fff       mmm

Another way using melt:

s = df.melt(['City', 'Plan']).pivot('City', ['Plan', 'variable'], 'value')
s.columns = 'Problem ' + pd.RangeIndex(1, s.shape[1]+1).astype(str)

print (s)
       Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 Problem 7 Problem 8 Problem 9
City                                                                                            
Genoa        aaa       ddd       ggg       bbb       zzz       ppp       ccc       yyy       jjj
Venice       NaN       lll       eee       NaN       vvv       fff       NaN       NaN       mmm

The ordering is a bit different, but the relative ordering between Services is preserved.

@Manakin Hello! Not sure we've met or if I knew you from before as a different username, sorry. Good to see you! — cs95, Dec 18 '20 at 10:20
@cs95 this this the trick, thank you, and easy to understand. Still trying getting around Pandas potential. — lsgbusiness, Dec 18 '20 at 10:26
my old nick was Datanovice, i was forced to change as I've been told I'm no longer a novice ;) @cs95 — Umar.H, Dec 18 '20 at 10:39
@Manakin Ah that rings a bell, I remember you used to comment on some of my older answers with follow up questions (which, of course, I was happy to oblige most of the time ;- ) — cs95, Dec 18 '20 at 10:48

Python Pandas groupby multiple columns and append

1 Answers1