Aggregate features row-wise in dataframe

Question

I am trying to create features from a sample that looks like this:

index	user	product	sub_product	status
0	u1	p1	sp1	NA
1	u1	p1	sp2	NA
2	u1	p1	sp3	CANCELED
3	u1	p1	sp4	AVAIL
4	u2	p3	sp2	AVAIL
5	u2	p3	sp3	CANCELED
6	u2	p3	sp7	NA
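A note on reproducing the sample: `pandas.read_csv` parses the literal string `NA` as a missing value by default. `get_dummies` would then create no `status_NA` column, and those rows would get all zeros in the status indicators. The sketch below assumes the data arrives as tab-separated text with the `index` column left out, and shows that `keep_default_na=False` keeps `NA` as an ordinary category.

```python
import io

import pandas as pd

# The sample from the question as tab-separated text (index column omitted).
raw = (
    "user\tproduct\tsub_product\tstatus\n"
    "u1\tp1\tsp1\tNA\n"
    "u1\tp1\tsp2\tNA\n"
    "u1\tp1\tsp3\tCANCELED\n"
    "u1\tp1\tsp4\tAVAIL\n"
    "u2\tp3\tsp2\tAVAIL\n"
    "u2\tp3\tsp3\tCANCELED\n"
    "u2\tp3\tsp7\tNA\n"
)

# Default parsing turns the literal "NA" into NaN.
default = pd.read_csv(io.StringIO(raw), sep="\t")
# keep_default_na=False keeps "NA" as a regular string value.
x = pd.read_csv(io.StringIO(raw), sep="\t", keep_default_na=False)

print(default["status"].isna().sum(), (x["status"] == "NA").sum())  # 3 3
```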

First, I created dummies:

pd.get_dummies(x, columns=['product', 'sub_product', 'status'])
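On the sample rows (built directly here, with `NA` as a plain string), this call produces one indicator column per distinct value but still one row per input record. Nothing has been aggregated per user yet. A minimal sketch:

```python
import pandas as pd

# Sample data from the question; "NA" is kept as a literal string.
x = pd.DataFrame({
    "user": ["u1", "u1", "u1", "u1", "u2", "u2", "u2"],
    "product": ["p1", "p1", "p1", "p1", "p3", "p3", "p3"],
    "sub_product": ["sp1", "sp2", "sp3", "sp4", "sp2", "sp3", "sp7"],
    "status": ["NA", "NA", "CANCELED", "AVAIL", "AVAIL", "CANCELED", "NA"],
})

# One indicator column per (column, value); 'user' is passed through unchanged.
dummies = pd.get_dummies(x, columns=["product", "sub_product", "status"])

# 7 input rows -> 7 output rows; 1 user + 2 product + 5 sub_product + 3 status columns.
print(dummies.shape)  # (7, 11)
```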

But I also need to aggregate the rows so that there is one row per user. What is the best way to do it?
If I just group it:

pd.get_dummies(x, columns=['product', 'sub_product', 'status']).groupby('user').max()

user	product_p1	product_p3	sub_product_sp1	sub_product_sp2	sub_product_sp3	sub_product_sp4	sub_product_sp7	status_AVAIL	status_CANCELED	status_NA
u1	1	0	1	1	1	1	0	1	1	1
u2	0	1	0	1	1	0	1	1	1	1
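Here is why the grouped table loses information. The minimal sketch below uses a hypothetical variant of the sample in which u1's sp3 and sp4 statuses are swapped. Both frames collapse to the same per-user table. `max` records only that each sub_product and each status occurred somewhere for the user, not which ones occurred together. The helper `collapse` is an illustrative name, not part of any library.

```python
import pandas as pd

cols = ["product", "sub_product", "status"]

original = pd.DataFrame({
    "user": ["u1", "u1", "u1", "u1", "u2", "u2", "u2"],
    "product": ["p1", "p1", "p1", "p1", "p3", "p3", "p3"],
    "sub_product": ["sp1", "sp2", "sp3", "sp4", "sp2", "sp3", "sp7"],
    "status": ["NA", "NA", "CANCELED", "AVAIL", "AVAIL", "CANCELED", "NA"],
})

# Hypothetical variant: u1's sp3 becomes AVAIL and sp4 becomes CANCELED.
swapped = original.copy()
swapped.loc[2, "status"] = "AVAIL"
swapped.loc[3, "status"] = "CANCELED"

def collapse(df):
    # The approach from the question: dummies, then max per user.
    return pd.get_dummies(df, columns=cols).groupby("user").max()

same = collapse(original).equals(collapse(swapped))
print(same)  # True: the pairing of sub_product and status is gone
```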

I will lose information, for example that for u1 the status of sp3 is CANCELED. So it looks like I have to create dummies for every combination of columns?
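One way to keep the pairing, in the spirit of "dummies for every combination", is to concatenate the columns into a single key before calling `get_dummies`, then take the per-user `max`. The sketch below pairs only `sub_product` with `status`. The `sub_status` prefix is an illustrative name, and `product` could be added to the key the same way.

```python
import pandas as pd

x = pd.DataFrame({
    "user": ["u1", "u1", "u1", "u1", "u2", "u2", "u2"],
    "product": ["p1", "p1", "p1", "p1", "p3", "p3", "p3"],
    "sub_product": ["sp1", "sp2", "sp3", "sp4", "sp2", "sp3", "sp7"],
    "status": ["NA", "NA", "CANCELED", "AVAIL", "AVAIL", "CANCELED", "NA"],
})

# One key per (sub_product, status) pair, e.g. "sp3_CANCELED".
combo = x["sub_product"] + "_" + x["status"]

# Indicator per pair, then 1 if the user has that pair on any row.
features = (
    pd.get_dummies(combo, prefix="sub_status")
    .groupby(x["user"])
    .max()
    .astype(int)
)

print(features.loc["u1", "sub_status_sp3_CANCELED"])  # 1
```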

Quang Hoang · Accepted Answer · 2021-05-12T14:53:26.903


Update: you are basically looking for a pivot:

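import pandas as pd

# Flag every (product, sub_product, status) triple that occurs for each user
out = (df.astype(str)
   .assign(value=1)
   .pivot_table(index=['user'], columns=['product', 'sub_product', 'status'],
                values='value', fill_value=0, aggfunc='max')
)

# Flatten the MultiIndex columns into names like p1_sp3_CANCELED
out.columns = ['_'.join(x) for x in out.columns]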
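Run on the sample from the question, the pivot gives one row per user and one 0/1 column per observed (product, sub_product, status) triple. The sample is built here with `NA` as a plain string. If the data were read with `read_csv` defaults, `astype(str)` would turn the missing values into `'nan'` instead.

```python
import pandas as pd

df = pd.DataFrame({
    "user": ["u1", "u1", "u1", "u1", "u2", "u2", "u2"],
    "product": ["p1", "p1", "p1", "p1", "p3", "p3", "p3"],
    "sub_product": ["sp1", "sp2", "sp3", "sp4", "sp2", "sp3", "sp7"],
    "status": ["NA", "NA", "CANCELED", "AVAIL", "AVAIL", "CANCELED", "NA"],
})

out = (df.astype(str)
         .assign(value=1)
         .pivot_table(index=["user"], columns=["product", "sub_product", "status"],
                      values="value", fill_value=0, aggfunc="max"))
out.columns = ["_".join(c) for c in out.columns]

print(out.loc["u1", "p1_sp3_CANCELED"], out.loc["u2", "p1_sp3_CANCELED"])  # 1 0
```

`fill_value=0` fills in the combinations a given user lacks. A combination that occurs for no user at all does not appear as a column.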

edited May 12 '21 at 14:53

answered May 12 '21 at 14:39

I tried it and got the same result as I already have with pd.get_dummies(x, columns=['product', 'sub_product', 'status']).groupby('user').max(). I think I need columns like sp3_status_AVAIL etc. – spynal May 12 '21 at 14:48
@spynal I see, I thought that was what you expected. So what is your expected output? It seems like a `pivot` question. – Quang Hoang May 12 '21 at 14:50
I think I also need columns like sp3_status_AVAIL, sp3_status_NA, etc., so basically all combinations of the existing columns. – spynal May 12 '21 at 14:52
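If the goal is exactly the naming from these comments, with every (sub_product, status) pair present as a column, `pd.crosstab` is another option. This sketch assumes that "all combinations" includes pairs nobody has, which then appear as all-zero columns. Drop the `reindex` step to keep only the observed pairs.

```python
import pandas as pd

x = pd.DataFrame({
    "user": ["u1", "u1", "u1", "u1", "u2", "u2", "u2"],
    "product": ["p1", "p1", "p1", "p1", "p3", "p3", "p3"],
    "sub_product": ["sp1", "sp2", "sp3", "sp4", "sp2", "sp3", "sp7"],
    "status": ["NA", "NA", "CANCELED", "AVAIL", "AVAIL", "CANCELED", "NA"],
})

# Count each (sub_product, status) pair per user, then turn counts into 0/1 flags.
wide = pd.crosstab(x["user"], [x["sub_product"], x["status"]]).gt(0).astype(int)

# Assumption: include every sub_product/status pair, even unobserved ones.
full = pd.MultiIndex.from_product(
    [sorted(x["sub_product"].unique()), sorted(x["status"].unique())],
    names=["sub_product", "status"],
)
wide = wide.reindex(columns=full, fill_value=0)

# Flatten to names like "sp3_status_AVAIL".
wide.columns = [f"{sp}_status_{st}" for sp, st in wide.columns]
print(wide.shape)  # (2, 15): 5 sub_products x 3 statuses
```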
