You can use this answer and adapt it to your case.
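```python
import pandas as pd

df = pd.DataFrame({'type_a': [1, 0, 0, 0, 0, 1, 0, 0, 0, 1],
                   'type_b': [0, 1, 0, 0, 0, 0, 0, 0, 1, 1],
                   'type_c': [0, 0, 1, 1, 1, 1, 0, 0, 0, 0],
                   'type_d': [1, 0, 0, 0, 0, 1, 1, 0, 1, 0],
                   })

# 1 * 'type_a,' -> 'type_a,' and 0 * 'type_a,' -> '', so the dot product
# concatenates the names of the columns that hold a 1 in each row
df['type'] = df.dot(df.columns + ',')\
               .str.rstrip(',')\
               .apply(lambda x: x.split(','))
```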
The output is:
```
   type_a  type_b  type_c  type_d                      type
0       1       0       0       1          [type_a, type_d]
1       0       1       0       0                  [type_b]
2       0       0       1       0                  [type_c]
3       0       0       1       0                  [type_c]
4       0       0       1       0                  [type_c]
5       1       0       1       1  [type_a, type_c, type_d]
6       0       0       0       1                  [type_d]
7       0       0       0       0                        []
8       0       1       0       1          [type_b, type_d]
9       1       1       0       0          [type_a, type_b]
```
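The trick relies on Python's string repetition: multiplying a string by 1 returns it unchanged and multiplying it by 0 returns an empty string. The dot product then "sums" (concatenates) those pieces row by row. Here is a small sketch on a hypothetical two-row frame that writes out the computation for one row by hand and compares it with `df.dot`:

```python
import pandas as pd

# Hypothetical toy data, only for illustration
df = pd.DataFrame({'type_a': [1, 0], 'type_b': [0, 1], 'type_c': [1, 1]})
labels = df.columns + ','  # Index(['type_a,', 'type_b,', 'type_c,'])

# Manual version for the first row: 1 * 'type_a,' == 'type_a,', 0 * 'type_b,' == ''
row = df.iloc[0]
manual = ''.join(int(v) * lab for v, lab in zip(row, labels))
print(manual)  # type_a,type_c,

# df.dot performs the same multiply-and-concatenate for every row at once
dotted = df.dot(labels)
print(dotted.tolist())  # ['type_a,type_c,', 'type_b,type_c,']
```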
Edit 1
In the general case, where the columns may contain values other than 0 and 1, first build a boolean mask with `eq(1)`:
```python
df['type'] = df.eq(1).dot(df.columns + ',')\
               .str.rstrip(',')\
               .apply(lambda x: x.split(','))
```
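The `eq(1)` mask is what makes this the general case. Without it, a value of 2 repeats the label (`2 * 'type_a,'` is `'type_a,type_a,'`). A float column (for example, one that contains `NaN`) makes `df.dot` raise a `TypeError`, because Python cannot multiply a string by a float. A sketch with hypothetical data containing a 2:

```python
import pandas as pd

# Hypothetical data: one cell holds a 2 instead of a 1
df = pd.DataFrame({'type_a': [2, 0], 'type_b': [1, 1]})
labels = df.columns + ','

# Without eq(1) the 2 duplicates the label
raw = df.dot(labels).str.rstrip(',').str.split(',')
print(raw.tolist())  # [['type_a', 'type_a', 'type_b'], ['type_b']]

# With eq(1) only cells exactly equal to 1 contribute their label
masked = df.eq(1).dot(labels).str.rstrip(',').str.split(',')
print(masked.tolist())  # [['type_b'], ['type_b']]
```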
Edit 2
Optionally, you can avoid the `lambda` by using the vectorized `.str.split` (useful if your DataFrame is big):
```python
df['type'] = df.eq(1).dot(df.columns + ',')\
               .str.rstrip(',')\
               .str.split(',')
```
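One caveat worth checking: for a row with no 1s the concatenated string is empty, and `''.split(',')` returns `['']`, not `[]`. pandas displays `['']` as `[]`, which is why row 7 in the output above looks empty. If you need real empty lists, here is a minimal sketch of one way to clean them up afterwards (hypothetical toy data):

```python
import pandas as pd

# Hypothetical toy data: row 1 has no 1s
df = pd.DataFrame({'type_a': [1, 0], 'type_b': [1, 0]})

s = df.eq(1).dot(df.columns + ',').str.rstrip(',')
types = s.str.split(',')
print(types.tolist())  # [['type_a', 'type_b'], ['']]

# Replace the [''] produced by all-zero rows with a genuine empty list
types = pd.Series([parts if parts != [''] else [] for parts in types],
                  index=types.index)
print(types.tolist())  # [['type_a', 'type_b'], []]
```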
Edit 3: Timing
Here I compare a few of the solutions proposed in this thread.
Generate Data
```python
import pandas as pd
import numpy as np

n = 10_000
columns = ['type_a', 'type_b', 'type_c', 'type_d']

# set seed for reproducibility
np.random.seed(0)
df = pd.DataFrame(
    np.random.randint(2, size=(n, 4)),
    columns=columns)

# save copy of original data
df_bk = df.copy()
```
Timing the data load
Every benchmark below copies the data inside `%%timeit`, so we first measure how long the copy alone takes.
```python
%%timeit -n 10 -r 10
df = df_bk.copy()
```
142 µs ± 40.4 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)
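`%%timeit` is an IPython/Jupyter cell magic, so these cells only run in a notebook or an IPython shell. From a plain Python script, you can take roughly the same measurement with the standard-library `timeit` module. In the sketch below, `number=10, repeat=10` mirror `-n 10 -r 10`, and the helper name `load` is hypothetical:

```python
import timeit

import numpy as np
import pandas as pd

# Same data generation as above
n = 10_000
columns = ['type_a', 'type_b', 'type_c', 'type_d']
np.random.seed(0)
df_bk = pd.DataFrame(np.random.randint(2, size=(n, 4)), columns=columns)


def load():
    # Equivalent of the body of the %%timeit cell
    return df_bk.copy()


# 10 runs of 10 loops each; each entry is the total time of one run
runs = timeit.repeat(load, number=10, repeat=10)
per_loop = [t / 10 for t in runs]
print(f"{np.mean(per_loop) * 1e6:.0f} µs ± {np.std(per_loop) * 1e6:.1f} µs per loop")
```

Absolute numbers will differ from the ones reported here, depending on hardware and library versions.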
@bitflip's solution
```python
%%timeit -n 10 -r 10
df = df_bk.copy()
df['type'] = df.apply(lambda x:
                      df.columns[x.eq(1)].tolist(), axis=1)
```
782 ms ± 33.2 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)
@Naveed's solution
```python
%%timeit -n 10 -r 10
df = df_bk.copy()
df['type'] = df.mul(df.columns)\
    .apply(lambda x: list(pd.Series(i for i in x if len(i)>0)), axis=1)
```
619 ms ± 22.1 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)
@Anoushiravan R's solution
```python
%%timeit -n 10 -r 10
df = df_bk.copy()
df['type'] = (pd.melt(df.reset_index(), id_vars='index')
                .query('value == 1')
                .groupby('index')['variable']
                .apply(lambda x:[str for str in x]))
```
148 ms ± 12.6 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)
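The intermediate long-format table makes this approach easier to follow. `melt` turns each cell into a row, `query` keeps only the 1s, and `groupby` collects the remaining column names per original row. Here is a small sketch on hypothetical toy data. It also shows that, once the result is assigned back, a row with no 1s ends up as `NaN` rather than an empty list:

```python
import pandas as pd

# Hypothetical toy data: row 1 has no 1s
df = pd.DataFrame({'type_a': [1, 0, 1], 'type_b': [1, 0, 0]})

# Long format: one row per (original row, column) pair
long = pd.melt(df.reset_index(), id_vars='index')  # columns: index, variable, value
print(long)

hits = long.query('value == 1')
types = hits.groupby('index')['variable'].apply(list)
print(types.tolist())  # [['type_a', 'type_b'], ['type_a']]

# Assignment aligns on the index, so row 1 (absent from `types`) becomes NaN
df['type'] = types
print(df['type'].isna().tolist())  # [False, True, False]
```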
@rpanai's solution
```python
%%timeit -n 10 -r 10
df = df_bk.copy()
df['type'] = df.eq(1).dot(df.columns + ',')\
               .str.rstrip(',')\
               .str.split(',')
```
13 ms ± 2.61 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)
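Before comparing timings, it is worth checking that the solutions actually agree. This minimal sketch runs on a smaller version of the benchmark data. It compares @bitflip's, @Anoushiravan R's and @rpanai's results on rows that contain at least one 1. All-zero rows are excluded because the three approaches represent them differently (`[]`, a missing entry, and `['']` respectively). @Naveed's solution is left out of this sketch, and `.apply(list)` stands in for the equivalent `lambda` in @Anoushiravan R's code.

```python
import numpy as np
import pandas as pd

# Smaller version of the benchmark data (1_000 rows instead of 10_000)
np.random.seed(0)
df = pd.DataFrame(np.random.randint(2, size=(1_000, 4)),
                  columns=['type_a', 'type_b', 'type_c', 'type_d'])

# @bitflip: all-zero rows give []
a = df.apply(lambda x: df.columns[x.eq(1)].tolist(), axis=1)

# @Anoushiravan R: all-zero rows are missing from the result
b = (pd.melt(df.reset_index(), id_vars='index')
     .query('value == 1')
     .groupby('index')['variable']
     .apply(list))

# @rpanai: all-zero rows give ['']
c = df.eq(1).dot(df.columns + ',').str.rstrip(',').str.split(',')

# Compare only the rows that contain at least one 1
nonempty = df.eq(1).any(axis=1)
print(a[nonempty].tolist() == c[nonempty].tolist())
print(b.tolist() == c[nonempty].tolist())
```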
Conclusion
As you can see from the following image, the accepted solution is much faster than the others. Yet the vectorized solution suggested here is about 11x faster than the accepted solution.
