This is similar to reversing one-hot encoding, but I have multiple columns that might be labeled.

I have this:

|col1|col2|
|1   |0   |
|0   |1   |
|1   |1   |

I want this:

|col1|col2|new        |
|1   |0   |'col1'     |
|0   |1   |'col2'     |
|1   |1   |'col1_col2'|

Here is what I tried:

df.idxmax(axis=1)

It only returns the first column containing the maximum, so it does not capture rows that have multiple 1s.

def get_cat(row):
    temp = []
    for c in df[codes].columns:
        if row[c]==1:
            return c   

This does the same thing: it only returns the first column name and misses rows with multiple columns having a 1.

NLR

1 Answer

Use this:

def get_cat(row):
    temp = [a for a, b in row.items() if b == 1]

    return '_'.join(temp)

row is a pandas.Series.
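
As a minimal sketch of how the function might be wired up, here is an end-to-end run on the sample data from the question. The list name codes is borrowed from the question's own attempt and is assumed to hold the indicator column names; restricting the apply call to those columns keeps any other columns out of the join.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"col1": [1, 0, 1], "col2": [0, 1, 1]})

# Assumption: `codes` lists the indicator columns to inspect
codes = ["col1", "col2"]

def get_cat(row):
    # Collect the name of every column whose value is 1 in this row
    temp = [a for a, b in row.items() if b == 1]
    return "_".join(temp)

# axis=1 passes each row to get_cat as a pandas.Series
df["new"] = df[codes].apply(get_cat, axis=1)
print(df)
```

A row with no 1s ends up with an empty string in the new column, because joining an empty list yields ''.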

Elmex80s