Return column for the first occurrence of "1" in row

Question

I have a dataframe as shown below.

Whenever the value in column "col" changes, I need to scan the columns cola, colb, colc, cola1, colb1, colc1 for the first occurrence of "1".

The match must satisfy cola == cola1 == 1, or colb == colb1 == 1, or colc == colc1 == 1, and the name of the matching column should be returned in the new column "New_col".

Input:

     cola   colb  colc  cola1 colb1 colc1  col
      0       0    0     0     0     0      18
      1       0    0     1     1     0      18
      1       0    0     1     1     0      18
      1       0    0     1     1     0      18
      1       0    0     1     1     0      18
      0       0    0     0     0     0      24
      0       1    0     0     1     0      24
      0       1    0     0     1     0      24
      0       1    0     0     1     0      24
      0       1    0     0     1     0      24
      0       0    0     0     1     0      10

Expected Output:

     cola   colb  colc  cola1 colb1 colc1  col  New_col
     0       0    0     0     0     0      18    
     1       0    0     1     1     0      18
     1       0    0     1     1     0      18
     1       0    0     1     1     0      18
     1       0    0     1     1     0      18
     0       0    0     0     0     0      24     cola
     0       1    0     0     1     0      24
     0       1    0     0     1     0      24
     0       1    0     0     1     0      24
     0       1    0     0     1     0      24
     0       0    0     0     1     0      10    colb

What if all 3 conditions are satisfied? What output do you want? Is there any precedence? — Shivkumar kondi, Dec 07 '16 at 07:21
I don't understand why the output is in that row. It seems to be the first occurrence from the previous group. Can you explain more? — jezrael, Dec 07 '16 at 07:34
In the 5th row, the value of column "col" is changing from 0 to 1. Hence I need to scan all the other columns above this row for the first occurrence of "1". The output should be returned only when cola and cola1 = 1, or colb and colb1 = 1, or colc and colc1 = 1. — Anagha, Dec 07 '16 at 08:12
I would guess it should output the first column that equals 1. That's the question in the title. But the example output does not make any sense, since in row 6 there is no column which equals 1 and in the last row col does not equal 1. — Robin Koch, Dec 07 '16 at 09:31
For the first comment: at any given point in time, either cola, colb or colc will have the value 1. That's the assumption. The next condition is: when cola = cola1 = 1, return cola; when colb = colb1 = 1, return colb; when colc = colc1 = 1, return colc. This condition is checked whenever the value in column "col" changes. — Anagha, Dec 07 '16 at 09:52

score 1 · Answer 1 · answered Dec 07 '16 at 13:20

An almost complete answer:

First, calculate the conditions you're using and put them in columns:

df['conda'] = (df['cola'] == 1) & (df['cola1'] == 1)
df['condb'] = (df['colb'] == 1) & (df['colb1'] == 1)
df['condc'] = (df['colc'] == 1) & (df['colc1'] == 1)

df['anycond'] = df['conda'] | df['condb'] | df['condc']

Second, define a function that takes the first true condition (of a, b, c) as soon as any condition is true:

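import numpy as np

def get_first_condition(g):
    # No row in this group satisfies any of the three conditions
    if not g['anycond'].any():
        return np.nan
    # Flags (conda, condb, condc) from the first row where any condition holds
    mask = g.loc[g['anycond'], ['conda', 'condb', 'condc']].iloc[0].values
    return np.array(['cola', 'colb', 'colc'])[mask][0]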

Third, apply to the dataframe grouped by col:

df.groupby('col').apply(get_first_condition)

col
10     NaN
18    cola
24    colb

Fourth, just use this result to populate the new column. You will need to do some shifting.
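
One possible way to do that shifting is sketched below. It is not the code from the steps above but a variant, and it rests on an assumption of mine: a "group" is a run of consecutive rows with the same col value, so each run receives the result of the run before it on the row where col changes. The names run_id, pairs, cond, first_name, per_run and is_run_start are made up for this illustration, and the frame is rebuilt from the Input table in the question.

```python
import pandas as pd

# Sample frame copied from the Input table in the question
df = pd.DataFrame({
    "cola":  [0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "colb":  [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
    "colc":  [0] * 11,
    "cola1": [0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "colb1": [0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1],
    "colc1": [0] * 11,
    "col":   [18] * 5 + [24] * 5 + [10],
})

# Assumption: a group is a run of consecutive rows sharing the same `col` value
run_id = (df["col"] != df["col"].shift()).cumsum()

# One boolean column per pair, True where both members equal 1
pairs = {"cola": "cola1", "colb": "colb1", "colc": "colc1"}
cond = pd.DataFrame({a: (df[a] == 1) & (df[b] == 1) for a, b in pairs.items()})

# Name of the first pair that holds in each row, NaN where none holds
first_name = cond.idxmax(axis=1).where(cond.any(axis=1))

# First non-null name within each run, kept in row order
per_run = first_name.groupby(run_id).first()

# The shifting: each run gets the result of the previous run,
# written only on the row where `col` changes
is_run_start = run_id != run_id.shift()
df["New_col"] = run_id.map(per_run.shift()).where(is_run_start)

print(df)
```

On the sample data this puts cola at index 5 and colb at index 10 and leaves the other rows as NaN, which lines up with the Expected Output. Note that groupby('col') sorts its keys by default (10, 18, 24 here), so shifting that result directly would not follow row order; labelling runs as above, or passing sort=False, is one way around that.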
