
I have a dataframe as shown below.

Whenever the value in column "col" changes, I need to scan the columns cola, colb, colc, cola1, colb1 and colc1 for the first occurrence of "1".

At that occurrence, one of the conditions cola==cola1==1, colb==colb1==1 or colc==colc1==1 should be satisfied, and the name of the matching column should be returned in the new column "New_col".

Input:

     cola   colb  colc  cola1 colb1 colc1  col
      0       0    0     0     0     0      18
      1       0    0     1     1     0      18
      1       0    0     1     1     0      18
      1       0    0     1     1     0      18
      1       0    0     1     1     0      18
      0       0    0     0     0     0      24
      0       1    0     0     1     0      24
      0       1    0     0     1     0      24
      0       1    0     0     1     0      24
      0       1    0     0     1     0      24
      0       0    0     0     1     0      10

Expected Output:

     cola   colb  colc  cola1 colb1 colc1  col  New_col
     0       0    0     0     0     0      18
     1       0    0     1     1     0      18
     1       0    0     1     1     0      18
     1       0    0     1     1     0      18
     1       0    0     1     1     0      18
     0       0    0     0     0     0      24   cola
     0       1    0     0     1     0      24
     0       1    0     0     1     0      24
     0       1    0     0     1     0      24
     0       1    0     0     1     0      24
     0       0    0     0     1     0      10   colb
Anagha
  • What if all three conditions are satisfied? What output do you want? Is there any precedence? – Shivkumar kondi Dec 07 '16 at 07:21
  • I don't understand why the output is in that row. It seems to be the first occurrence from the previous group. Can you explain more? – jezrael Dec 07 '16 at 07:34
  • In the 5th row, the value of column "col" is changing from 0 to 1. Hence I need to scan all the other columns above this row for the first occurrence of "1". The output should be returned only when cola and cola1 = 1, or colb and colb1 = 1, or colc and colc1 = 1. – Anagha Dec 07 '16 at 08:12
  • We need the answer to the first comment. – IanS Dec 07 '16 at 09:18
  • I would guess it should output the first column that equals 1. That's the question in the title. But the example output does not make any sense, since in row 6 there is no column which equals 1, and in the last row col does not equal 1. – Robin Koch Dec 07 '16 at 09:31
  • For the first comment: at any given point in time, only one of cola, colb or colc will have the value 1. That's the assumption. The next condition should be: when cola = cola1 = 1, return cola; when colb = colb1 = 1, return colb; when colc = colc1 = 1, return colc. This condition is checked whenever there is a change in column "col". – Anagha Dec 07 '16 at 09:52
  • I have edited the question for clarity. – Anagha Dec 07 '16 at 10:12

1 Answer

An almost complete answer:

First, calculate the conditions you're using and put them in columns:

df['conda'] = (df['cola'] == 1) & (df['cola1'] == 1)
df['condb'] = (df['colb'] == 1) & (df['colb1'] == 1)
df['condc'] = (df['colc'] == 1) & (df['colc1'] == 1)

df['anycond'] = df['conda'] | df['condb'] | df['condc']

Second, define a function that takes the first true condition (of a, b, c) as soon as any condition is true:

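import numpy as np

def get_first_condition(g):
    # No row in this group satisfies any of the three conditions
    if not g['anycond'].any():
        return np.nan
    # Boolean flags (a, b, c) of the first row where some condition holds
    mask = g.loc[g['anycond'], ['conda', 'condb', 'condc']].iloc[0].values
    # Name of the first condition that is true in that row
    return np.array(['cola', 'colb', 'colc'])[mask][0]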

Third, apply to the dataframe grouped by col:

df.groupby('col').apply(get_first_condition)

col
10     NaN
18    cola
24    colb

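Fourth, just use this result to populate the new column. You will need to do some shifting, because in the expected output each group's result appears on the first row of the following group.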
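
One possible way to do the shifting is sketched below. It rebuilds the example data from the question and makes two assumptions that are not stated in the original: each value of col forms a single contiguous block, and the result of a group belongs on the first row of the next group. The names per_group, prev_group and starts are only illustrative. Note that sort=False keeps the groups in order of appearance (18, 24, 10), which the shift relies on.

```python
import numpy as np
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "cola":  [0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "colb":  [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
    "colc":  [0] * 11,
    "cola1": [0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "colb1": [0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1],
    "colc1": [0] * 11,
    "col":   [18] * 5 + [24] * 5 + [10],
})

# Steps one to three from above
df["conda"] = (df["cola"] == 1) & (df["cola1"] == 1)
df["condb"] = (df["colb"] == 1) & (df["colb1"] == 1)
df["condc"] = (df["colc"] == 1) & (df["colc1"] == 1)
df["anycond"] = df["conda"] | df["condb"] | df["condc"]

def get_first_condition(g):
    if not g["anycond"].any():
        return np.nan
    row = g.loc[g["anycond"], ["conda", "condb", "condc"]].iloc[0]
    mask = row.to_numpy(dtype=bool)
    return np.array(["cola", "colb", "colc"])[mask][0]

cond_cols = ["anycond", "conda", "condb", "condc"]
# sort=False keeps groups in order of appearance (assumption: contiguous blocks)
per_group = df.groupby("col", sort=False)[cond_cols].apply(get_first_condition)

# Move each group's result down to the next group
prev_group = per_group.shift(1)

# True on the first row of each block of equal col values
starts = df["col"] != df["col"].shift()

# Look up the previous group's result and keep it only on the block's first row
df["New_col"] = df["col"].map(prev_group).where(starts)

print(df[["col", "New_col"]])
```

If the same col value can reappear later in the frame, the grouping key would need to be a block counter such as starts.cumsum() instead of col itself.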

IanS