
My dataframe looks like this:

import pandas as pd

df = pd.DataFrame({'a': ["10001", "10001", "10002", "10002", "10002"], 'b': ['hello', 'hello', 'hola', 'hello', 'hola']})

I want to create a new boolean column 'c' based on the following conditions:

  • If the values of 'a' are the same (i.e. the 1st and 2nd rows, or the 3rd, 4th and 5th rows), check whether the values of 'b' in those rows are the same (the 2nd row returns True; the 4th row returns False).
  • If the values of 'a' are not the same, skip.

My current code is the following:

def check_consistency(col1,col2):
    df['match'] = df[col1].eq(df[col1].shift())
    t = []
    for i in df['match']:
        if i == True:
            t.append(df[col2].eq(df[col2].shift()))
check_consistency('a','b')

But it returns an error.

U13-Forward

2 Answers


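I think this is a job for groupby. Passing group_keys=False keeps the original row index in the result on newer pandas versions:

df.groupby('a', group_keys=False).b.apply(lambda x: x == x.shift())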
Out[431]: 
0    False
1     True
2    False
3    False
4    False
Name: b, dtype: bool
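
The same per-group comparison can probably be written without apply. This is a minimal sketch, assuming the df from the question; the name prev_b is only illustrative and the result is stored in 'c' as the question asks.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ["10001", "10001", "10002", "10002", "10002"],
    'b': ['hello', 'hello', 'hola', 'hello', 'hola'],
})

# Previous 'b' value within each 'a' group (missing for the first row of a group).
prev_b = df.groupby('a')['b'].shift()

# A missing value never compares equal, so the first row of each group is False.
df['c'] = df['b'].eq(prev_b)
print(df)
```

Because the shift happens inside each group, the result keeps the original row index and can be assigned straight back to the dataframe.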
BENY

A bitwise & should do it, checking that both conditions are satisfied for each row and the row before it:

df['c'] = (df.a == df.a.shift()) & (df.b == df.b.shift()) 

df.c
#0    False
#1     True
#2    False
#3    False
#4    False
#Name: c, dtype: bool
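
Note that this compares each row with the row directly above it, so it matches a groupby-based check only while equal values of 'a' sit next to each other, as they do in the question. The sketch below uses a small made-up frame (demo, with hypothetical values) to show where the two approaches might differ.

```python
import pandas as pd

# Hypothetical data: the two 'x' rows are NOT adjacent.
demo = pd.DataFrame({'a': ['x', 'y', 'x'], 'b': ['p', 'q', 'p']})

# Compare each row with the row directly above it.
adjacent = demo['a'].eq(demo['a'].shift()) & demo['b'].eq(demo['b'].shift())

# Compare each row with the previous row that has the same 'a'.
grouped = demo['b'].eq(demo.groupby('a')['b'].shift())

print(adjacent.tolist())  # [False, False, False]
print(grouped.tolist())   # [False, False, True]
```

If the data is sorted by 'a', as in the question, both give the same answer.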

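Alternatively, if you want to make your current code work, you can do something like this (it performs essentially the same check as above, but stores the result in 'match'):

def check_consistency(col1, col2):
    # True where col1 equals the value in the previous row
    df['match'] = df[col1].eq(df[col1].shift())

    # Assumes the default RangeIndex (0, 1, 2, ...), so i - 1 is the previous row
    for i in range(len(df)):
        if df.loc[i, 'match']:
            # col1 matched; keep True only if col2 matches the previous row too
            df.loc[i, 'match'] = (df.loc[i, col2] == df.loc[i - 1, col2])

check_consistency('a', 'b')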
Mankind_008