I have a dataframe (df1) where a column (Detail) contains a string in each row. I split each string of the column into a list using df1.Detail.str.split().

I have another column (Pass) that is set to 0 by default. I am trying to change the value of df1[Pass] to 1 if the list in df1[Detail] contains the word 'pass'. I am trying to do this on a row by row basis using iterrows().

When I run the following code, it properly displays rows that match my criteria and the corresponding index:

for index, row in df1.iterrows():
    # use the loop variable `row`, not a leftover `i`
    if 'pass' in row.Detail:
        print(row.Detail, index)

However, when I try to update the row values in 'Pass' using the following code:

for index, row in df1.iterrows():
    # use the loop variable `row`, not a leftover `i`
    if 'pass' in row.Detail:
        df1.loc[index, 'Pass'] = 1

It ends up updating 98% of the row values in 'Pass' to 1, even if the row does not fit the criteria of containing the word 'pass' in 'Detail'. Does anybody know what could be causing this issue?

user3294779

1 Answer

I suggest a vectorized solution without a loop: use str.contains and cast the boolean mask to integer, so that True becomes 1 and False becomes 0:

df.Pass = df.Detail.str.contains('pass').astype(int)

Sample:

import pandas as pd

df = pd.DataFrame({'Detail':['pass exam','not passed','aaa'],
                   'Pass':[1]*3})

#match substrings
df.Pass = df.Detail.str.contains('pass').astype(int)
#match whole word only
#https://stackoverflow.com/a/37457930/2901002
df['Pass1'] = df.Detail.str.contains(r'(?:\s|^)pass(?:\s|$)').astype(int)

print(df)
       Detail  Pass  Pass1
0   pass exam     1      1
1  not passed     1      0
2         aaa     0      0
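
One caveat, given that the question says Detail was already split into lists: str.contains works on strings, so it should be run on the original unsplit column. If only the list column is available, a per-row membership test is one option. The following is a minimal sketch under that assumption; the sample data is hypothetical and `df1`, `Detail` and `Pass` follow the naming in the question:

```python
import pandas as pd

# Hypothetical sample data; Detail is split into lists of words, as in the question.
df1 = pd.DataFrame({'Detail': ['pass exam', 'not passed', 'aaa']})
df1['Detail'] = df1['Detail'].str.split()

# Exact-word membership test on each list, then cast the booleans to 0/1.
df1['Pass'] = df1['Detail'].apply(lambda words: 'pass' in words).astype(int)

print(df1)
```

Because `in` on a list compares whole elements, this behaves like the whole-word match above: 'passed' does not count as 'pass'.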
jezrael
  • This looks like a nice solution. Is the best way to add multiple conditions "df.Pass = (df.Detail.str.contains('pass') & df.Detail.str.contains('complete')).astype(int)"? Will it work w/ more than 2 conditions? – user3294779 Apr 22 '18 at 07:26
  • @user3294779 - Yes, sure. – jezrael Apr 22 '18 at 07:26
  • Thank you! Appreciate the help. – user3294779 Apr 22 '18 at 07:29
  • @user3294779 - One small thing: if the value is `passed`, do you need `1` or `0`? – jezrael Apr 22 '18 at 07:29
  • If Detail.str.contains('pass') & Detail.str.contains('complete') the value of df.Pass should be 1. Am I still on the right track? – user3294779 Apr 22 '18 at 18:39
  • I think you need `OR` (`|`): `Detail.str.contains('pass') | Detail.str.contains('complete')`, which is the same as `Detail.str.contains('pass|complete')`. It is best to test on a small DataFrame like the one in the answer and, if everything works, apply the solution to the real data. – jezrael Apr 22 '18 at 18:41
  • EDIT: Detail.str.contains('pass | complete') does work, Detail.str.contains('pass & complete') does NOT work. Thank you for solution! – user3294779 Apr 22 '18 at 19:28
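
To make the AND/OR distinction from the comments concrete, here is a small sketch on hypothetical sample data (the column names `Both`, `Either` and `EitherRegex` are made up for illustration). It combines two boolean masks with `&` and `|`, and shows that inside a regex pattern `|` means alternation while `&` is just a literal character. Spaces in the pattern are literal too, so 'pass | complete' matches 'pass ' or ' complete' rather than the bare words.

```python
import pandas as pd

# Hypothetical sample data covering each combination of the two words.
df = pd.DataFrame({'Detail': ['pass and complete', 'pass only',
                              'complete only', 'aaa']})

has_pass = df['Detail'].str.contains('pass')
has_complete = df['Detail'].str.contains('complete')

# AND: both words must appear (combine the boolean masks with &).
df['Both'] = (has_pass & has_complete).astype(int)

# OR: either word may appear (combine the masks with |) ...
df['Either'] = (has_pass | has_complete).astype(int)

# ... which matches the same rows as regex alternation in a single pattern.
df['EitherRegex'] = df['Detail'].str.contains('pass|complete').astype(int)

# '&' has no special meaning in a regex, so this looks for the literal
# text 'pass & complete' and is not a logical AND.
literal_amp = df['Detail'].str.contains('pass & complete')

print(df)
```

More than two conditions can be chained the same way, as long as each mask is built separately and the combined expression is wrapped in parentheses before calling astype(int).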