Say I have the following dataframe df:
   A              B                C
0  mom;dad;son;   sister;son;      yes;no;maybe;
1  dad;           daughter;niece;  no;snow;
2  son;dad;       cat;son;dad;     tree;dad;son;
3  daughter;mom;  niece;           referee;
4  dad;daughter;  cat;             dad;
I want to check whether columns A, B, and C share a common word, and create a column D with 1 if they do and 0 if they don't. For a word to count as common, it is enough for it to appear in just two of the three columns.
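To make the rule concrete, here is a minimal sketch on plain strings, with no pandas involved. The helper name `has_common_word` is hypothetical, and it assumes words are separated by `;` and that the empty string left after the trailing `;` should not count as a word.

```python
from collections import Counter

def has_common_word(*cells):
    """Return 1 if some word appears in at least two of the given cells, else 0."""
    counts = Counter()
    for cell in cells:
        # A set counts each word once per column; the empty string left
        # behind by the trailing ';' is dropped.
        words = {w.strip() for w in cell.split(';')} - {''}
        counts.update(words)
    # A word seen in two or more columns is "common".
    return int(any(n >= 2 for n in counts.values()))

print(has_common_word('mom;dad;son;', 'sister;son;', 'yes;no;maybe;'))  # 1
print(has_common_word('dad;', 'daughter;niece;', 'no;snow;'))           # 0
```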
The outcome should be:
   A              B                C              D
0  mom;dad;son;   sister;son;      yes;no;maybe;  1
1  dad;           daughter;niece;  no;snow;       0
2  son;dad;       cat;son;dad;     tree;dad;son;  1
3  daughter;mom;  niece;           referee;       0
4  dad;daughter;  cat;             dad;           1
I am trying to implement this by doing:
    for index, row in df.iterrows():
        w1=row['A'].split(';')
        w2=row['B'].split(';')
        w3=row['C'].split(';')
        if len(set(w1).intersection(w2))>0 or len(set(w1).intersection(w3))>0 or len(set(w2).intersection(w3))>0:
            df['D'][index]==1
        else:
            df['D'][index]==0
However, the resulting D column contains only 0, possibly because I am not comparing each individual word in w1 to the others in w2 and w3. How could I achieve this?
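Two things in the loop above look suspicious, and the sketch below is one possible way to address them, not necessarily the only or best one. First, `df['D'][index]==1` is a comparison, so nothing is ever assigned. Second, `'dad;'.split(';')` returns `['dad', '']`, so once the assignment works the empty string would be shared by every column and every row would get 1. The helper `words` is a hypothetical name, and the sketch assumes D should start at 0.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    'A': ['mom;dad;son;', 'dad;', 'son;dad;', 'daughter;mom;', 'dad;daughter;'],
    'B': ['sister;son;', 'daughter;niece;', 'cat;son;dad;', 'niece;', 'cat;'],
    'C': ['yes;no;maybe;', 'no;snow;', 'tree;dad;son;', 'referee;', 'dad;'],
})

def words(cell):
    # Discard the empty string produced by the trailing ';', otherwise
    # '' would count as a word shared by every column.
    return {w for w in cell.split(';') if w}

df['D'] = 0  # assumed default: no common word
for index, row in df.iterrows():
    sets = [words(row[col]) for col in ('A', 'B', 'C')]
    # Check every pair of columns: (A, B), (A, C), (B, C).
    if any(a & b for a, b in combinations(sets, 2)):
        # '=' assigns; '==' only compares and discards the result.
        df.at[index, 'D'] = 1

print(df['D'].tolist())  # [1, 0, 1, 0, 1]
```

Using `df.at[index, 'D']` instead of `df['D'][index]` also avoids chained indexing, which pandas warns about when assigning.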