
For each row, if the value in colX is contained in the value in colZ, I want a new column to take the value from colZ; otherwise it should take the value from colA.

I kept looking around but couldn't find a solution.

My real data is more involved, but this example should illustrate the problem.

I'm also open to a solution that doesn't use janitor.

Edit:

I put in the wrong example code. Totally my fault. Updating it now.

import pandas as pd
import janitor as jn

df = pd.DataFrame(
    {
        'colZ': ["zang", "zang", "zang", "z", "zang"],
        'colX': ["A", "B", "B", "A", "Z"],
        'colA': ["1", "1", "1", "1", "1"],
    }
)


# Desired Output:

output_df = pd.DataFrame(
    {
        'colZ': ["zang", "zang", "zang", "z", "zang"],
        'colX': ["A", "B", "B", "A", "Z"],
        'colA': ["1", "1", "1", "1", "1"],
        'result': ["zang", "1", "1", "1", "zang"],
    }
)

Here is what I have tried.

output_df = jn.case_when(df,

                  df['colZ'].str.contains(df['colX']),  df['colZ'],
                  df['colA'],

                  column_name='result')

# Also tried this and many others

output_df = jn.case_when(df,

                  df['colZ'].isin(df['colX']),  df['colZ'],
                  df['colA'],

                  column_name='result')


Fugles

1 Answer


We can break it into 2 parts:

  1. Create a variable for each column:
col_a = df['colA']
col_x = df['colX']
col_z = df['colZ']
  2. Iterate over the rows and check whether the string in colX is contained in the string in colZ (ignoring case):
df['result'] = [col_z[col_index] if col_x[col_index].upper() in col_z[col_index].upper() else col_a[col_index] for col_index in range(df.shape[0])]

Or you could fold everything into one longer one-liner by skipping the variables from step 1, but it gets hard to read...
Hope it helps!
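For context on why a per-row check is needed here: `Series.isin` tests whether each whole colZ value equals one of the values found anywhere in colX, which is not the same as testing for a substring within the same row. A small sketch of the difference on the question's data (the names `isin_mask` and `row_mask` are mine, and the case-insensitive match is an assumption taken from the desired output):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "colZ": ["zang", "zang", "zang", "z", "zang"],
        "colX": ["A", "B", "B", "A", "Z"],
        "colA": ["1", "1", "1", "1", "1"],
    }
)

# isin: is the whole colZ value equal to ANY value that appears in colX?
isin_mask = df["colZ"].isin(df["colX"])

# Per-row, case-insensitive substring test:
# is this row's colX contained in this row's colZ?
row_mask = pd.Series(
    [x.lower() in z.lower() for x, z in zip(df["colX"], df["colZ"])],
    index=df.index,
)

print(isin_mask.tolist())  # [False, False, False, False, False]
print(row_mask.tolist())   # [True, False, False, False, True]
```

Only `row_mask` lines up with the rows where the desired output keeps colZ.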

Another way would be to zip:

df['result'] = [colz 
                if colx.lower() in colz 
                else cola 
                for colz, colx, cola 
                in zip(df.colZ, df.colX, df.colA)]
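If you would rather keep the condition separate from the value selection, one possible variation is to build a boolean mask first and then let `numpy.where` pick between the two columns. This is only a sketch on the question's data; the name `mask` is mine, and lower-casing both sides is an assumption so that it also works if colZ contains upper-case letters:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "colZ": ["zang", "zang", "zang", "z", "zang"],
        "colX": ["A", "B", "B", "A", "Z"],
        "colA": ["1", "1", "1", "1", "1"],
    }
)

# True where this row's colX occurs inside this row's colZ (ignoring case)
mask = [x.lower() in z.lower() for x, z in zip(df["colX"], df["colZ"])]

# Take colZ where the mask is True, colA otherwise
df["result"] = np.where(mask, df["colZ"], df["colA"])

print(df["result"].tolist())  # ['zang', '1', '1', '1', 'zang']
```

The substring test itself is still a plain Python loop, so this is mostly a readability choice rather than a speed-up.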
sammywemmy
CodeCop
  • My apologies I totally put in the wrong example code. – Fugles Jan 06 '23 at 22:39
  • Hi, it's fine, but I am having a hard time figuring out the pattern in this new example as well - could you please elaborate? – CodeCop Jan 06 '23 at 23:00
  • Sure. Basically, on any row, if ColX is in ColZ then I want the result column to be colZ. If not, I want it to be ColA. – Fugles Jan 06 '23 at 23:45
  • Ok so if I got you correct - in your example above, you check if 'Z' is in 'zang' , it is so we output 'zang', then check if 'B' is in 'zang' and it's not so we output '1' and so on.. ? I see it is not case sensitive - should it stay like that? – CodeCop Jan 06 '23 at 23:50
  • yes you're correct, they could all be lower case really. I'm kinda newer to all this. – Fugles Jan 06 '23 at 23:56
  • you can remove all of the `.upper()` if that's the case :) – CodeCop Jan 07 '23 at 00:02
  • That was a BIG help thank you! – Fugles Jan 07 '23 at 00:22