I have a dataframe df which contains movie data.

I want to create a new column in df called "drama_movie" which contains True for movies that are dramas and False for those that are not.

I tried it with the following code: df["drama_movie"]=df['listed_in'].isin(["Dramas"])

-> but I receive everything as False in the column drama_movie.

When I try the following code: df["drama_movie"]=df.apply(lambda x: x['listed_in'] in x['Dramas'], axis=1)

-> I receive a KeyError: "Dramas"

What works is this code: df["drama_movie"] = df['listed_in'].str.contains('Dramas', case=False, na=False)

-> But I need to use Python's in operator, and I'm stuck on it. Any suggestions? Thank you for your help.

ACarter
  • "*I need to use pythons in operator*" -> why do you **need** this? – mozway Jan 25 '23 at 09:53
  • Do you mean that this is an assignment? – mozway Jan 25 '23 at 09:57
  • `df.apply(lambda x: 'Dramas' in x['listed_in'], axis=1)` should work, but this is a bad use of `apply` – mozway Jan 25 '23 at 09:58
  • kind of. I passed a test but the feedback was that I didn't use the appropriate code to solve one part and that I should look at it as it is relevant for understanding and future progress. But as you can see I couldn't figure it out. Your solution worked. thx. what was my problem? I was close with the solution, but something went wrong. – new_python_guy Jan 25 '23 at 10:07
  • It's always hard to judge ratings… IMO `df["drama_movie"] = df['listed_in'].str.contains('Dramas', case=False, na=False)` is the correct pandas way to do it. You should ask your instructor for the reason. Maybe it's a wrong one ;) – mozway Jan 25 '23 at 10:21
  • not gonna mess with that woman :D. I'm pretty sure there is something very valuable in the in-operator-way and I'm just too dumb to get it. – new_python_guy Jan 25 '23 at 10:26
  • You should still be able to ask for an explanation. – mozway Jan 25 '23 at 10:30
  • you're right. will do. – new_python_guy Jan 25 '23 at 10:35
  • Why do you need to use the `in` operator? You have a solution that works. – pigrammer Jan 25 '23 at 10:52
  • I will check with my instructor on monday in class. – new_python_guy Jan 25 '23 at 11:06

1 Answer

You can split strings then explode lists then keep only rows that match your criteria:

drama_movies = (df.loc[df['listed_in'].str.split(',').explode()
                                      .loc[lambda x: x.isin(['Dramas'])].index])
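
If you want the boolean column from the question rather than a filtered dataframe, the same split-and-explode idea could be sketched roughly as below. The sample data is hypothetical (the real df is not shown), and the code assumes the genres in listed_in are separated by a comma plus optional spaces, which is why each piece is stripped before comparing.

```python
import pandas as pd

# Hypothetical sample data; the real df is not shown in the question.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "listed_in": ["Dramas, International Movies", "Comedies",
                  "Comedies, Dramas", None],
})

# Split each genre string into a list, give every genre its own row,
# and strip the whitespace left over after the commas (an assumption
# about how the column is formatted).
genres = df["listed_in"].str.split(",").explode().str.strip()

# A movie is a drama if any of its exploded genres equals "Dramas".
# Grouping by the original index collapses the rows back to one per movie.
df["drama_movie"] = genres.eq("Dramas").groupby(level=0).any()
```

Rows with a missing listed_in end up as False here, because a missing value never equals "Dramas".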

Don't use apply here; use a comprehension instead:

drama_movies = df[['Dramas' in s.split(',') for s in df['listed_in']]]

# For 200 rows
# apply: 1.16 ms ± 20.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# comprehension: 156 µs ± 262 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
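
As for why the two attempts in the question fail: isin(["Dramas"]) compares each whole cell with "Dramas", so a cell like "Dramas, International Movies" is never equal to it, and x['Dramas'] looks up a column named Dramas, which does not exist, hence the KeyError. A minimal sketch of the in-operator version that fills the drama_movie column could look like this. The sample data and the helper name is_drama are made up for illustration, and the stripping assumes genres may be followed by spaces after the comma.

```python
import pandas as pd

# Hypothetical sample data; the real df is not shown in the question.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "listed_in": ["Dramas, International Movies", "Comedies",
                  "Comedies, Dramas", None],
})

def is_drama(genres):
    """Return True if "Dramas" is one of the comma-separated genres."""
    # Missing values (None/NaN) are not strings; treat them as "not a drama".
    if not isinstance(genres, str):
        return False
    # `in` on a list tests for an exact element, not a substring.
    return "Dramas" in [g.strip() for g in genres.split(",")]

# Build the boolean column with a list comprehension instead of apply.
df["drama_movie"] = [is_drama(s) for s in df["listed_in"]]
```

Note that 'Dramas' in x['listed_in'] without the split is a substring test on the string, so it would also match a genre such as "TV Dramas"; testing against the split list matches only the exact genre.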
Corralien