Multiple conditions for string variable

Question

I am trying to add a new column "profile_type" to a dataframe "df_new" which contains the string "Decision Maker" if the "job_title" has any one of the following words: (Head or VP or COO or CEO or CMO or CLO or Chief or Partner or Founder or Owner or CIO or CTO or President or Leaders),

"Key Influencer" if the "job_title" has any one of the following words: (Senior or Consultant or Manager or Learning or Training or Talent or HR or Human Resources or Consultant or L&D or Lead), and

"Influencer" for all other fields in "job_title".

For example, if a row of 'job_title' contains "Learning and Development Specialist", the code should pick up the word 'Learning' and label that row 'Key Influencer' under 'profile_type'.

Welcome to Stack Overflow. Please read [ask] and note well that this is **not a discussion forum**. As such, "Any help is highly appreciated." is [not answerable](https://meta.stackoverflow.com/questions/284236), and "Thanks in advance." is [not wanted](https://meta.stackoverflow.com/questions/288160). "I've tried with if-else statements, but returns nothing" We can only possibly comment on attempts **that are actually shown to us**. — Karl Knechtel, Nov 25 '22 at 09:45

Hobanator · Answer 1 · 2022-11-25T14:58:44.513

0

I would try something like this:

import re

import numpy as np

# Replace each ... with the remaining keywords from the question.
dm_titles = ['Head', 'VP', 'COO', ...]
ki_titles = ['Senior', 'Consultant', 'Manager', ...]

# `word in series` tests the index labels, not the text, so build one
# regex per list and let str.contains test every row.
titles = df_new['job_title']
is_dm = titles.str.contains('|'.join(map(re.escape, dm_titles)), case=False, na=False)
is_ki = titles.str.contains('|'.join(map(re.escape, ki_titles)), case=False, na=False)

# np.select takes the first condition that matches; default covers all other rows.
df_new['profile_type'] = np.select(
    [is_dm, is_ki], ["Decision Maker", "Key Influencer"], default="Influencer"
)

Let me know if you need any clarification!

edited Nov 25 '22 at 14:58

answered Nov 25 '22 at 09:47

Hobanator


Thank you @Hobanator. I tried the code that you suggested, However, it returns only 'Influencer'. For example, if the 'job_title' includes a row "Learning and Development Specialist", the code has to pull out just the word 'Learning' and segregate it as 'Key Influencer' under 'profile_type'. – Alisha A Nov 25 '22 at 10:09
The edited code is returning only 'Influencer'. – Alisha A Nov 28 '22 at 05:48
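One common cause of an all-'Influencer' result like this is testing `word in series`: on a pandas Series, Python's `in` checks the index labels rather than the text in each row. A minimal sketch on a made-up two-row Series (the `titles` name and its sample values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample data; the default index labels are 0 and 1.
titles = pd.Series(["Head of Sales", "Learning and Development Specialist"])

# `in` looks at the index labels, so the word is never found...
word_in_series = "Head" in titles
# ...while an index label is.
label_in_series = 0 in titles

# str.contains tests the text of every row and returns a boolean Series.
contains_head = titles.str.contains("Head").tolist()

print(word_in_series, label_in_series, contains_head)
```

Because every such `in` test comes back False, only the fallback condition holds, and every row ends up as 'Influencer'.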

score 0 · Accepted Answer · answered Nov 28 '22 at 08:09

The code below worked for me.

import re

import numpy as np

s1 = df_new['job_title']

# The trailing ... stands for the rest of each keyword list; replace it before running.
condition1 = s1.str.contains('Director|Head|VP|COO|CEO...', flags=re.IGNORECASE, regex=True, na=False)

condition2 = s1.str.contains('Senior|Consultant|Manager|Learning...', flags=re.IGNORECASE, regex=True, na=False)

# Decision Maker takes priority; rows matching neither pattern fall through to 'Influencer'.
df_new['profile_type'] = np.where(condition1, 'Decision Maker',
         np.where(condition2, 'Key Influencer', 'Influencer'))
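One caveat with case-insensitive substring patterns: a short keyword such as COO also matches inside longer words such as "Coordinator". If that is not wanted, the keywords can be wrapped in word boundaries. This is only a sketch: `word_pattern`, `df_demo`, and the shortened keyword lists are hypothetical names and data chosen for illustration, not part of the answer above.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample data.
df_demo = pd.DataFrame({"job_title": [
    "Project Coordinator",
    "COO",
    "Learning and Development Specialist",
]})

# Shortened keyword lists, for illustration only.
dm_words = ["Head", "VP", "COO", "CEO"]
ki_words = ["Senior", "Learning", "HR", "L&D"]


def word_pattern(words):
    """Build a regex that matches any keyword only as a whole word."""
    return r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"


titles = df_demo["job_title"]

# Plain substring search: "COO" is found inside "Coordinator".
loose = titles.str.contains("COO", case=False, na=False).tolist()
# Whole-word search: only the standalone "COO" matches.
strict = titles.str.contains(word_pattern(["COO"]), case=False, na=False).tolist()

is_dm = titles.str.contains(word_pattern(dm_words), case=False, na=False)
is_ki = titles.str.contains(word_pattern(ki_words), case=False, na=False)
df_demo["profile_type"] = np.select(
    [is_dm, is_ki], ["Decision Maker", "Key Influencer"], default="Influencer"
)
print(df_demo)
```

The trade-off is that whole-word matching no longer lets "Manager" match "Managers" or "Lead" match "Leaders", so plural or derived forms would have to be listed explicitly.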

score -1 · Answer 3 · answered Nov 25 '22 at 09:42


First, define a function that acts on a row of the dataframe, and returns what you want: in your case, 'Decision Maker' if the job_title contains any words in your list.

def is_key_worker(row):
    title = row["job_title"]
    # Substring match; add more keywords to the tuple.
    return "Decision Maker" if any(w in title for w in ("CTO", "Founder")) else "Influencer"

Next, apply the function to your dataframe, along axis 1.

df_new["profile_type"] = df_new.apply(is_key_worker, axis=1)
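The same row-wise idea could be extended to all three labels from the question. The sketch below is one possible shape under stated assumptions: `classify_title`, `DM_WORDS`, `KI_WORDS`, and the sample dataframe are hypothetical, and the keyword lists are shortened for illustration.

```python
import pandas as pd

# Shortened, hypothetical keyword lists.
DM_WORDS = ["CTO", "Founder", "Chief"]
KI_WORDS = ["Manager", "HR", "Learning"]


def classify_title(title):
    """Return the profile type for one job title (case-insensitive substring match)."""
    text = str(title).lower()
    # Decision Maker keywords are checked first, so they take priority.
    if any(word.lower() in text for word in DM_WORDS):
        return "Decision Maker"
    if any(word.lower() in text for word in KI_WORDS):
        return "Key Influencer"
    return "Influencer"


# Hypothetical sample data.
df_sample = pd.DataFrame({"job_title": [
    "CTO and Founder",
    "Learning and Development Specialist",
    "Accountant",
]})

# Only one column is needed, so apply the function to that column directly.
df_sample["profile_type"] = df_sample["job_title"].apply(classify_title)
print(df_sample)
```

A row-wise function is easy to read and extend, but on large dataframes the vectorized `str.contains` approach is usually faster.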

butterflyknife


For example, if the 'job_title' includes a row "Learning and Development Specialist", the code has to pull out just the word 'Learning' and segregate it as 'Key Influencer' under 'profile_type'. – Alisha A Nov 25 '22 at 09:46
I would suggest reading the comment made on your answer, on how to ask a good question. – butterflyknife Nov 25 '22 at 09:52
