How to apply set and ignore case in a single DataFrame column in pandas

Question

I have a DataFrame, df:

 Keys        
 one, ONE    
 ram, Ram
 kumar
 Raj,rAj
 cricket
 level,LeVel
 kum,num

First, I want to apply set to df["Keys"] while ignoring case, so that values differing only in case collapse into a single value, and get:

 df
Name
one
ram
kumar
raj
cricket
level
kum,num

Second operation:

I have a list and the DataFrame above, with its df["Name"] column:

 my_list=["ONE","Ram","CRICKEt","KUm"]

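I need to compare the lowercased, comma-separated values of df["Name"] (roughly df["Name"].str.lower().str.split(",")) with the lowercased items of my_list.

If a value is present in my_list, it should be replaced in df["Name"] by the spelling used in my_list.

My desired output is: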

 df,
 name
 ONE
 Ram
 kumar
 raj
 CRICKEt
 level
 KUm,num

Thanks in advance

jezrael · Accepted Answer · 2017-11-04T16:29:16.167

Use str.lower + split + apply + join:

# Lowercase, split on commas, deduplicate each row with set, then rejoin
df['Name'] = df['Keys'].str.lower().str.split(',').apply(set).str.join(',')
print(df)
          Keys     Name
0      one,ONE      one
1      ram,Ram      ram
2        kumar    kumar
3      Raj,rAj      raj
4      cricket  cricket
5  level,LeVel    level
6      kum,num  num,kum
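
Note that a Python set has no guaranteed order, which is why the last row can come out as num,kum rather than kum,num. If the original order matters, one possible variation (a sketch, not part of the original answer) deduplicates with dict.fromkeys, which keeps the first occurrence of each value. The helper name dedupe_keep_order is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Keys': ['one,ONE', 'ram,Ram', 'kumar', 'Raj,rAj',
                            'cricket', 'level,LeVel', 'kum,num']})

def dedupe_keep_order(values):
    # dict.fromkeys keeps the first occurrence of each key, in insertion order
    return list(dict.fromkeys(values))

df['Name'] = (df['Keys'].str.lower()
                        .str.split(',')
                        .apply(dedupe_keep_order)
                        .str.join(','))
print(df['Name'].tolist())
# ['one', 'ram', 'kumar', 'raj', 'cricket', 'level', 'kum,num']
```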

If there may be whitespace after the commas, use ,\s* as the separator (a comma followed by zero or more whitespace characters):

# Raw string so that \s is passed to the regex engine unchanged
df['Name'] = df['Keys'].str.lower().str.split(r',\s*').apply(set).str.join(',')
print(df)
          Keys     Name
0     one, ONE      one
1     ram, Ram      ram
2        kumar    kumar
3      Raj,rAj      raj
4      cricket  cricket
5  level,LeVel    level
6      kum,num  num,kum
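
To see why the separator matters, here is a small sketch, assuming the column holds values such as one, ONE with a space after the comma, as in the question. Splitting on a bare comma leaves the leading space on the second token, so the set treats 'one' and ' one' as different values:

```python
import pandas as pd

s = pd.Series(['one, ONE', 'Raj,rAj'])

# Splitting on ',' alone keeps the space: {'one', ' one'} has two items
plain = s.str.lower().str.split(',').apply(set).apply(len)

# Splitting on a comma plus optional whitespace gives {'one'}
regex = s.str.lower().str.split(r',\s*').apply(set).apply(len)

print(plain.tolist())  # [2, 1]
print(regex.tolist())  # [1, 1]
```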

EDIT:

Finally, create a dictionary that maps each lowercased value to its spelling in my_list, then replace:

my_list = ["ONE", "Ram", "CRICKEt", "KUm"]
# Map each lowercased value to its original spelling
d = dict(zip([x.lower() for x in my_list], my_list))
print(d)
{'cricket': 'CRICKEt', 'one': 'ONE', 'ram': 'Ram', 'kum': 'KUm'}

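# Deduplicated lowercase values per row
splitted = df['Keys'].str.lower().str.split(',').apply(set)
# Regex (substring) replacement using the dictionary
df['Name'] = splitted.str.join(',').replace(d, regex=True)
# Number of distinct values per row
df['Count'] = splitted.str.len()
print(df)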
          Keys     Name  Count
0      one,ONE      ONE      1
1      ram,Ram      Ram      1
2        kumar    KUmar      1
3      Raj,rAj      raj      1
4      cricket  CRICKEt      1
5  level,LeVel    level      1
6      kum,num  num,KUm      2
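
Because replace(d, regex=True) substitutes substrings, kum is also matched inside kumar, giving KUmar, whereas the desired output in the question keeps kumar unchanged. A possible variation, sketched here under the assumption that only whole comma-separated values should be replaced, looks up each value in the dictionary instead. The helper name restore_case is hypothetical, and dict.fromkeys is used in place of set so the original order is kept:

```python
import pandas as pd

df = pd.DataFrame({'Keys': ['one,ONE', 'ram,Ram', 'kumar', 'Raj,rAj',
                            'cricket', 'level,LeVel', 'kum,num']})
my_list = ["ONE", "Ram", "CRICKEt", "KUm"]
d = {x.lower(): x for x in my_list}

def restore_case(tokens):
    # Deduplicate in order, then swap in the my_list spelling for exact matches
    unique = dict.fromkeys(tokens)
    return [d.get(tok, tok) for tok in unique]

tokens = df['Keys'].str.lower().str.split(',').apply(restore_case)
df['Name'] = tokens.str.join(',')
df['Count'] = tokens.apply(len)
print(df[['Name', 'Count']])
```

With the sample data, kumar is left as is and the last row becomes KUm,num with a count of 2.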

edited Nov 04 '17 at 16:29

answered Nov 04 '17 at 15:14

jezrael


good, now I want to do one more operation, I will edit the question – Pyd Nov 04 '17 at 15:15
No, the first solution is not working; it converts to lower case, but I still get `df["keys"]=one, one` – Pyd Nov 04 '17 at 15:17
I tried both of your solutions, but I still get `df["keys"]=one,one` – Pyd Nov 04 '17 at 15:24
No, we don't need to care about spaces; we only care about the comma (,) – Pyd Nov 04 '17 at 15:28
The first solution works fine, @jezrael. Now we have one more thing to do; I'll update the question, OK? – Pyd Nov 04 '17 at 15:30
We are supposed to have `ONE`, not `ONE,ONE` – Pyd Nov 04 '17 at 16:09
Your code works fine, but you posted the wrong output. Thanks for the solution – Pyd Nov 04 '17 at 16:14
Now, how do I calculate the length of df[name].split(",")? I tried `df["count"]=len(df["pre-keys"].str.split(","))` but I am getting 4 as the length – Pyd Nov 04 '17 at 16:26
I know this method. Is there another way to find the length, along the lines of your previous solution, using only df["Name"]? – Pyd Nov 04 '17 at 16:31
Yes, but it is a bit of a hack: `df['Count'] = df['Name'].str.count(',') + 1` – jezrael Nov 04 '17 at 16:34
Thank you @jezrael. I will apply it on my main data and see – Pyd Nov 04 '17 at 16:43
Hi @jezrael, could you take a look at this one: https://stackoverflow.com/questions/47131193/value-matching-between-two-dataframes-using-pandas-in-python – Pyd Nov 06 '17 at 07:05
