I am cleaning the text in a column of my pandas DataFrame.

This is a tokenized string from my description column before punctuation is removed:

['dedicated', 'to', 'support', 'the', 'fast-paced', 'technology', 
'lifestyle', 'needs', 'of', 'today', '’', 's', 'modern', 'society', 
'.', 'gadget', 'mix', 'have', 'the', 'benefit', 'of', '“', 
'efficient', 'life', 'â€', 'tied', 'to', 'the', 'products', 'and', 
'services', 'they', 'provide', '.']

This is what the string looks like after I applied the code below:

['dedicated', 'to', 'support', 'the', 'fast-paced', 'technology', 
'lifestyle', 'needs', 'of', 'today', '’', 's', 'modern', 'society', 
'gadget', 'mix', 'have', 'the', 'benefit', 'of', '“', 'efficient', 
'life', 'â€', 'tied', 'to', 'the', 'products', 'and', 'services', 
'they', 'provide']

This is my code:

# removing punctuation
import string
punc = string.punctuation
update_mall['Cleansed_description'] = update_mall['Cleansed_description'].apply(lambda x: [word for word in x if word not in punc])
update_mall.head(105)

This code removed standalone punctuation tokens, but not tokens like "Fast-paced", "...", or "restaurant/catering".

In addition, after punctuation removal and lowercasing, a word like Asia's became the two tokens 'asia' and 's.

I was told that this only checks whether an entire token is punctuation, instead of checking the characters inside each token.
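
To illustrate that point, here is a minimal sketch of what the membership test does. The token list is made up for the example; `word not in punc` is a substring test against the single string `string.punctuation`, so only tokens that occur verbatim inside that string are dropped.

```python
import string

punc = string.punctuation  # one string: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

# Made-up tokens for illustration only.
tokens = ['fast-paced', '.', '...', 'restaurant/catering', "'s"]

# `word in punc` asks "is this token a substring of the punctuation string?"
# '.' is, so it is removed; '...' and 'fast-paced' are not, so they survive.
kept = [word for word in tokens if word not in punc]
print(kept)
```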

bird

1 Answer


Can you try the code below, which uses a regex to replace every punctuation character inside each token with a space?

import re

update_mall['Cleansed_description'] = update_mall['Cleansed_description'].apply(lambda x: [re.sub(r'[^\w\d\s]', ' ', word.lower()) for word in x])

update_mall.head(105)
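
As a possible variation (an assumption about the desired result, not part of the code above): replacing punctuation with a space leaves tokens such as ' ' for '.' and 'fast paced' for 'fast-paced'. The sketch below instead deletes punctuation characters with `str.translate` and drops tokens that end up empty. The helper name `clean_tokens` and the sample tokens are hypothetical, and `string.punctuation` only covers ASCII, so characters such as curly quotes would need separate handling.

```python
import string

# Translation table that deletes every ASCII punctuation character.
strip_punc = str.maketrans('', '', string.punctuation)

def clean_tokens(tokens):
    """Lowercase each token, delete punctuation characters, drop empty results."""
    stripped = (tok.lower().translate(strip_punc) for tok in tokens)
    return [tok for tok in stripped if tok]

# Made-up tokens for illustration only.
sample = ['Fast-paced', 'technology', '.', '...', 'restaurant/catering', 'Asia', "'s"]
cleaned = clean_tokens(sample)
print(cleaned)
```

With a DataFrame column that holds token lists, the same helper could be passed to `.apply(clean_tokens)` in place of the lambda.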
il_raffa
Manoj biroj