I am writing a function to expand contractions. It takes a data frame as input and returns it with a `text_clean` column in which the contractions in the text have been expanded. I can do this with the qdap `mgsub` function, replacing each pattern in the text. However, I am wondering whether there is a better solution.

contrap_pattern <- c("i'm","you're","he's","she's","it's", "we're", "they're","i've","you've","we've","they've","i'd","you'd","he'd","she'd","we'd","they'd","i'll","you'll","he'll","she'll","we'll","they'll","isn't","aren't","wasn't","weren't","hasn't","haven't","hadn't","doesn't","don't","didn't","won't","wouldn't","shan't","shouldn't","can't","couldn't","mustn't","let's","that's","who's","what's","here's","there's","when's","where's","why's","how's")


replacement_pattern <- c("I am","you are","he is" ,"she is" ,"it is","we are" , "they are", "I have","you have","we have", "they have","I would","you would","he would",  "she would","we would","they would", "I will","you will","he will", "she will" ,"we will","they will","is not","are not","was not","were not","has not" , "have not","had not","does not","do not", "did not" ,"will not","would not", "shall not","should not","can not","could not","must not","let us","that is", "who is","what is","here is", "there is","when is","where is","why is","how is")


clean$text_clean <- qdap::mgsub(pattern = contrap_pattern, replacement = replacement_pattern, clean$text_clean)
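For comparison, the same multi-pattern replacement can be sketched in base R without qdap. This is only a sketch under assumptions: `expand_contractions` and the shortened `contractions` vector are hypothetical names introduced here, and the patterns are assumed to contain only letters and apostrophes, so they are used as regular expressions without escaping.

```r
# Named character vector: names are the contractions, values their expansions.
# Shortened for illustration; the full lists from the question work the same way.
contractions <- c("i'm" = "I am", "you're" = "you are",
                  "don't" = "do not", "can't" = "can not")

# Hypothetical helper: applies each replacement in turn.
# \\b keeps a pattern such as "he's" from matching inside "she's";
# ignore.case = TRUE also catches "I'm", "Don't", and so on.
expand_contractions <- function(x, key = contractions) {
  for (i in seq_along(key)) {
    pattern <- paste0("\\b", names(key)[i], "\\b")
    x <- gsub(pattern, key[[i]], x, ignore.case = TRUE, perl = TRUE)
  }
  x
}

# Toy data frame standing in for the real `clean` data.
clean <- data.frame(text_clean = c("I'm sure you're right", "Don't say you can't"),
                    stringsAsFactors = FALSE)
clean$text_clean <- expand_contractions(clean$text_clean)
```

Note that the replacement text is inserted as written, so the original capitalization of a matched contraction (for example "Don't") is not preserved.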

Update: `textclean::replace_contraction()` does what I need without my having to write out the patterns in the code. Thanks @phiver for the suggestion.
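A minimal sketch of how `replace_contraction()` from the textclean package (the function phiver points to in the comments) might be applied to the column. It assumes textclean is installed and relies on the package's default contraction key; the toy data frame is made up for illustration.

```r
# Assumes the textclean package is installed: install.packages("textclean")
library(textclean)

# Toy data frame standing in for the real `clean` data.
clean <- data.frame(text_clean = c("i'm sure you're right", "they don't know"),
                    stringsAsFactors = FALSE)

# Uses the package's built-in contraction key, so no pattern list is needed.
clean$text_clean <- replace_contraction(clean$text_clean)
clean$text_clean
```

The default key may expand some contractions differently from the hand-written lists in the question, so it is worth checking the output on a sample of the real text.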

Dutt
  • Can you show the `clean$text_clean` so that it can be tested – akrun Jan 14 '21 at 16:55
  • you are looking for `textclean::replace_contraction` – phiver Jan 14 '21 at 16:56
  • What do you mean by "better solution"? Is this working, but not fast enough? Or you don't like having to maintain the list of patterns and replacements? – Gregor Thomas Jan 14 '21 at 16:59
  • I wanted to find a better solution as I didn't want to write all the patterns in the code. @phiver Thank you for the suggestion. `replace_contraction` is excellent and exactly what I was looking for. – Dutt Jan 14 '21 at 17:09