I have a data frame with a column of company names. I want to create a new column holding a canonicalized version of each name, perhaps using regex to strip suffixes like "corporation", "inc", and "llc" and prefixes like "the", and converting to lowercase.
name <- c("Microsoft", "Apple, Inc.", "Youtube, LLC", "Huffington Post")
companies <- data.frame(name)
I want companies$canonicalized_name to return
"microsoft", "apple", "youtube", "huffington post"
How can I write this regex pattern in R?
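For illustration, here is a minimal sketch of one way this could be done in base R, assuming the suffixes only appear at the end of the name (optionally after a comma and followed by a period) and "the" only appears at the start. The helper name canonicalize and the exact suffix list are assumptions made for this example, not an established convention.

```r
name <- c("Microsoft", "Apple, Inc.", "Youtube, LLC", "Huffington Post")
companies <- data.frame(name, stringsAsFactors = FALSE)

# Hypothetical helper: lowercases, strips a leading "the",
# and strips one trailing legal suffix from an assumed list.
canonicalize <- function(x) {
  x <- tolower(x)
  # Remove a leading "the" followed by whitespace
  x <- sub("^the\\s+", "", x)
  # Remove an optional comma, whitespace, a suffix, and an optional final period
  x <- sub(",?\\s+(corporation|corp|inc|llc)\\.?$", "", x)
  # Drop any leftover leading or trailing whitespace
  trimws(x)
}

companies$canonicalized_name <- canonicalize(companies$name)
```

Because the pattern is anchored with ^ and $, names that merely contain one of these words elsewhere are left alone. Other suffixes would need to be added to the alternation.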