CompanyName            Desired Output
Abbey Company.Com      abbey company
Manisd Company .com    manisd company
Idely.com              idely

How can I remove ".com" while making sure that the "com" in "company" is not affected? I've tried the code below:

     stopwords = c("limited"," l.c.", " llc","corporation"," &"," ltd.","llp ",
                      "l.l.c","incorporated","association","s.p.a"," l.p.","l.l.l.p","p.a  ","p.c  ",
                      "chtd  ","chtd.  ","r.l.l.l.p  ","rlllp  ", "the "," lmft", " inc.", ".com")

   file_new1$CompanyName<-gsub(paste0(stopwords,collapse = "|"),"", file_new1$CompanyName)

I have already referred to this question:

Remove certain words in string from column in dataframe in R

A.Info

2 Answers

If you have:

CompanyName <- c("Abbey Company.Com", "Manisd Company .com", "Idely.com")

You could try:

gsub(paste0(gsub("\\.","\\\\.",stopwords),collapse = "|"),"",
     tolower(CompanyName))
#[1] "abbey company"   "manisd company " "idely"
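
Note that the second element above still ends with a trailing space, because the stopword ".com" does not include the space in front of it. A minimal sketch of one way to tidy that up, assuming base R's trimws() and a shortened stopwords vector that is used here purely for illustration:

```r
CompanyName <- c("Abbey Company.Com", "Manisd Company .com", "Idely.com")

# Shortened, illustrative stopword list (an assumption, not the full list)
stopwords <- c(" inc.", ".com")

# Escape the dots so they match a literal "." rather than any character
pattern <- paste0(gsub("\\.", "\\\\.", stopwords), collapse = "|")

# Lower-case, remove the stopwords, then strip leading/trailing whitespace
cleaned <- trimws(gsub(pattern, "", tolower(CompanyName)))
cleaned
```

Without the escaping, the "." in ".com" acts as a wildcard, so a word such as "welcome" (where "com" follows some other character) could also lose characters.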
nicola

You can do gsub("\\.Com","",dt$CompanyName), assuming that your data.table is called dt. Note that this pattern is case-sensitive, so it removes ".Com" but not ".com".

UPDATE

Another solution might be to keep only the "stuff" before the dot (".").

For example:

CompanyName <- data.table(V1=c("Abbey Company.Com", "Manisd Company .com", "Idely.com"))

> CompanyName
                    V1
1:   Abbey Company.Com
2: Manisd Company .com
3:           Idely.com

CompanyName$V1 <- sel_strsplit(CompanyName$V1,"\\.",1)
> CompanyName
                V1
1:   Abbey Company
2: Manisd Company 
3:           Idely

That way you don't have to care whether you have ".com", ".COM", ".co.uk", etc.
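
The helper sel_strsplit() used above does not appear to be a base R or data.table function, so it presumably comes from elsewhere. As a rough base R sketch of the same idea (keep everything before the first dot), assuming a plain character vector rather than a data.table column:

```r
CompanyName <- c("Abbey Company.Com", "Manisd Company .com", "Idely.com")

# Remove the first literal dot and everything after it,
# then trim any whitespace left at the ends
before_dot <- trimws(sub("\\..*$", "", CompanyName))
before_dot
```

One caveat with this approach: a name that contains a dot earlier on (for instance initials such as "A.B.") would be cut at that first dot.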

quant
  • This is what I'll have to do for the different formats: "com", ".com", ".Com", "Inc.", "inc", "inc.". Is there any shorter way? – A.Info Jun 13 '17 at 12:39
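
One possible shorter way to cover those variants is a single case-insensitive pattern with optional dots. The following is only a sketch: the pattern, the variable names, and the last three sample names are assumptions added for illustration, and perl = TRUE is used so that \\b and \\s behave as in Perl-style regular expressions.

```r
# First three names are from the question; the rest are hypothetical
CompanyName <- c("Abbey Company.Com", "Manisd Company .com", "Idely.com",
                 "Acme Inc.", "Foo inc", "Bar com")

# Optional spaces, optional leading dot, then "com" or "inc" as a whole
# word (so the "Com" in "Company" is left alone), then an optional dot
pattern <- "\\s*\\.?\\b(com|inc)\\b\\.?"

cleaned <- trimws(tolower(
  gsub(pattern, "", CompanyName, ignore.case = TRUE, perl = TRUE)
))
cleaned
```

Because the suffixes are matched as whole words anywhere in the string, a name that legitimately starts with a standalone "Com" or "Inc" would lose that word too, so the pattern may need anchoring with $ for real data.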