I am looking for an intuitive solution to a problem of mine. I have a huge list of words in which I have to insert a special character based on some criteria: if a two- or three-letter word appears in a cell, I want to add "+" to the left and right of it.
Example
global b2b banking
would transform to global +b2b+ banking
how to finance commercial ale estate
would transform to how +to+ finance commercial +ale+ estate
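A minimal sketch of this rule, assuming base R's gsub with perl = TRUE. The helper name wrap_short is hypothetical and not part of any package. It counts letters and digits as word characters, so "b2b" is treated as a three-character word:

```r
# Hypothetical helper: wrap every 2- or 3-character alphanumeric word in "+".
wrap_short <- function(x) {
  # \\b marks a word boundary; \\1 re-inserts the captured word.
  gsub("\\b([[:alnum:]]{2,3})\\b", "+\\1+", x, perl = TRUE)
}

wrapped <- wrap_short("global b2b banking")
print(wrapped)  # "global +b2b+ banking"
```

Note that under a strict two/three-letter rule, "how" in the second example is also three letters long and would be wrapped as well.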
Here is sample data set:
sample <- c("commercial funding",
"global b2b banking",
"how to finance commercial ale estate",
"opening a commercial account",
"international currency account",
"miami imports banking",
"hsbc supply chain financing",
"international business expansion",
"grow business in Us banking",
"commercial trade Asia Pacific",
"business line of credits hsbc",
"Britain commercial banking",
"fx settlement hsbc",
"W Hotels")
data <- data.frame(sample)
Additionally, is it possible to drop any row that contains a one-letter word? Example:
W Hotels
This row should be removed from the data set. For the one-letter words, I tried removing them with gsub, but that only strips the letter and keeps the row:
gsub(" *\\b[[:alpha:]]{1,1}\\b *", " ", sample)
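Dropping whole rows, rather than stripping the letter, could be sketched with grepl, which returns one logical value per element. The vector name phrases is a hypothetical stand-in for the sample data above:

```r
phrases <- c("commercial funding", "opening a commercial account", "W Hotels")

# TRUE where the phrase contains a stand-alone single letter such as "a" or "W".
has_one_letter <- grepl("\\b[[:alpha:]]\\b", phrases, perl = TRUE)

kept <- phrases[!has_one_letter]
print(kept)  # "commercial funding"

# The same logical index works on the rows of a data frame.
df <- data.frame(phrases, stringsAsFactors = FALSE)
df_kept <- df[!has_one_letter, , drop = FALSE]
```

Be aware that this also drops rows such as "opening a commercial account", because "a" is a one-letter word.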
Any help is highly appreciated.
Edit 1
Thanks for the help. I added a few more lines to it:
sample <- c("commercial funding", "global b2b banking", "how to finance commercial ale estate", "opening a commercial account","international currency account","miami imports banking","hsbc supply chain financing","international business expansion","grow business in Us banking", "commercial trade Asia Pacific","business line of credits hsbc","Britain commercial banking","fx settlement hsbc", "W Hotels")
sample <- sample[!grepl("\\b[[:alpha:]]\\b",sample)]
sample <- gsub("\\b([[:alpha:][:digit:]]{2,3})\\b", "+\\1+", sample)
sample <- gsub(" ",",",sample)
sample <- gsub("+,","+",sample)
sample <- gsub(",+","+",sample)
sample <- tolower(sample)
sample <- ifelse(substr(sample, 1, 1) == "+", sub("^.", "", sample), sample)
data <- data.frame(sample)
data
sample
1 commercial++funding
2 global+++b2b+++banking
3 how++++to+++finance++commercial+++ale+++estate
4 international++currency++account
5 miami++imports++banking
6 hsbc++supply++chain++financing
7 international++business++expansion
8 grow++business+++in++++us+++banking
9 commercial++trade++asia++pacific
10 business++line+++of+++credits++hsbc
11 britain++commercial++banking
12 fx+++settlement++hsbc
Somehow I am unable to replace "+," with "+" using gsub. What am I doing wrong?
So "fx+,settlement,hsbc"
should be "fx+settlement,hsbc",
but instead the "," is being replaced with additional "+" characters.
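A likely cause, offered as a sketch rather than a definitive diagnosis: gsub interprets its pattern as a regular expression by default, and "+" is a quantifier there ("one or more of the preceding item"), so "+," and ",+" do not mean a literal plus next to a comma. Assuming the goal is a literal match, either fixed = TRUE or an escaped plus should work:

```r
x <- "fx+,settlement,hsbc"

# fixed = TRUE: the pattern is matched as a literal string, not as a regex.
literal <- gsub("+,", "+", x, fixed = TRUE)

# Equivalent regex form with the plus escaped.
escaped <- gsub("\\+,", "+", x)

print(literal)  # "fx+settlement,hsbc"
```

The same applies to the ",+" call: written as a regex it means "one or more commas", so it would need fixed = TRUE or ",\\+" to match a comma followed by a literal plus.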