
How can I add a comma after the usernames in my strings, so that I can remove the words before the comma and get a uniform string that I can use for exact matching?

a <- data.frame(text = c("hi john what are you doing",
                         "hi sunil what are you doing",
                         "hello sanjay what are you doing"),
                stringsAsFactors = FALSE)
Sand
  • Do you have a list of names or vector of names – akrun Mar 28 '19 at 12:12
  • The problem is that the input file is in lowercase, so it's hard to distinguish names. Also, please suggest whether there is a way to convert usernames to caps so that we can remove them later. – Sand Mar 28 '19 at 12:12
  • You would need a specific pattern that the names have to fall into. Otherwise this will be impossible to do // EDIT: If all entries are structured like this, you could just use the second word as the reference for the username. – Julian_Hn Mar 28 '19 at 12:12
  • Hi akrun, I don't have a list of names, as it's a big file – Sand Mar 28 '19 at 12:12
  • if there is no pattern, then it becomes difficult – akrun Mar 28 '19 at 12:18
  • You may be able to get a list of names from the answers provided in https://stackoverflow.com/questions/18391799/database-or-list-of-english-first-and-last-names – Kerry Jackson Mar 28 '19 at 12:23

2 Answers

1

If you know that the username is always the second word in the sentence, you can extract the sentences from the data frame and use this:

text <- c("hi john what are you doing",
          "hi sunil what are you doing",
          "hello sanjay what are you doing")

# result vector, filled inside the loop
text2 <- character(0)

for (sentence in text) {
  # separate the words in the sentence
  spl <- strsplit(sentence, " ")
  # extract the name (second word) and convert it to uppercase
  name <- toupper(spl[[1]][2])
  # put a comma after the name
  name2 <- paste(name, ",", sep = "")
  # replace the original name with the new one
  spl[[1]][2] <- name2
  # rejoin the words to recreate the sentence
  sentence2 <- paste(spl[[1]], collapse = " ")
  # append to the result vector (text2)
  text2 <- append(text2, sentence2)
}

result:

#text2
#[1] "hi JOHN, what are you doing"      "hi SUNIL, what are you doing"    
#[3] "hello SANJAY, what are you doing"

and then recreate the data frame.
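As a possible shortcut, the same second-word transformation can be sketched as a single vectorized call that writes straight into the data frame, so no loop or rebuild step is needed. This assumes the username is always the second whitespace-separated word; the column name `text2` is only illustrative. It relies on `perl = TRUE`, where `\\U` uppercases the rest of the replacement and `\\E` ends the case conversion.

```r
# Sample data, same shape as in the question
a <- data.frame(text = c("hi john what are you doing",
                         "hi sunil what are you doing",
                         "hello sanjay what are you doing"),
                stringsAsFactors = FALSE)

# Capture the first and second words. In the replacement, \\U uppercases
# the second word and \\E stops the conversion before the comma.
# text2 is a hypothetical column name chosen for this sketch.
a$text2 <- sub("^(\\S+)\\s+(\\S+)", "\\1 \\U\\2\\E,", a$text, perl = TRUE)

a$text2
```

Because `sub()` only replaces the first match and the pattern is anchored with `^`, the rest of each sentence is left untouched.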

Otherwise, if the position of the username in the sentence can vary but you have a list of the usernames you need to find, you can check whether at least one of them is present, take its position in the sentence, replace it with the uppercase name plus a comma, and then recreate the sentence. If no username is found, an error message is printed instead.

usernames <- c("john", "sunil", "sanjay")

text <- c("hi john what are you doing",
          "hi sunil what are you doing",
          "hello sanjay what are you doing",
          "hello ciao how are you")

# result vector, filled inside the loop
text2 <- character(0)

for (sentence in text) {

  user_present <- NA

  # separate the words in the sentence
  spl <- strsplit(sentence, " ")

  # check whether a user is present in the sentence
  for (user in usernames) {
    if (user %in% spl[[1]]) {
      user_present <- user
      break
    }
  }

  # if at least one user is found
  if (!is.na(user_present)) {
    pos <- which(spl[[1]] == user_present)
    # extract the name and convert it to uppercase
    name <- toupper(spl[[1]][pos])
    # put a comma after the name
    name2 <- paste(name, ",", sep = "")
    # replace the original name (at its actual position) with the new one
    spl[[1]][pos] <- name2
    # rejoin the words to recreate the sentence
    sentence2 <- paste(spl[[1]], collapse = " ")
    # append to the result vector (text2)
    text2 <- append(text2, sentence2)
  # if NO username is in the sentence
  } else {
    # print an error message with the sentence in which none was found
    err.msg <- paste("NO username found in sentence: ", sentence)
    print(err.msg)
  }
}

result:

#[1] "NO username found in sentence:  hello ciao how are you"

text2
#[1] "hi JOHN, what are you doing"      "hi SUNIL, what are you doing"    
#[3] "hello SANJAY, what are you doing"
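Once the comma is in place, the step the question was aiming for (dropping the words before the comma to get a uniform string) could look roughly like this. It is a minimal sketch: the variable names `tagged` and `uniform` are made up for illustration, and it assumes each sentence contains at most one comma that matters, the one inserted after the username.

```r
# Sentences with the comma already inserted (hypothetical variable name)
tagged <- c("hi JOHN, what are you doing",
            "hi SUNIL, what are you doing",
            "hello SANJAY, what are you doing")

# Drop everything up to and including the first comma,
# plus any whitespace that follows it
uniform <- sub("^[^,]*,\\s*", "", tagged)

uniform
# [1] "what are you doing" "what are you doing" "what are you doing"

# The cleaned strings can now be compared exactly
uniform == "what are you doing"
```

Sentences without a comma are returned unchanged by `sub()`, so rows where no username was found would need separate handling.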

Hope it helps!

cccnrc
  • Error in paste(sentence2, spl[[1]][i + 1]) : object 'sentence2' not found, for the first code (where the username is second) – Sand Mar 29 '19 at 04:32
0

Two ideas to solve this.

First, if you can get a list of usernames:

usernames <- c("john", "sunil", "sanjay")
diag(sapply(usernames, function(x) gsub(x, paste0(x, ","), a$text)))
# [1] "hi john, what are you doing"      "hi sunil, what are you doing"     "hello sanjay, what are you doing"
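Note that the `diag(sapply(...))` call picks element `[i, i]` of the result matrix, so it lines up only when the i-th username belongs to the i-th row. When that ordering is not guaranteed, one alternative is a single pattern that matches any listed username. This is a sketch rather than part of the original answer: the extra fourth sentence and the names `pattern` and `out` are illustrative, and it assumes the usernames contain no regex metacharacters.

```r
usernames <- c("john", "sunil", "sanjay")

# Rows in a different order than the username list, plus one without a user
text <- c("hello sanjay what are you doing",
          "hi john what are you doing",
          "hi sunil what are you doing",
          "hello ciao how are you")

# Build one alternation with word boundaries: \\b(john|sunil|sanjay)\\b
pattern <- paste0("\\b(", paste(usernames, collapse = "|"), ")\\b")

# Append a comma to whichever listed username appears in each sentence
out <- gsub(pattern, "\\1,", text, perl = TRUE)

out
```

The word boundaries keep a name such as "john" from matching inside a longer word, and sentences without any listed username come back unchanged.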

Or, if the username is always the second word:

gsub("(^\\w*\\s)(\\w*)", "\\1\\2,", a$text)
# [1] "hi john, what are you doing"      "hi sunil, what are you doing"     "hello sanjay, what are you doing"

Data

a <- structure(list(text = c("hi john what are you doing", "hi sunil what are you doing", 
"hello sanjay what are you doing")), class = "data.frame", row.names = c(NA, 
-3L))
jay.sf