
I have a data frame with a Feedback column like this:

 Id      Feedback
 1        c("No", "No", "No", "No", "No", "No")
 2        c("No", "No", "No")
 3        c("No", "No", "No", "No", "Taking Medication")

I am trying to remove the "No" entries so that, after cleanup, the result looks like this:

 Id      Feedback
 1        
 2        
 3        "Taking Medication"

I tried sub(), but it did not work. gsub() did run, but the results are messy. When I use df1$Feedback = gsub("No", "", df1$Feedback), the results look like this:

 Id      Feedback
 1        c("", "", "", "", "", "")
 2        c("", "", "")
 3        c("", "", "", "", "Taking Medication")

Any help regarding this issue is much appreciated.

  • How did you create the `Feedback` column? Could you show the output of `dput(droplevels(head(yourdata,3)))`? – akrun Sep 14 '15 at 04:26
  • Without knowing the structure, it is difficult to suggest a solution. Perhaps `library(stringr);str_extract(df1$Feedback, 'Taking Medication')#[1] NA NA "Taking Medication"` – akrun Sep 14 '15 at 04:32
  • @akrun there's a sample of my dataset `test1 = structure (list(iD =c(1L), Feedback = c("c(\"No\", \"No\",\"No\", \"No\",\"No\", \"No\",)"))) test1 = as.data.frame(test1)` – legrand latoya Sep 14 '15 at 04:45
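The dput output suggests that each Feedback cell is a single string holding the printed text of an R vector, not a list column. That would explain the gsub() result: it edits characters inside the string instead of removing elements. A small check, assuming the three sample rows shown in the question (the df1 built here is an illustrative copy). Note that eval(parse()) runs the cell text as R code, so it is only reasonable on trusted data.

```r
# Illustrative copy of the sample rows from the question
df1 <- data.frame(
  Id = 1:3,
  Feedback = c('c("No", "No", "No", "No", "No", "No")',
               'c("No", "No", "No")',
               'c("No", "No", "No", "No", "Taking Medication")'),
  stringsAsFactors = FALSE
)

# Each cell is one character string, not a vector of answers
str(df1$Feedback)

# So gsub() only blanks the text between the quotes, leaving "" behind
gsub("No", "", df1$Feedback[3])

# Parsing and evaluating the text recovers the underlying vector
eval(parse(text = df1$Feedback[3]))
```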

2 Answers


We split the 'Feedback' column on 'No' or (|) a double quote ("). The output is a list. We loop through the list with vapply and use grep to get the numeric index of elements that consist only of letters or spaces from the start to the end of the string (^[A-Za-z ]+$). If the index has length greater than 0, we return the matching element; otherwise we return an empty string ('').

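# split each string on "No" or a double quote, keep the piece made only of
# letters/spaces, and return '' for rows where nothing is left
df1$Feedback <- vapply(strsplit(df1$Feedback, 'No|"'), function(x) {
                        x1 <- grep('^[A-Za-z ]+$', x)
                        if(length(x1)>0) x[x1]
                        else ''}, character(1)) 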
df1
#  Id          Feedback
#1  1                  
#2  2                  
#3  3 Taking Medication
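The two steps inside vapply can be inspected on a single row. A sketch using the third sample string, with two caveats written as code: a row with two or more real answers would make x[x1] longer than 1, which vapply(..., character(1)) rejects (collapsing with paste() is one possible workaround, not part of the answer), and an answer that itself contains "No" is split as well.

```r
# Row 3 of the sample data (same string as in the data block of this answer)
s <- "c(\"No\", \"No\", \"No\", \"No\", \"Taking Medication\")"

# Step 1: split on the literal "No" or on a double quote; the pieces are
# mostly "" and ", ", plus "c(", ")" and the answer text
pieces <- strsplit(s, 'No|"')[[1]]
print(pieces)

# Step 2: numeric index of pieces made only of letters and spaces
idx <- grep('^[A-Za-z ]+$', pieces)
pieces[idx]

# With several real answers pieces[idx] would have length > 1 and
# vapply(..., character(1)) would stop; collapsing them is one option
paste(pieces[idx], collapse = ", ")

# Caveat: an answer containing "No" (e.g. "Not sure") is split too,
# leaving the fragment "t sure"
strsplit("c(\"No\", \"Not sure\")", 'No|"')[[1]]
```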

Another option is gsub. We match the substring 'No', or (|) a double quote, comma, or parenthesis ([",()]), or (|) the letter 'c' followed by an opening parenthesis (c(?:\\()), and replace each match with ''. The leading/trailing spaces can then be removed with a second gsub.

gsub('^\\s*|\\s*$', '', 
    gsub('No|[",()]|c(?:\\()', '', df1$Feedback, perl=TRUE))
#[1] ""                  ""                  "Taking Medication"
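Since R 3.2.0, base R's trimws() does the same job as the second gsub(). A sketch that also writes the cleaned values back into the column, assuming the same df1 sample data as this answer:

```r
# Illustrative copy of the sample data
df1 <- data.frame(
  Id = 1:3,
  Feedback = c('c("No", "No", "No", "No", "No", "No")',
               'c("No", "No", "No")',
               'c("No", "No", "No", "No", "Taking Medication")'),
  stringsAsFactors = FALSE
)

# Same pattern as above; trimws() (base R >= 3.2.0) strips the leftover spaces
df1$Feedback <- trimws(gsub('No|[",()]|c(?:\\()', '', df1$Feedback, perl = TRUE))
df1$Feedback  # values: "", "", "Taking Medication"
```

The pattern removes every "No", quote, comma and parenthesis anywhere in the string, so it suits the sample rows but would alter answers such as "Not sure" or ones containing a comma.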

data

df1 <- structure(list(Id = 1:3, 
Feedback = c("c(\"No\", \"No\", \"No\", \"No\", \"No\", \"No\")", 
"c(\"No\", \"No\", \"No\")", "c(\"No\", \"No\", \"No\", \"No\", \"Taking Medication\")"
)), .Names = c("Id", "Feedback"), class = "data.frame", 
row.names = c(NA, -3L))
akrun
  • @legrandlatoya Thanks for the feedback and encouraging words. Glad to know that it works. – akrun Sep 14 '15 at 05:23
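library(dplyr)

# Parse each Feedback string back into a character vector (one row per value),
# then drop the "No" entries. Ids whose values are all "No" drop out entirely.
your_data_frame %>%
  group_by(Id) %>%
  do(.$Feedback %>% 
           parse(text = .) %>% 
           eval %>%
           {tibble(Feedback = .)}) %>%
  filter(Feedback != "No")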
bramtayl
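Because the filter above removes every row for an Id whose entries are all "No", Ids 1 and 2 disappear instead of showing an empty Feedback as the question asks. A base-R sketch of the same parse-then-filter idea that keeps every Id, assuming the column holds strings like those in the question. drop_no() is a hypothetical helper name, and eval(parse()) executes the cell text as R code, so use it only on trusted data.

```r
# Illustrative copy of the sample data from the question
df1 <- data.frame(
  Id = 1:3,
  Feedback = c('c("No", "No", "No", "No", "No", "No")',
               'c("No", "No", "No")',
               'c("No", "No", "No", "No", "Taking Medication")'),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse one cell back into a character vector,
# drop the exact "No" entries, and join what is left ("" when nothing remains)
drop_no <- function(cell) {
  values <- eval(parse(text = cell))  # runs the cell text as R code
  paste(values[values != "No"], collapse = ", ")
}

df1$Feedback <- vapply(df1$Feedback, drop_no, character(1), USE.NAMES = FALSE)
df1
```

Comparing whole values with != "No" (rather than matching a regex) leaves answers such as "Not sure" intact.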