1

Using a user-defined function, I need to join the lower and upper bounds of confidence intervals (named CIlow and CIhigh) for a selected subset of columns in a data frame. The data frame holds CIlow and CIhigh for several groups (named a, b and c) and for several rows (just two in this example). The data frame looks like this:

dataframe<-data.frame(CIlow_a=c(1.1,1.2),CIlow_b=c(2.1,2.2),CIlow_c=c(3.1,3.2),
CIhigh_a=c(1.3,1.4),CIhigh_b=c(2.3,2.4),CIhigh_c=c(3.3,3.4))

I would like one joined column per group, for a selected subset of groups (e.g. a and b) out of the existing ones (a, b and c).

Thus, the expected output should be the following:

output<-data.frame(CI_a=c("(1.1,1.3)","(1.2,1.4)"),
                  CI_b=c("(2.1,2.3)","(2.2,2.4)"))

To build my own user-defined function, I tried the following code:

f<-function(df,gr){

enquo_gr<-enquo(gr)

r<-df%>%
   dplyr::mutate(UQ(paste("CI",quo_name(gr),sep="_")):=
                   sprintf("(%s,%s)",
                           paste("CIlow",UQ(enquo_gr),sep="_"),
                           paste("CIhigh",UQ(enquo_gr),sep="_")))%>%
   dplyr::select(paste("CI",UQ(enquo_gr),sep="_"))

return(r)
}

However, when I call the function like this:

library(dplyr)
group<-c("a","b")
dataframe<-data.frame(CIlow_a=c(1.1,1.2),CIlow_b=c(2.1,2.2),CIlow_c=c(3.1,3.2),CIhigh_a=c(1.3,1.4),CIhigh_b=c(2.3,2.4),CIhigh_c=c(3.3,3.4))

f(df=dataframe,gr=group)

I get the following error message:

Error: expr must quote a symbol, scalar, or call

How could I solve this issue?

PS1: This question is similar to a previous one. However, this question goes one step further because it requires selecting the columns to be merged.

PS2: I would appreciate code suggestions following the approach of this question.

ungatoverde

3 Answers

1

If we are passing quoted strings, use sym to convert them to symbols (for more than one element, use syms, which returns a list of symbols):

f <- function(df, gr){
   sl <-  rlang::syms(paste("CIlow", gr, sep="_"))
   sh <-  rlang::syms(paste("CIhigh", gr, sep="_"))
   nmN <- paste("CI", gr, sep= "_")


   df %>%
       dplyr::mutate(!!(nmN[1]) := sprintf("(%s,%s)",
                               !!(sl[[1]]), !!(sh[[1]])),
                     !!(nmN[2]) := sprintf("(%s,%s)",
                               !!(sl[[2]]), !!(sh[[2]]))) %>%
       dplyr::select(paste("CI", gr, sep="_"))



 }

group <- c("a","b")
f(dataframe, group)
#      CI_a      CI_b
#1 (1.1,1.3) (2.1,2.3)
#2 (1.2,1.4) (2.2,2.4)
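
To illustrate why the strings have to be converted with sym before they are unquoted, here is a minimal sketch. The data frame toy and the object names as_text and as_column are assumptions made up for this illustration; only sym, mutate and sprintf come from the code above.

```r
library(dplyr)
library(rlang)

# Toy data, assumed for illustration only (one group, two rows)
toy <- data.frame(CIlow_a = c(1.1, 1.2), CIhigh_a = c(1.3, 1.4))

# A plain string is treated as literal text, so sprintf() formats
# the column *names* instead of the column values
as_text <- mutate(toy, CI_a = sprintf("(%s,%s)", "CIlow_a", "CIhigh_a"))

# A symbol built with sym() and unquoted with !! is evaluated
# as a column of the data frame
low  <- sym("CIlow_a")
high <- sym("CIhigh_a")
as_column <- mutate(toy, CI_a = sprintf("(%s,%s)", !!low, !!high))

as_text$CI_a
as_column$CI_a
```

In the first call every row contains the text "(CIlow_a,CIhigh_a)"; in the second call the rows contain the joined values.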
akrun
  • Thanks @akrun! Great solution! However, I need a script that works for an undefined number of groups. I mean, I would like to join the CIs sometimes for `a`, sometimes for `a` and `b`, and sometimes (why not) for `a`, `b` and `c`. Your script fixes the number of groups at 2 (see `[[1]]` and `[[2]]`). I would appreciate any further ideas. – ungatoverde Jul 10 '17 at 06:52
  • I tried this but it did not work ("LHS must be a name or string"): `f <- function(df, gr){ sl <- rlang::syms(paste("CIlow", gr, sep="_")) sh <- rlang::syms(paste("CIhigh", gr, sep="_")) nmN <- paste("CI", gr, sep= "_") df %>% dplyr::mutate(!!(nmN) := sprintf("(%s,%s)", !!(sl), !!(sh)))%>% dplyr::select(paste("CI", gr, sep="_")) }` – ungatoverde Jul 10 '17 at 07:09
  • @ungatoverde I tried earlier a similar approach, but it may not work. I will check when I get some time – akrun Jul 10 '17 at 07:20
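
One possible way to handle an undefined number of groups, as asked in the comments, is to build a named list of expressions (one per group) and splice it into mutate with !!!. The sketch below is only an illustration under that assumption: the function name f_any and the variable names inside it are hypothetical and not part of the answer above, and all_of assumes a reasonably recent dplyr.

```r
library(dplyr)
library(rlang)

dataframe <- data.frame(CIlow_a = c(1.1, 1.2), CIlow_b = c(2.1, 2.2), CIlow_c = c(3.1, 3.2),
                        CIhigh_a = c(1.3, 1.4), CIhigh_b = c(2.3, 2.4), CIhigh_c = c(3.3, 3.4))

# Hypothetical helper: joins CIlow_<g> and CIhigh_<g> for every g in gr
f_any <- function(df, gr) {
  new_names <- paste("CI", gr, sep = "_")

  # One unevaluated sprintf() call per group, with the column symbols inlined
  ci_exprs <- lapply(gr, function(g) {
    low  <- sym(paste("CIlow", g, sep = "_"))
    high <- sym(paste("CIhigh", g, sep = "_"))
    expr(sprintf("(%s,%s)", !!low, !!high))
  })
  names(ci_exprs) <- new_names

  # !!! splices the named list as if each element were typed as name = call
  df %>%
    mutate(!!!ci_exprs) %>%
    select(all_of(new_names))
}

res_one   <- f_any(dataframe, "a")
res_two   <- f_any(dataframe, c("a", "b"))
res_three <- f_any(dataframe, c("a", "b", "c"))
```

Because the list is built from gr, the same function covers one, two or three groups without indexing each element by hand.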
1

I would probably have answered differently based on the question alone, but after examining your answer I prepared the code below. It uses the lapply trick from "dplyr::unite across column patterns". I am not sure that dplyr/tidyr is the best option here; a simple for loop might be simpler.

output <- data.frame(CI_a=c("(1.1,1.3)","(1.2,1.4)"),
                     CI_b=c("(2.1,2.3)","(2.2,2.4)"),
                     stringsAsFactors = FALSE)

dataframe <- data.frame(CIlow_a=c(1.1,1.2),CIlow_b=c(2.1,2.2),CIlow_c=c(3.1,3.2),
                        CIhigh_a=c(1.3,1.4),CIhigh_b=c(2.3,2.4),CIhigh_c=c(3.3,3.4))


library(dplyr)
library(tidyr)

tricky <- function(input_data, group_ids){

  # convert all columns to character

  input_data <- input_data %>%
    mutate(across(everything(), as.character))

  # unite the low/high pair of each selected group and keep only the united column

  output <- group_ids %>%
    lapply(function(group_id) {
      input_data %>%
        unite(!!paste0("CI_", group_id),
              all_of(paste0(c("CIlow_", "CIhigh_"), group_id)),
              sep = ",") %>%
        select(all_of(paste0("CI_", group_id)))
    }) %>%
    bind_cols() %>%
    # wrap every joined value in parentheses
    mutate(across(everything(), ~ paste0("(", .x, ")")))

  return(output)

}

identical(tricky(dataframe, list("a", "b")), output)
RST
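
As a sketch of the simpler, dependency-free route hinted at in this answer, the same result can be produced in base R by indexing the columns by name. The function name join_ci is hypothetical, and the sketch assumes that every requested group has both a CIlow_ and a CIhigh_ column.

```r
dataframe <- data.frame(CIlow_a = c(1.1, 1.2), CIlow_b = c(2.1, 2.2), CIlow_c = c(3.1, 3.2),
                        CIhigh_a = c(1.3, 1.4), CIhigh_b = c(2.3, 2.4), CIhigh_c = c(3.3, 3.4))

# Hypothetical base R helper: one "(low,high)" column per requested group
join_ci <- function(df, groups) {
  joined <- lapply(groups, function(g) {
    # df[[name]] looks the column up by its string name, so no quoting tricks are needed
    sprintf("(%s,%s)", df[[paste0("CIlow_", g)]], df[[paste0("CIhigh_", g)]])
  })
  names(joined) <- paste0("CI_", groups)
  as.data.frame(joined, stringsAsFactors = FALSE)
}

res <- join_ci(dataframe, c("a", "b"))
res
```

Since the columns are looked up with [[ and a string, the number of groups is not fixed and no tidy evaluation is involved.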
0

I have found a solution to my issue myself. The code below works:

output<-data.frame(CI_a=c("(1.1,1.3)","(1.2,1.4)"), CI_b=c("(2.1,2.3)","(2.2,2.4)"))

dataframe<-data.frame(CIlow_a=c(1.1,1.2),CIlow_b=c(2.1,2.2),CIlow_c=c(3.1,3.2),
                      CIhigh_a=c(1.3,1.4),CIhigh_b=c(2.3,2.4),CIhigh_c=c(3.3,3.4))

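f <- function(df, gr){

   # build the column symbols and the new column names locally
   sl <- rlang::syms(paste("CIlow", gr, sep="_"))
   sh <- rlang::syms(paste("CIhigh", gr, sep="_"))
   nmN <- paste("CI", gr, sep= "_")
   r <- df

   # add one joined column per requested group
   for(i in seq_along(gr)){
        r <- dplyr::mutate(r, !!nmN[i] := sprintf("(%s,%s)", !!sl[[i]], !!sh[[i]]))
   }

   # keep only the joined columns
   r <- dplyr::select(r, dplyr::all_of(nmN))
   return(r)

}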

group <- c("a","b")

x<-f(df=dataframe, gr=group)

The code works for an undefined number of elements in group. Thus, it works for c("a","b"), for c("a") or c("a","b","c").

I know loops are not recommended. Any better solution is appreciated.

ungatoverde