12

I'm using xtable to generate tables to put in Latex, and was wondering if there's a way to have conditional formatting of cells so that all significant p-values are in grey? I'm using Knitr in TexShop.

Here's an example using the diamonds data in ggplot2, and running a TukeyHSD test to predict carat from cut.

library(ggplot2)
library(xtable)
summary(data.aov <- aov(carat~cut, data = diamonds))
data.hsd<-TukeyHSD(data.aov)
data.hsd.result<-data.frame(data.hsd$cut)
data.hsd.result

I can then get data.hsd.result into xtable format with:

xtable(data.hsd.result)

In Latex, the output looks like this:

                         diff         lwr         upr        p.adj
Good-Fair         -0.19695197 -0.23342631 -0.16047764 0.000000e+00
Very Good-Fair    -0.23975525 -0.27344709 -0.20606342 0.000000e+00
Premium-Fair      -0.15418175 -0.18762721 -0.12073628 0.000000e+00
Ideal-Fair        -0.34329965 -0.37610961 -0.31048970 0.000000e+00
Very Good-Good    -0.04280328 -0.06430194 -0.02130461 5.585171e-07
Premium-Good       0.04277023  0.02165976  0.06388070 3.256208e-07
Ideal-Good        -0.14634768 -0.16643613 -0.12625923 0.000000e+00
Premium-Very Good  0.08557350  0.06974902  0.10139799 0.000000e+00
Ideal-Very Good   -0.10354440 -0.11797729 -0.08911151 0.000000e+00
Ideal-Premium     -0.18911791 -0.20296592 -0.17526989 0.000000e+00

It it possible to have any p-values < 0.05 to have a grey coloured background automatically or highlighted in some way? Obviously, for this set it would be the whole column, but I'm hoping for something that works with all my data.

pnuts
  • 58,317
  • 11
  • 87
  • 139
dmt
  • 2,113
  • 3
  • 24
  • 23

2 Answers2

18

Hello try this :

\documentclass{article}
\usepackage{color}
\begin{document}

<<echo=FALSE, results='asis'>>=
df = data.frame(V1 = LETTERS[1:6], V2 = runif(6, 0, 1))
df$V3 = ifelse(df$V2 < 0.5, paste0("\\colorbox{red}{", df$V2, "}"), df$V2)
library(xtable)
print(xtable(df), sanitize.text.function = function(x) x)
@

\end{document}

EDIT

If you have multiple conditions, one solution is to use package dplyr and function case_when :

set.seed(123)
df <- data.frame(V1 = LETTERS[1:6], V2 = runif(6, 0, 1))

library("dplyr")
df %>% 
  mutate(
    V3 = case_when(
      V2 < 0.5 ~ paste0("\\colorbox{red}{", round(V2, 3), "}"),
      V2 >= 0.5 & V2 < 0.8 ~ paste0("\\colorbox{blue}{", round(V2, 3), "}"),
      TRUE ~ formatC(V2, digits = 3)
    )
  )
#   V1        V2                      V3
# 1  A 0.2875775  \\colorbox{red}{0.288}
# 2  B 0.7883051 \\colorbox{blue}{0.788}
# 3  C 0.4089769  \\colorbox{red}{0.409}
# 4  D 0.8830174                   0.883
# 5  E 0.9404673                    0.94
# 6  F 0.0455565  \\colorbox{red}{0.046}
Victorp
  • 13,636
  • 2
  • 51
  • 55
  • thanks @Victorp, but I notice that the number of decimal digits is much longer on the transformed data. Any hints on how to keep it to the same number of digits of the other columns? – PaoloCrosetto Jun 19 '15 at 08:40
  • @Victorp `df$V2 < 0.5` ;) – Luca Braglia Oct 09 '15 at 08:50
  • @PaoloCrosetto, you can do `round(df$V2, 4L)` inside `ifelse`. Also you can add `options(scipen = 10)` to penalize scientific notation. – akhmed Oct 03 '16 at 08:17
  • what if I want to impose more than on conditions like :`df$V3 = ifelse(df$V2 < 0.5, paste0("\\colorbox{red}{", df$V2, "}"), df$V2)` and `df$V3 = ifelse(df$V2 >= 0.5 & df$V2 <0.8, paste0("\\colorbox{blue}{", df$V2, "}"), df$V2)`, the second line won't work since the first line of code turn df$V3 into "character",on which numerical conditions can't be imposed.Any way to solve this? – Jia Gao Aug 23 '17 at 05:09
  • @Jason Goal you can nest your `ifelse`. See my edit for a more elegant solution. – Victorp Aug 23 '17 at 08:07
  • @Victorp, elegant indeed, I knew this would be very short with some good programmers like you. – Jia Gao Aug 23 '17 at 09:08
1

Victorp provide an excellent solution and it gave me such a relief from a hours long struggle. Then later the day I need to impose more than one conditions on same data set, meaning I need two different colors on cells based on different conditions, to solve this, totally based on Victorp's answer, I figured a solution and hope this would help those need this in the future.

    <<echo=FALSE, results='asis'>>=
    df = data.frame(V1 = LETTERS[1:6], V2 = runif(6, 0, 1),V3 = runif(6, 0, 1))
    ## replicate the data frame of which you are going to highlight the cells
    ## the number of duplicates should be equal to number of conditions you want to impose
    temp.1<-df
    temp.2<-df
    ## impose conditions on those temporary data frame separately.
    ## change the columns you want to 
    for (i in colnames(temp.1)[2:3]) {
    temp.1[,i]= ifelse(temp.1[,i] <= 0.5,
                                paste0("\\colorbox{red}{", temp.1[,i], "}"), temp.1[,i])}
    rm(i)


    for (i in colnames(temp.2)[2]) {
    temp.2[,i]= ifelse(temp.2[,i] > 0.5 & temp.2[,i] <=0.8,
                                paste0("\\colorbox{blue}{", temp.2[,i], "}"),temp.2[,i])}
    rm(i)
    ## then record the position of cells under you conditions
    pos.1<-which(df[,] <=0.5,arr.ind = TRUE)
    pos.2<-which(df[,] >0.5 & df[,]<=0.8,arr.ind = TRUE)
    ## replace cells in original data frame that you want to highlight
    ## replace those values in temp which satisfy the condition imposed on temp.1
    if(length(pos.1)>0) {
      temp[pos.1]<-temp.1[pos.1]
    }


    ## replace those values in temp which satisfy the condition imposed on temp.2
    if(length(pos.2)>0) {
      temp[pos.2]<-temp.2[pos.2]
    }
    rm(temp.1,temp.2,pos.1,pos.2)
    @

then you print df in the way you like. This works, however,given the power of R, I do believe there should be much more easier ways for this.

Jia Gao
  • 1,172
  • 3
  • 13
  • 26