I'm trying to round all numerical values in my data frame.

The issue is that my data frame also includes strings, and they are not confined to any particular column or row. I want to avoid writing a loop that visits every cell and checks whether the value is numeric before rounding it.

Is there a function (or a combination of functions) that will let me achieve this?

So far I've tried round_df() and various lapply() and apply() combinations with lambdas. However, these only decide based on the first value in each column: if the first value is numeric, the entire column is treated as numeric and rounded.

This causes problems either way. If the first value is a string, the entire column goes unrounded; if it is numeric, my code errors because it then tries to round a string.

My function is:

library(readxl)
library(knitr)
library(gplots)
library(doBy)
library(dplyr)
library(plyr)
library(printr)   
library(xtable)   
library(gmodels)
library(survival)
library(pander)
library(psych)
library(questionr)
library(DT)
library(data.table)
library(expss)
library(xtable)
options(xtable.floating = FALSE)
options(xtable.timestamp = "")
library(kableExtra)
library(magrittr)
library(Hmisc)
library(forestmangr)
library(summarytools)
library(gmodels)
library(stats)

summaryTable <- function(y, bygroup, digit, 
                         title="", caption_heading="", caption="", freq.tab, y.label="",
                         y.names="", boxplot) {
  if (freq.tab) {
    m = multi.fun(y)
  }
  else if (!missing(bygroup)) {
    m = data.frame(y.label = "")
    m = merge(m, data.frame(describeBy(y, bygroup, mat = T)))
    m = select(m, y.label, n, mean, sd, min, median, max)
  }
  else {
    m = data.frame(y.label = "")
    m = merge(m, data.frame(sumconti(y)))
  }
  if (!freq.tab) {
    m$y.label = y.names
  }
  m = round_df(m, digit, "signif")
  if (freq.tab) {
    colnames(m) = c(y.label, "Frequency", "%")
  }
  else if (missing(freq.tab) | !freq.tab) {
    colnames(m) = c(y.label, "n", "Mean", "Std", "Min", "Median", "Max")
  }
  if (!missing(boxplot)) {
    if (boxplot) {
      attach(m)
      layout(matrix(c(1, 1, 2, 1)), 2, 1)
       
      kable(m, align = "c", "latex", booktabs = T, caption = figTitle(y, title, y.label)) %>% 
        kable_styling(position = 'center', 
                      latex_options = c("striped", "repeat_header", "hold_position")) %>% 
        footnote(general = caption, general_title = caption_heading, footnote_as_chunk = T, 
                 title_format = c("italic", "underline"), threeparttable = T)
      
      boxplot(y ~ bygroup, main = figTitle(y, title, y.label), names = y.names, ylab = title, 
              xlab = y.label, col = c("red", "blue", "orange", "pink", 
                                      "green", "purple", "grey", "yellow"), border = "black", 
              horizontal = F, varwidth = T)
    }
  }
  kable(m, 
        align = "c", 
        "latex", 
        booktabs = T, 
        caption = figTitle(y, title, y.label)) %>% 
    kable_styling(position = 'center', 
                  latex_options = c("striped", "repeat_header", "hold_position")) %>% 
    footnote(general = caption, 
             general_title = caption_heading, 
             footnote_as_chunk = T, 
             title_format = c("italic", "underline"), 
             threeparttable = T)
}


figTitle = function(x, title, y.label) {
  if (y.label != "") {
    paste("Summary of", title, "by", y.label)
  }
  else if (title != "") {
    paste("Summary of", title)
  }
  else {
    paste("")
  }
}
Arthur Yip
stargirl
  • Please show your dataframe using `str(head(your_dataframe))`; we need to see exactly what types you have for your columns. Columns in dataframes are (usually) just one type, so if the first value is a string, the rest of the column is going to be a string. – Marius Jun 19 '19 at 23:28
  • Try `dplyr::mutate_if(df, is.numeric, round)` – Shree Jun 19 '19 at 23:28
  • Try `rapply(your_data, f = round, classes = "numeric", how = "replace")` – markus Jun 20 '19 at 14:16
  • See also: [Apply log2 transformation only to numeric columns of a data.frame](https://stackoverflow.com/questions/56347434/apply-log2-transformation-only-to-numeric-columns-of-a-data-frame) – markus Jun 20 '19 at 14:17
  • `mutate_if` is superseded, according to the help. The current form is `mutate(df, across(where(is.numeric), round))` – MartineJ May 23 '21 at 11:11
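
Marius's point about column types can be checked directly. The sketch below uses a small made-up data frame (the column names `num` and `mix` are illustrative, not from the question) to show that a single string forces an entire column to character. This is why a per-column `is.numeric` test is normally enough and no cell-by-cell loop is needed.

```r
# Illustrative data, not from the question
df <- data.frame(
  num = c(1.234, 2.345, 3.456),
  mix = c(1.234, "abc", 3.456),  # c() coerces every element to character
  stringsAsFactors = FALSE
)

sapply(df, class)   # num is "numeric", mix is "character"
df$mix              # "1.234" "abc" "3.456"

# Round only the columns that are numeric as a whole
numeric_cols <- sapply(df, is.numeric)
df[numeric_cols] <- round(df[numeric_cols], 1)
df
```

The `mix` column is left untouched because it is character throughout, including the entries that look like numbers.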

2 Answers

The question did not include the data, so we don't know precisely what the problem is (please always provide a complete, minimal reproducible example). The answer is therefore divided into two sections, one for each likely interpretation of the problem, with test data provided for each. No packages are used.

Round numeric only

If the problem is that you have a mix of numeric and character columns and you only want to round the numeric ones, here are a few ways.

1) Compute which columns are numeric, giving the logical vector ok, and then round those. We use the built-in Puromycin dataset as an example.

ok <- sapply(Puromycin, is.numeric)
replace(Puromycin, ok, round(Puromycin[ok], 1))

giving:

   conc rate     state
1   0.0   76   treated
2   0.0   47   treated
3   0.1   97   treated
4   0.1  107   treated
5   0.1  123   treated
6   0.1  139   treated
...etc...

1a) The last line can also be written like this if you don't mind overwriting the input.

Puromycin[ok] <- round(Puromycin[ok], 1)

2) Another approach is to test whether each column is numeric inside the function passed to lapply:

Round <- function(x, k) if (is.numeric(x)) round(x, k) else x
replace(Puromycin, TRUE, lapply(Puromycin, Round, 1))

2a) or with overwriting:

Puromycin[] <- lapply(Puromycin, Round, 1)
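
The question's call `round_df(m, digit, "signif")` suggests that rounding to significant digits is wanted. As a minimal sketch, the approaches above can be wrapped in a reusable function that accepts either `round` or `signif`. The name `round_numeric`, its arguments, and the test data are hypothetical and not part of the original answer.

```r
# Hypothetical helper, not part of the original answer
round_numeric <- function(df, digits = 1, fun = round) {
  ok <- vapply(df, is.numeric, logical(1))   # TRUE for numeric columns
  if (any(ok)) {
    df[ok] <- lapply(df[ok], fun, digits)    # fun(column, digits) per column
  }
  df
}

# Illustrative data
mixed <- data.frame(
  label = c("a", "b", "c"),
  value = c(1.2367, 12.367, 123.67),
  stringsAsFactors = FALSE
)

round_numeric(mixed, digits = 2)                 # value: 1.24, 12.37, 123.67
round_numeric(mixed, digits = 2, fun = signif)   # value: 1.2, 12, 120
```

Passing the rounding function as an argument keeps the column selection in one place, so a call such as `m <- round_numeric(m, digit, signif)` could stand in for the `round_df` call in the question.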

Round everything

If the problem is that all the columns are supposed to be numeric but some are actually character or factor, although they represent numbers, then apply type.convert to each column before rounding. The data frame below is used as an example.

# create test data having numeric, character and factor columns but
# all intended to represent numbers
DF <- structure(list(Time = c("0.1", "0.12", "0.3", "0.14", "0.5", 
"0.7"), demand = c(0.83, 1.03, 1.9, 1.6, 1.56, 1.98), Time2 = structure(c(1L, 
2L, 4L, 3L, 5L, 6L), .Label = c("0.1", "0.12", "0.14", "0.3", 
"0.5", "0.7"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))

round(replace(DF, TRUE, lapply(DF, type.convert, as.is = TRUE)), 1)
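
To see what the conversion step does, the following sketch prints the column classes before and after `type.convert` on a shortened version of the test data. The `as.is = TRUE` argument is an addition here; it is assumed to be the desired behaviour because it leaves any non-numeric strings as character instead of turning them into factors.

```r
# Shortened version of the test data: character, numeric and factor columns
DF <- data.frame(
  Time   = c("0.1", "0.12", "0.3"),
  demand = c(0.83, 1.03, 1.9),
  Time2  = factor(c("0.1", "0.12", "0.3")),
  stringsAsFactors = FALSE
)

sapply(DF, class)   # Time is character, demand is numeric, Time2 is factor

# Convert every column to the most specific type its values allow
converted <- replace(DF, TRUE, lapply(DF, type.convert, as.is = TRUE))
sapply(converted, class)   # all three columns are now numeric

round(converted, 1)
```

Note that `round()` on the whole data frame only works when every column ends up numeric. If a column contains a genuine non-numeric string it stays character, and one of the column-wise approaches from the first section is needed instead.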
G. Grothendieck

To add one last possibility to the options above:

Suppose you have character columns that contain numbers stored as strings alongside genuinely non-numeric strings. Then the following approach might help.

library(dplyr)
library(purrr)

# Data from the answer above, with an additional mixed column
DF <- structure(
  list(
    Time = c("0.1", "0.12", "0.3", "0.14", "0.5",
             "0.7"),
    demand = c(0.83, 1.03, 1.9, 1.6, 1.56, 1.98),
    Mix = c("3.38", "4.403", "a", "5.34", "c", "9.32"),
    Time2 = structure(
      c(1L,
        2L, 4L, 3L, 5L, 6L),
      .Label = c("0.1", "0.12", "0.14", "0.3",
                 "0.5", "0.7"),
      class = "factor"
    )
  ),
  class = "data.frame",
  row.names = c(NA,-6L)
)

TBL <- as_tibble(DF)

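# These are the helper functions we use
round_string_number <- function(x) {
  # as.double() returns NA (with a warning) for non-numeric strings;
  # those entries are kept unchanged
  num <- suppressWarnings(as.double(x))
  ifelse(!is.na(num),
         as.character(round(num, digits = 1)),
         x)
}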

round_string_factor <- compose(round_string_number, as.character)

# Here the recoding happens
TBL %>%
  mutate_if(is.numeric, ~ round(., digits = 1)) %>% 
  mutate_if(is.factor, round_string_factor) %>% 
  mutate_if(~!is.numeric(.), round_string_number)

This turns the following data

  Time  demand Mix   Time2
  <chr>  <dbl> <chr> <fct>
1 0.1     0.83 3.38  0.1  
2 0.12    1.03 4.403 0.12 
3 0.3     1.9  a     0.3  
4 0.14    1.6  5.34  0.14 
5 0.5     1.56 c     0.5  
6 0.7     1.98 9.32  0.7  

into this:

  Time  demand Mix   Time2
  <chr>  <dbl> <chr> <chr>
1 0.1      0.8 3.4   0.1  
2 0.1      1   4.4   0.1  
3 0.3      1.9 a     0.3  
4 0.1      1.6 5.3   0.1  
5 0.5      1.6 c     0.5  
6 0.7      2   9.3   0.7 
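
A comment on the question notes that `mutate_if` is superseded in favour of `across()`. As a hedged sketch, the same pipeline might be written as follows with dplyr 1.0.0 or later. The tibble is a shortened, illustrative version of the data above, and the helper is redefined so that the block runs on its own.

```r
library(dplyr)

# Round strings that parse as numbers; leave all other strings unchanged
round_string_number <- function(x) {
  num <- suppressWarnings(as.double(x))
  ifelse(is.na(num), x, as.character(round(num, digits = 1)))
}

# Shortened, illustrative version of the data above
TBL <- tibble(
  Time   = c("0.1", "0.12", "0.3"),
  demand = c(0.83, 1.03, 1.9),
  Mix    = c("3.38", "a", "9.32"),
  Time2  = factor(c("0.1", "0.12", "0.3"))
)

result <- TBL %>%
  mutate(across(where(is.numeric), ~ round(.x, digits = 1))) %>%
  mutate(across(where(is.factor), as.character)) %>%
  mutate(across(where(is.character), round_string_number))

result
```

Converting factors to character first means the final step handles both the original character columns and the former factor columns with one helper.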
TimTeaFan