
I am trying to read a data frame from an Excel file using the read_excel() function from the readxl package (part of the tidyverse) in R. I would like to read one column as a factor; however, the col_types argument of read_excel() does not seem to support factors.

I now use the following workaround:

Sample code:

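library(tidyverse)  # attaches dplyr, which provides mutate() and %>%
library(readxl)     # installed with the tidyverse, but not attached by library(tidyverse)

df <- read_excel("name_of_excel_file.xlsx")

# Convert the column to a factor after reading
df <- df %>%
  mutate(column = as.factor(column))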

Is there an easier (and more direct) way to do this?

Asked by user213544 (edited by camille)
  • AFAIK that's the best there is. Any reason why that doesn't work well for you? – camille Feb 12 '20 at 17:10
  • Not really. `col_types` won't take "factor" as a data type. Though, of course, you could `mutate_at` if you want, which is slightly less redundant. – GenesRus Feb 12 '20 at 18:49
  • @camille It does work well; however, it is an operation that I need to perform quite regularly, so I was wondering whether it could be optimized. – user213544 Feb 13 '20 at 09:43
  • You don't specify whether you must use the `readxl` package, but if you do, you need the added step because direct coercion to factors is not supported (https://readxl.tidyverse.org/articles/cell-and-column-types.html). Converting character to factor should be a conscious step; remember that adding categories after the fact is hard (https://stackoverflow.com/questions/23316815/add-extra-level-to-factors-in-dataframe). – Pablo Adames Oct 26 '21 at 05:34
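GenesRus's `mutate_at` suggestion can be written today with `across()`, which supersedes `mutate_at` in dplyr 1.0.0 and later. The sketch below wraps the conversion in a small helper so that a frequently repeated read-then-convert step becomes one call. `as_factor_cols()` and `read_excel_factors()` are hypothetical names, and the demo uses an in-memory tibble as a stand-in for what `read_excel()` returns. Following Pablo Adames's point, the optional `lvls` argument fixes the level set up front, so categories that do not occur in the sheet are still kept.

```r
library(dplyr)

# Hypothetical helper: convert the named columns of a data frame to factors.
# If `lvls` is supplied, every listed column gets exactly that level set
# (in that order); values not in `lvls` become NA.
as_factor_cols <- function(df, cols, lvls = NULL) {
  to_factor <- if (is.null(lvls)) factor else function(x) factor(x, levels = lvls)
  df %>% mutate(across(all_of(cols), to_factor))
}

# Hypothetical wrapper: read a sheet and convert in one step.
# readxl::read_excel() is only looked up when the wrapper is called;
# extra arguments (sheet, range, col_types, ...) are passed through.
read_excel_factors <- function(path, cols, lvls = NULL, ...) {
  as_factor_cols(readxl::read_excel(path, ...), cols, lvls)
}

# Stand-in for a tibble returned by read_excel(): character columns.
df <- tibble(
  id    = 1:4,
  grade = c("low", "high", "low", "high"),
  site  = c("A", "B", "A", "C")
)

# Default levels: the sorted unique values of each column.
out1 <- as_factor_cols(df, c("grade", "site"))

# Explicit levels: "medium" is kept as a level even though it never occurs.
out2 <- as_factor_cols(df, "grade", lvls = c("low", "medium", "high"))
levels(out2$grade)
```

With a real file, the wrapper would be called as, for example, `read_excel_factors("name_of_excel_file.xlsx", "column")`.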

1 Answer


Reference: http://bradleyboehmke.github.io/tutorials/importing_data

Try:

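library(xlsx)  # depends on Java through the rJava package

# stringsAsFactors = TRUE turns every character column into a factor
df <- read.xlsx("name_of_excel_file.xlsx", sheetName = "Sheet1",
                stringsAsFactors = TRUE)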
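If the goal is the same effect as `stringsAsFactors = TRUE` (every text column becomes a factor) while staying with readxl, which, unlike the xlsx package, does not need Java, one option is to convert all character columns after reading with `across(where(is.character), as.factor)`. This is a sketch rather than part of the answer above; the tibble is a stand-in for the result of `read_excel("name_of_excel_file.xlsx")`.

```r
library(dplyr)

# Stand-in for read_excel("name_of_excel_file.xlsx"): mixed column types.
df <- tibble(
  column     = c("a", "b", "a"),
  other_text = c("x", "y", "z"),
  value      = c(1.5, 2, 3)
)

# Turn every character column into a factor; numeric columns are untouched.
df_fct <- df %>% mutate(across(where(is.character), as.factor))

str(df_fct)
```

Unlike naming columns one by one, this converts every text column, so it is worth checking that none of them (IDs, free-text notes) should stay as character.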
Answered by TarJae