2

I have a table in Excel with numeric, date, and character type columns. I use the read_excel() function from readxl library to load data into R. For most of the columns, read_excel by default does a good job in recognizing the column type.

Problem:

As the number of columns in the table can increase or decrease, I don't want to define col_types in read_excel to load data.

Two Excel numeric columns are cost and revenue with '$' in front of the value such as $200.0541. The dollar sign '$' seems to cause the function to mistakenly identify the cost and revenue column as POSIXct type.

Since new numeric columns might be added later with '$', is it possible to change the column types after loading the data (without using df$cost <- as.numeric(df$cost) for each column) through a loop?

Edit: link to sample - https://ethercalc.org/ogiqi9s51o45

JohnR
  • 57
  • 7
  • When I have messy data types, rather than relying on my reading functions to get the types right, I'll often read everything as text, then batch convert (such as `dplyr::mutate_at`) to date, numeric, etc as needed. However you do it, you most likely won't need a loop – camille Feb 26 '20 at 13:55
  • But then wouldn't you convert the columns manually? if there are many columns, it becomes time consuming to do so and for each new column, you have to change the type again. Please correct me if i'm mistaken. thanks – JohnR Feb 26 '20 at 13:58
  • Batch them. For example, assuming all your dates are in the same format, `dplyr::mutate_at` would let you convert all the date columns to date data types – camille Feb 26 '20 at 14:00
  • For that i would need to either know all the column names or the column numbers to batch them with `dplyr::mutate_at`. My dataframe has over 60 columns with possibility for more to be added. this doesn't seem efficient. Please let me know if i have not correctly understood the function. – JohnR Feb 26 '20 at 14:10
  • 1
    If you're adding new columns, you'd be handling data types then. It's probably not efficient, but if you've got columns being read incorrectly (fairly common coming from Excel), I don't know that there's anything more efficient and accurate. Maybe you can make an example, because right now I'm just tossing out guesses – camille Feb 26 '20 at 14:36
  • @camille the sample is added. – JohnR Feb 29 '20 at 20:18

0 Answers0