I am trying to read a file in which one column generally contains a double, but when there is no wind the value is "Calm". So when I read the file I pass na = c('calm', "Calm"), and inside col_types = cols() I parse the column with '9am wind speed (km/h)' = col_double(). Below is my function for making tibbles.

library(readr)

generate_tibble <- function(filename) {
    temp_tibble <- read_csv(
            paste0("./data/", filename),
            na = c('calm',"Calm"),
            skip = 7,
            col_types = cols(
               'Date' = col_date(format = "%d/%m/%Y"),
               'Evaporation (mm)' = col_double(),
               'Sunshine (hours)' = col_double(),
               '9am wind speed (km/h)' = col_double()
               )
            )
    # return the tibble visibly
    temp_tibble
    }

After that, I read my first file with main_df <- generate_tibble(file_names[1]) and then append the other files to it, matching on the column names. I run the loop using the following code.

for (i in file_names[2:length(file_names)]) {
  temp <- generate_tibble(i)
  main_df <- rbind(main_df, temp)
  print(paste("FINISHED PARSING:", i))
}

But when I run the loop I get errors like this:

[Image: the error after the loop]

When I run problems(main_df) it shows this message:

[Image: message after running problems()]

What should I do to fix this issue? Thanks in advance.

  • Based on what's showing up in `problems(main_df)` you could manually check what is in (for example) row 2 column 5. Looks like it's "" which is not a double. You could add that to `na = `. – Harrison Jones Sep 27 '21 at 14:51
  • Side note, your loop isn't allocating memory efficiently. https://evodify.com/r-loops-are-slow/ – Harrison Jones Sep 27 '21 at 14:52
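
The first comment's suggestion can be sketched as follows. This is a minimal illustration, assuming readr 2.0 or later (for literal input via I()); the tiny inline CSV and the column names speed and dir are made up for the example and are not taken from the real weather files. Passing a custom na replaces readr's default of c("", "NA"), so the empty string has to be listed again alongside "Calm".

```r
library(readr)

# Hypothetical inline data: one numeric value, one "Calm", one empty field
demo <- read_csv(
  I("speed,dir\n13,N\nCalm,S\n,E\n"),
  # keep the defaults ("" and "NA") and add the two spellings of "Calm"
  na = c("", "NA", "Calm", "calm"),
  col_types = cols(
    speed = col_double(),
    dir = col_character()
  )
)

# Both "Calm" and the empty field should now be NA in the double column
print(demo)
```

If this matches what problems(main_df) reports, the same na vector could be used inside generate_tibble().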

1 Answer

Instead of trying to set the column types and handle the special values inside read_csv(), you can do the conversion after the file has been read. Here is a tidier version of your function:

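library(readr)
library(dplyr)

generate_tibble <- function(filename) {

 # skip = 7 is carried over from the read_csv() call in the question
 read_csv(file.path(".", "data", filename), skip = 7) %>%
   # convert after reading; as.numeric() turns non-numeric text into NA
   mutate(`Date` = as.Date(`Date`, format = "%d/%m/%Y"),
          `Evaporation (mm)` = as.numeric(`Evaporation (mm)`),
          `Sunshine (hours)` = as.numeric(`Sunshine (hours)`),
          `9am wind speed (km/h)` = as.numeric(`9am wind speed (km/h)`))

}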

All non-numeric string values are turned into NA (with a coercion warning) when you force the column to be numeric, so you should not need to specify "Calm". The file.path() function is made for building file paths, so your code can be used by both Linux and Windows users.

After that, I recommend using lapply() instead of a loop and binding the resulting list with data.table::rbindlist().

https://www.datacamp.com/community/tutorials/r-tutorial-apply-family#codelapplycode
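
As a rough sketch of the lapply() plus rbindlist() pattern, assuming the data.table package is installed: the two small CSV files and the simplified read_one() helper below are hypothetical stand-ins so the example runs on its own; in the real script you would pass your own file_names to generate_tibble() instead.

```r
library(data.table)

# Hypothetical stand-in data: two tiny CSV files in a temporary directory
tmp_dir <- tempfile("weather_")
dir.create(tmp_dir)
writeLines(c("Date,9am wind speed (km/h)", "01/09/2021,13", "02/09/2021,Calm"),
           file.path(tmp_dir, "a.csv"))
writeLines(c("Date,9am wind speed (km/h)", "03/09/2021,7"),
           file.path(tmp_dir, "b.csv"))
file_names <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical simplified reader, standing in for generate_tibble()
read_one <- function(path) {
  df <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
  df$Date <- as.Date(df$Date, format = "%d/%m/%Y")
  # "Calm" cannot be parsed as a number, so it becomes NA (warning suppressed)
  wind <- "9am wind speed (km/h)"
  df[[wind]] <- suppressWarnings(as.numeric(df[[wind]]))
  df
}

# Read every file into a list, then bind once instead of growing main_df in a loop
all_files <- lapply(file_names, read_one)
main_df <- rbindlist(all_files, use.names = TRUE, fill = TRUE)
print(main_df)
```

Binding once at the end avoids copying main_df on every iteration, which is the inefficiency the second comment points out. Note that rbindlist() returns a data.table rather than a tibble.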

SKyJim