I'm trying to read in a small (17kb), simple csv file from EdX.org (for an online course), and I've never had this trouble with readr::read_csv()
before. Base-R read.csv()
reads the file without generating the problem.
A small (17kb) csv file from EdX.org
library(tidyverse)
df <- read_csv("https://courses.edx.org/assets/courseware/v1/ccdc87b80d92a9c24de2f04daec5bb58/asset-v1:MITx+15.071x+1T2020+type@asset+block/WHO.csv")
head(df)
Gives this output
#> # A tibble: 6 x 13
#> Country Region Population Under15 Over60 FertilityRate LifeExpectancy
#> <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl>
#> 1 Afghan… Easte… 29825 47.4 3.82 "\r5.4\r" 60
#> 2 Albania Europe 3162 21.3 14.9 "\r1.75\r" 74
#> 3 Algeria Africa 38482 27.4 7.17 "\r2.83\r" 73
#> 4 Andorra Europe 78 15.2 22.9 <NA> 82
#> 5 Angola Africa 20821 47.6 3.84 "\r6.1\r" 51
#> 6 Antigu… Ameri… 89 26.0 12.4 "\r2.12\r" 75
#> # … with 6 more variables: ChildMortality <dbl>, CellularSubscribers <dbl>,
#> # LiteracyRate <chr>, GNI <chr>, PrimarySchoolEnrollmentMale <chr>,
#> # PrimarySchoolEnrollmentFemale <chr>
You'll notice that the column FertilityRate
has "\r" added to the values. I've downloaded the csv file and cannot find them there.
Base-R read.csv()
reads in the file with no problems, so I'm wondering what the problem is with my usage of the tidyverse read_csv()
.
head(df$FertilityRate)
#> [1] "\r5.4\r" "\r1.75\r" "\r2.83\r" NA "\r6.1\r" "\r2.12\r"
How can I fix my usage of read_csv()
so that: the "\r" strings are not there?
If possible, I'd prefer not to have to individually specify the type of every single column.