It's a bit tricky to provide a reproducible example, but my issue is that I am bringing in a dataset of British Geological Society mineral production data (after some slight tidying in Excel).
Specifically, from this website I'm downloading a .xlsx of production data for the diamond commodity from 2010 - 2020 by all countries.
When I bring it into R via readxl, the first column name has random characters appended to it -- so instead of Country
, it reads \r\n\tCountry
.
From poking around at other answers, I feel like this could be some strange SQL artifact in the original dataset (?), and I read that you could set readxl with specific encoding to prevent this, but it seems like readxl doesn't take that argument anymore.
My code:
library(tidyverse)
library(readxl)
# Set working directory.--------------------------------------------------------
setwd("Filepath/Project")
# Load data.--------------------------------------------------------------------
df <- read_xlsx("Filepath/Diamond Datasheet.xlsx")
And the head(df) output:
# A tibble: 6 x 12
`\r\n\tCountry` `2010` `2011` `2012` `2013` `2014` `2015` `2016` `2017` `2018` `2019` `2020`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Angola 8362139 8328519 8330996 8601696 8791340 9018942 9021767 9438802 8.41e6 9.15e6 7.73e6
2 Australia 9997752 7561487 8625996 11481749 9288118 13560795 13958000 17135000 1.40e7 1.22e7 9.98e6
3 Botswana 22019000 22903000 20478000 22597000 24658000 20824000 20954000 22900000 2.74e7 2.37e7 1.69e7
4 Brazil 25394 45536 46292 49166 56923 31826 183500 254896 2.51e5 1.66e5 1.25e5
5 Cameroon 6000 6000 5000 5000 6000 4500 3000 3500 3.6 e3 3.5 e3 4.2 e3
6 Canada 11773000 10795000 10529215 10561600 12082000 11677472 11103500 23198761 2.28e7 1.85e7 1.50e7
The \r\n\tCountry
bit is the weird part to me. Many thanks for any advice.