Given a sample excel file from this link, I attempted to use readxl::read_excel
to read files while skipping headers and renaming with new names.
The code below works just fine for Sheet2
with no empty columns inside.
xlsx_file <- '~/reprex.xlsx'
sheets_to_read <- readxl::excel_sheets(xlsx_file)
col_names <- c("names", 'error', 'log_error', 'value1', 'value2', 'acc_score')
readxl::read_excel(xlsx_file,
sheet = 'Sheet2',
skip = 1,
col_names=col_names)
Out:
names error log_error value1 value2 acc_score
<chr> <lgl> <lgl> <lgl> <chr> <chr>
1 agri NA NA NA 6.32929944992065 right
2 indus NA NA NA 11.21162109375 right
3 oil NA NA NA 1.45932525119925 right
4 metal NA NA NA 2.10280250811577 right
But for Sheet1
, since there are empty columns inside (columns such as: error, LogError, score, etc.), it raises an Error: Sheet 1 has 7 columns (7 unskipped), but col_names has length 6.
After I searched, this issue may be related to the question from here.
So I wonder how could we modify my code readxl::read_excel(...)
to coordinate with the case as in Sheet1
, which means include all empty columns inside excel file?
EDIT: if we don't add parameter: skip = 1
, it could read all columns read excel file:
> readxl::read_excel(xlsx_file,
+ sheet = 'Sheet2',
+ # skip = 1,
+ # col_names=col_names
+ )
# A tibble: 4 x 7
name error LogError `2022-01-31` `2022-01-31 pred` direction score
<chr> <lgl> <lgl> <lgl> <chr> <chr> <lgl>
1 agri NA NA NA 6.32929944992065 right NA
2 indus NA NA NA 11.21162109375 right NA
3 oil NA NA NA 1.45932525119925 right NA
4 metal NA NA NA 2.10280250811577 right NA
After skipping the first row, we can see the last empty column missing, so the question is how to prevent this? Any helps or comments will be appreciated.
> readxl::read_excel(xlsx_file,
+ sheet = 'Sheet2',
+ skip = 1,
+ # col_names=col_names
+ )
New names:
* `` -> ...2
* `` -> ...3
* `` -> ...4
# A tibble: 3 x 6
agri ...2 ...3 ...4 `6.32929944992065` right
<chr> <lgl> <lgl> <lgl> <chr> <chr>
1 indus NA NA NA 11.21162109375 right
2 oil NA NA NA 1.45932525119925 right
3 metal NA NA NA 2.10280250811577 right
Pass col_type='text'
:
> readxl::read_excel('reprex.xlsx',
+ sheet = 'Sheet2',
+ # skip = 1,
+ # col_names=col_names,
+ col_types = 'text'
+ )
# A tibble: 4 × 7
name error LogError `2022-01-31` `2022-01-31 pred` direction score
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 agri NA NA NA 6.32929944992065 right NA
2 indus NA NA NA 11.21162109375 right NA
3 oil NA NA NA 1.45932525119925 right NA
4 metal NA NA NA 2.10280250811577 right NA
Reference: