2

Given a sample excel file from this link, I attempted to use readxl::read_excel to read files while skipping headers and renaming with new names.

The code below works just fine for Sheet2 with no empty columns inside.

xlsx_file <- '~/reprex.xlsx'
sheets_to_read <- readxl::excel_sheets(xlsx_file)

col_names <- c("names", 'error', 'log_error', 'value1', 'value2', 'acc_score')

readxl::read_excel(xlsx_file,
           sheet = 'Sheet2',
           skip = 1,
           col_names=col_names)

Out:

  names error log_error value1 value2           acc_score
  <chr> <lgl> <lgl>     <lgl>  <chr>            <chr>    
1 agri  NA    NA        NA     6.32929944992065 right    
2 indus NA    NA        NA     11.21162109375   right    
3 oil   NA    NA        NA     1.45932525119925 right    
4 metal NA    NA        NA     2.10280250811577 right 

But for Sheet1, since there are empty columns inside (columns such as: error, LogError, score, etc.), it raises an Error: Sheet 1 has 7 columns (7 unskipped), but col_names has length 6.

After I searched, this issue may be related to the question from here.

So I wonder how could we modify my code readxl::read_excel(...) to coordinate with the case as in Sheet1, which means include all empty columns inside excel file?

EDIT: if we don't add parameter: skip = 1, it could read all columns read excel file:

> readxl::read_excel(xlsx_file,
+            sheet = 'Sheet2',
+            # skip = 1,
+            # col_names=col_names
+            )
# A tibble: 4 x 7
  name  error LogError `2022-01-31` `2022-01-31 pred` direction score
  <chr> <lgl> <lgl>    <lgl>        <chr>             <chr>     <lgl>
1 agri  NA    NA       NA           6.32929944992065  right     NA   
2 indus NA    NA       NA           11.21162109375    right     NA   
3 oil   NA    NA       NA           1.45932525119925  right     NA   
4 metal NA    NA       NA           2.10280250811577  right     NA 

After skipping the first row, we can see the last empty column missing, so the question is how to prevent this? Any helps or comments will be appreciated.

> readxl::read_excel(xlsx_file,
+            sheet = 'Sheet2',
+            skip = 1,
+            # col_names=col_names
+            )
New names:
* `` -> ...2
* `` -> ...3
* `` -> ...4
# A tibble: 3 x 6
  agri  ...2  ...3  ...4  `6.32929944992065` right
  <chr> <lgl> <lgl> <lgl> <chr>              <chr>
1 indus NA    NA    NA    11.21162109375     right
2 oil   NA    NA    NA    1.45932525119925   right
3 metal NA    NA    NA    2.10280250811577   right

Pass col_type='text':

> readxl::read_excel('reprex.xlsx',
+                    sheet = 'Sheet2',
+                    # skip = 1,
+                    # col_names=col_names,
+                    col_types = 'text'
+                    )
# A tibble: 4 × 7                                                                                                               
  name  error LogError `2022-01-31` `2022-01-31 pred` direction score
  <chr> <chr> <chr>    <chr>        <chr>             <chr>     <chr>
1 agri  NA    NA       NA           6.32929944992065  right     NA   
2 indus NA    NA       NA           11.21162109375    right     NA   
3 oil   NA    NA       NA           1.45932525119925  right     NA   
4 metal NA    NA       NA           2.10280250811577  right     NA 

Reference:

https://readxl.tidyverse.org/reference/read_excel.html

ah bon
  • 9,293
  • 12
  • 65
  • 148
  • I'm surprised there are so many NA columns. If you pass `col_types = "text"`, are they still NA? – wibeasley Feb 17 '22 at 15:16
  • The problems seems raised by last NA column, I've added new result after passing `col_types = "text"`. – ah bon Feb 17 '22 at 15:27
  • Why do you want to skip the header? In other words, why are you not satisfied with your 2nd & 4th call (*e.g.*, `readxl::read_excel(xlsx_file, sheet = 'Sheet2')`)? – wibeasley Feb 17 '22 at 16:33
  • Because I wanna to rename each sheet with `col_names <- c("names", 'error', 'log_error', 'value1', 'value2', 'acc_score')`, so I skip the original headers by setting `skip=1` – ah bon Feb 18 '22 at 01:06

0 Answers0