0

I know that the startRow parameter in the read.xlsx function in openxlsx allows me to read a file starting at a designated row. I have to load 300 xlsx files, unfortunately, the number of rows that I want to skip varies between the files. I always want the first row to contain the word "CPT" in the second column. Is there a way to set the startRow on a text matching parameter? In the picture below, I would set the startRow to 6, but in other cases its 4 or 3.

enter image description here

MatthewR
  • 2,660
  • 5
  • 26
  • 37

1 Answers1

1

Maybe it is good idea to read twice (1st: read a few rows of col1&2 and get startRow).
Note: I supposed the cells you want to skip is on col1 (at least).
And with for, sapply or etc, you can do it with all files.

library(openxlsx)
tmp <- read.xlsx(file, colNames = FALSE, rows = 1:10, cols = 1:2, 
                 skipEmptyRows = FALSE)
st <- min(grep("CPT", tmp[[2]]))
d <- read.xlsx(file, startRow = st)
cuttlefish44
  • 6,586
  • 2
  • 17
  • 34