Using openxlsx startRow parameter to pick first row based on the contents of the file

Question

I know that the startRow parameter in the read.xlsx function in openxlsx allows me to read a file starting at a designated row. I have to load 300 xlsx files, unfortunately, the number of rows that I want to skip varies between the files. I always want the first row to contain the word "CPT" in the second column. Is there a way to set the startRow on a text matching parameter? In the picture below, I would set the startRow to 6, but in other cases its 4 or 3.

score 1 · Accepted Answer · answered Dec 14 '20 at 06:02

Maybe it is good idea to read twice (1st: read a few rows of col1&2 and get startRow).
Note: I supposed the cells you want to skip is on col1 (at least).
And with for, sapply or etc, you can do it with all files.

library(openxlsx)
tmp <- read.xlsx(file, colNames = FALSE, rows = 1:10, cols = 1:2, 
                 skipEmptyRows = FALSE)
st <- min(grep("CPT", tmp[[2]]))
d <- read.xlsx(file, startRow = st)

Using openxlsx startRow parameter to pick first row based on the contents of the file

1 Answers1