I have a workbook sheet that shows the first 3 rows with data when opened with LibreOffice Calc. If I use conditional formatting to set cell background color to red if a cell is blank, all cells in rows 4 and following show red. When I read the spreadsheet with Pandas, I get 20 rows with rows 4 through 20 all blank. When I read the spreadsheet with openpyxl, I also get 20 rows and here is the interesting part: rows 4 through 20 have values in column AC (29). That column has a data validation drop-down. While no data shows up in LibrOffice Calc or Pandas, data shows up in openpyxl. This appears to be a ghost of data that has been deleted.
I can delete blank rows in Pandas after I read the worksheet but the read_excel method throws up data validation errors for rows 4 through 20 before I can delete them. I would like to detect and remove the rows before read_excel. Is there a way to detect this and remove the spurious rows using openpyxl? I could then use openpyxl.load_workbook, delete the bad rows, and then use read_excel giving it the openpyxl workbook.