I need to transfer some pdf table content to Excel. I used the PyMuPDF module to be able to put the PDF content to a .txt file. In which it is easier to work with and I did it successfully.
As you can see in the .txt file I was able to transfer each column and row of the pdf. They are displayed sequentially.
- I need some way to read the txt strings sequentially so I can put each line of the txt into a .xlsx cell.
- Some way to setup triggers to start reading the document sequentially and lines to throw away. Example: Start reading after a specific word, stop reading when some word is reached. Things like this. Because these documents have headers and unuseful information that are also transcript to the txt file. So I need to ignore some contents of the txt to gather only the useful information to put in the .xlsx cells.
*I'm using the xlrd library, I would like to know how I can work with things here. (optional)
I don't know if it is a problem, but when I use the count method to count the number of lines, it returned only 15 lines. The document has 568 lines in total. It only showed the full empty ones.
with open(nome_arquivo_nota, 'r'):
for line in nome_arquivo_nota:
count += 1
print(count)
= 15 .