I am trying to parse the PDF found here: https://corporate.lowes.com/sites/lowes-corp/files/annual-report/lowes-2020ar.pdf with Python. It appears to be text-based, according to the copy/paste test, and the first several pages parse just fine using, e.g., PyMuPDF.
However, after about page 12, there seems to be an internal change in the document's text encoding. For example, a section on page 18 looks like ordinary text on screen, but when you copy and paste it, it becomes:
%A>&1;<81
FB9#4AH4EL
%BJ8XF8@C?BL874CCEBK<@4G8?L
9H??G<@84FFB6<4G8F4A7
C4EGG<@84FFB6<4G8F
CE<@4E<?L<AG;8.A<G87,G4G8F4A74A474"A9<F64?
J88KC4A787BHEJBE>9BE68
;<E<A:4FFB6<4G8F<AC4EGG<@8
F84FBA4?
4A79H??G<@8CBF<G<BAFGB9H?9<??G;8F84FBA4?78@4A7B9BHE,CE<A:F84FBA
<A6E84F8778@4A77HE<A:G;8(/"C4A78@<6
4F6HFGB@8EF9B6HF87BA;B@8<@CEBI8@8AGCEB=86GF
4A74A4G<BAJ<78899BEGGB@B7<9LBHEFGBE8?4LBHG
What is going on here? Will I need to use OCR to parse a file like this? Or is there some way of translating the stuff above back to text?
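For what it's worth, the garbled output does not look random: in several of the lines above, the characters seem to sit at a fixed offset from the real letters. Below is a minimal sketch of that observation, assuming lowercase letters are shifted down by 45 code points and uppercase letters by 39. The function name `decode_shifted` and both offsets are my own guesses inferred from the pasted sample, not anything read from the PDF's font tables, and other fonts in the document may use a different mapping.

```python
def decode_shifted(text: str) -> str:
    """Undo the apparent character shift seen in the pasted sample.

    Assumption: lowercase letters appear 45 code points too low and
    uppercase letters 39 code points too low. Anything else is kept as-is.
    """
    out = []
    for ch in text:
        lower = chr(ord(ch) + 45)  # candidate lowercase letter
        upper = chr(ord(ch) + 39)  # candidate uppercase letter
        if "a" <= lower <= "z":
            out.append(lower)
        elif "A" <= upper <= "Z":
            out.append(upper)
        else:
            out.append(ch)  # unknown mapping (digits, punctuation, ...)
    return "".join(out)


# Start of one of the pasted lines from page 18
sample = 'CE<@4E<?L<AG;8.A<G87,G4G8F'
print(decode_shifted(sample))  # prints "primarilyintheUnitedStates"
```

This recovers the letters but not the spaces or punctuation, which are missing from the copied text entirely, so at best it is a partial workaround and not an explanation of what the PDF is doing.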