I'm using the PdfBox-Android library (https://github.com/TomRoush/PdfBox-Android), an Android port of PDFBox, in Android Studio to extract text from a PDF document. Here's what I'm doing:
File pdf_file = new File(file_path);
to create the file, then
PDDocument document = PDDocument.load(pdf_file);
to load the file into a PDDocument object, and then
PDFTextStripper pdfStripper = new PDFTextStripper();
pdfStripper.setStartPage(...);
pdfStripper.setEndPage(...);
String page_text = pdfStripper.getText(document);
to get the text content of the page. The issue is that a word such as "firm" comes out as "fi rm": a space is inserted after "fi" (and, I assume, after "fl" and other ligatures). I read "Problems with extracting OpenTypeFont text using pdfBox", but it contains no solution details and I don't understand how to fix this.
Important: as it turns out, my PDF file does not contain any ligature glyphs such as the single-character "fi" ligature. It uses the regular letters "f" and "i", and yet a space still appears after them. I don't see how to solve this.
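For anyone who wants to verify the "no ligatures" observation on their own output, here is a minimal diagnostic sketch in plain Java. It only inspects a string such as `page_text` and does not touch PdfBox-Android at all. The class and method names (`LigatureCheck`, `dumpCodePoints`, `containsLatinLigature`) are hypothetical helpers made up for illustration, and `String.codePoints()` is assumed to be available (on Android that means API level 24 or newer).

```java
// Hypothetical diagnostic helper, not part of PdfBox-Android.
final class LigatureCheck {

    // True if the text contains a Latin ligature code point (U+FB00..U+FB06),
    // for example U+FB01 (the single-character "fi" ligature).
    static boolean containsLatinLigature(String text) {
        return text.codePoints().anyMatch(cp -> cp >= 0xFB00 && cp <= 0xFB06);
    }

    // Renders every code point as U+XXXX, separated by spaces, so that
    // ligatures and unusual space characters become visible.
    static String dumpCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a fragment of page_text; replace with real output.
        String sample = "fi rm";
        System.out.println(dumpCodePoints(sample));
        System.out.println(containsLatinLigature(sample));
    }
}
```

If the dump shows U+0066 U+0069 followed by U+0020, the extracted text really contains a plain "f", a plain "i", and an ordinary space, which would match what I described above.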
PDF file: https://wetransfer.com/downloads/09e9036dda4a7962ccad32b1cbcd8edc20200506050349/ab4752