I am using DataLogic utilities(Datalogics.PDFL) to manipulate the PDF, I am facing issues with the below scenario. A PDF with non-english text getting weird output.
Sample input file SS
Getting output in the below format for the same:
WordFinderConfig wordConfig = new WordFinderConfig();
wordConfig.IgnoreCharGaps = false;
wordConfig.IgnoreLineGaps = false;
wordConfig.NoAnnots = false;
wordConfig.NoEncodingGuess = false;
// Std Roman treatment for custom encoding; overrides the noEncodingGuess option
wordConfig.UnknownToStdEnc = true;
wordConfig.DisableTaggedPDF = false; // legacy mode WordFinder creation
wordConfig.NoXYSort = true;
wordConfig.PreserveSpaces = false;
wordConfig.NoLigatureExp = false;
wordConfig.NoHyphenDetection = false;
wordConfig.TrustNBSpace = false;
wordConfig.NoExtCharOffset = false; // text extraction efficiency
wordConfig.NoStyleInfo = false; // text extraction efficiency
WordFinder wordFinder = new WordFinder(doc, WordFinderVersion.Latest, wordConfig);