I am using Oracle Outside In to extract the text content of a PDF document.
I pass the parameters below to the main function of CASample.c, the sample file from the Content Access SDK (https://www.oracle.com/middleware/technologies/outside-in-technology-downloads.html#):
C:\adobe-acrobat.pdf -u C:\adobe-acrobat.txt
This gives me output in the following format:
SCCCA_TEXT: dwSubType = 0x08020001, Number of Characters = 8, Character Set = 0x00030100.
Outside
SCCCA_TEXT: dwSubType = 0x08020001, Number of Characters = 3, Character Set = 0x00030100.
In
SCCCA_TEXT: dwSubType = 0x08020001, Number of Characters = 8, Character Set = 0x00030100.
Unlocks
SCCCA_TEXT: dwSubType = 0x08020001, Number of Characters = 9, Character Set = 0x00030100.
Business
SCCCA_TEXT: dwSubType = 0x08020001, Number of Characters = 10, Character Set = 0x00030100.
Documents
SCCCA_TEXT: dwSubType = 0x08020001, Number of Characters = 4, Character Set = 0x00030100.
for
SCCCA_TEXT: dwSubType = 0x08020002, Number of Characters = 1, Character Set = 0x00030100.
So how do I get only the text, without the metadata? Instead of the full output above, I only need "Outside In Unlocks Business Documents for".
Or do I have to write my own parser to extract that text?
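
If a parser is the only way, below is a minimal sketch of the post-processing I have in mind. It is not part of Outside In; is_header_line and strip_headers are hypothetical helper names of my own. It assumes that every metadata line in the output file starts with the prefix "SCCCA_", that real document text never starts with that prefix, and that no line is longer than the read buffer.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the line looks like a CASample item header such as
   "SCCCA_TEXT: dwSubType = ...", and 0 otherwise.
   Assumption: every header line starts with "SCCCA_". */
int is_header_line(const char *line)
{
    return strncmp(line, "SCCCA_", 6) == 0;
}

/* Copies the non-header lines from `in` to `out`, joined by single
   spaces. Line endings and trailing spaces are removed and empty
   lines are skipped. Returns the number of text lines kept.
   Assumption: each input line fits in the 4096-byte buffer. */
int strip_headers(FILE *in, FILE *out)
{
    char line[4096];
    int kept = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        size_t len;

        if (is_header_line(line))
            continue;                      /* drop the metadata line */

        len = strcspn(line, "\r\n");       /* cut at the line ending */
        line[len] = '\0';
        while (len > 0 && line[len - 1] == ' ')
            line[--len] = '\0';            /* trim trailing spaces */
        if (len == 0)
            continue;                      /* nothing left to print */

        if (kept > 0)
            fputc(' ', out);               /* separator between chunks */
        fputs(line, out);
        kept++;
    }
    return kept;
}
```

A small main that opens C:\adobe-acrobat.txt and calls strip_headers(file, stdout) would then print only the text chunks on one line. This only filters the file after the fact; it does not change what the sample itself writes.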