I am trying to extract text from a PDF document, and I am wondering how PDF handles bulleted paragraphs. Consider this example:
Does PDF retain any logical meta-information indicating that the two chunks of text shown above are members of a bulleted list, or is it left to the human reader to interpret the bullet symbols? This information would be very helpful to me in developing a text-mining tool that I am currently working on.
Thanks, S