
I have been assigned to a project in which we receive AFP (Advanced Function Presentation) files and need to extract the documents from them, i.e. the content and the corresponding metadata. I have been looking into the AFP file format but have not found any useful resource on how to proceed with the task.

I have found almost no information so far and don't know where to start. I looked into some open-source projects and found this: https://github.com/yan74/afplib

I tried running it, but it does not work on the sample AFP file I have.

I would really appreciate some insight into which resources I should go through to be able to do this project.

I need to write the code in Java. I have also looked at some licensed software that does the same thing, such as PROARCHIVER and PAPYRUS.

Thanks in advance

1 Answer


AFP is an easy format: it is composed of structured fields, and your first step is decoding them. Download the "Mixed Object Document Content Architecture Reference", read the first 50 pages, and write code to split the AFP file into structured fields so that you can produce a simple dump of your file.
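As a starting point, the splitting step could be sketched in Java roughly as follows. This is a minimal sketch, not a complete parser, and it rests on a few assumptions you should check against the MO:DCA reference and your own files: each structured field is preceded by a 0x5A byte, followed by an 8-byte introducer (2-byte big-endian length that counts the introducer plus the data, 3-byte identifier, 1 flag byte, 2 reserved bytes). It ignores the introducer extension and padding that the flag byte can signal, and it assumes the file has no extra record framing added during transfer. The names `AfpDump`, `StructuredField` and `split` are made up for this example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class AfpDump {

    /** One structured field: 3-byte identifier, flag byte and raw data bytes. */
    static final class StructuredField {
        final int id;      // e.g. 0xD3A8A8 (BDT) or 0xD3EEEE (NOP)
        final int flags;
        final byte[] data;

        StructuredField(int id, int flags, byte[] data) {
            this.id = id;
            this.flags = flags;
            this.data = data;
        }
    }

    /** Splits a whole AFP file (already in memory) into structured fields. */
    static List<StructuredField> split(byte[] afp) {
        ByteBuffer buf = ByteBuffer.wrap(afp); // ByteBuffer is big-endian by default
        List<StructuredField> fields = new ArrayList<>();
        while (buf.hasRemaining()) {
            int start = buf.position();
            int cc = buf.get() & 0xFF;
            if (cc != 0x5A) {
                throw new IllegalArgumentException(
                        "Expected 0x5A at offset " + start + ", found 0x" + Integer.toHexString(cc));
            }
            if (buf.remaining() < 8) {
                throw new IllegalArgumentException("Truncated introducer at offset " + start);
            }
            int length = buf.getShort() & 0xFFFF;  // introducer (8 bytes) + data
            int id = ((buf.get() & 0xFF) << 16) | (buf.getShort() & 0xFFFF);
            int flags = buf.get() & 0xFF;
            buf.getShort();                        // reserved bytes, skipped here
            if (length < 8 || length - 8 > buf.remaining()) {
                throw new IllegalArgumentException("Bad length " + length + " at offset " + start);
            }
            byte[] data = new byte[length - 8];
            buf.get(data);
            fields.add(new StructuredField(id, flags, data));
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        byte[] afp = Files.readAllBytes(Paths.get(args[0]));
        for (StructuredField f : split(afp)) {
            System.out.printf("%06X flags=%02X dataLength=%d%n", f.id, f.flags, f.data.length);
        }
    }
}
```

Run it with the path of an AFP file as the only argument; each output line is one structured field. If it fails on the very first byte, the file probably carries some extra framing that has to be stripped first.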

After that, if you want to extract images (the AFP world calls them IOCA), you need the Image Object Content Architecture Reference.

If you want to extract text (called PTX), you need the Presentation Text Object Content Architecture Reference.

good job

owairc
  • Thanks for the answer. – Sarv Shakti Singh Jun 05 '17 at 11:27
  • One more thing: I tried to see how my AFP files are structured and found that most of the data is within NOP structured fields. Can you tell me how to parse this type of structured field? I am stuck here. Meanwhile, I will be reading the document that you mentioned in your answer. Thanks again. – Sarv Shakti Singh Jun 05 '17 at 12:00
  • NOP means No Operation; it is a comment. A NOP's payload is raw; it should be described in the first document I listed. – owairc Jun 07 '17 at 10:12
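
Since a NOP payload is raw, one way to find out what a given application stored in it is to look at the bytes in hex and try decoding them with a candidate character set. The helper below is only a sketch for that kind of inspection. The class and method names are made up, and the choice of character set is an assumption: AFP data often comes from EBCDIC systems, so a code page such as `IBM500` may be worth trying if your Java runtime provides it, but whatever produced the file decides what the bytes actually mean.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class NopPreview {

    /** Formats the payload as space-separated upper-case hex bytes. */
    static String hex(byte[] payload) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < payload.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", payload[i] & 0xFF));
        }
        return sb.toString();
    }

    /**
     * Decodes the payload with the named charset, falling back to ISO-8859-1
     * when the runtime does not provide that charset.
     */
    static String decode(byte[] payload, String charsetName) {
        Charset cs = Charset.isSupported(charsetName)
                ? Charset.forName(charsetName)
                : StandardCharsets.ISO_8859_1;
        return new String(payload, cs);
    }

    public static void main(String[] args) {
        // Hypothetical payload bytes, used only to demonstrate the two helpers.
        byte[] payload = {0x41, 0x46, 0x50};
        System.out.println(hex(payload));
        System.out.println(decode(payload, "US-ASCII"));
    }
}
```

If the decoded text looks like key/value pairs or index data, that is application-specific content, and its layout has to come from whoever generates the files rather than from the AFP references.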