Extract fields from ODT document using Java library

Question

I need a Java library, or some code, to extract field tags from the content of an ODT document. I know an ODT file is a zip archive that keeps its content in a content.xml file. Of course I could just extract the archive, open content.xml and parse it, but I believe some higher-level code exists. As an example, the content looks like this:

<text:p text:style-name="Standard">Hi ${name}!</text:p>    
<text:p text:style-name="Standard">
<text:text-input text:description="JOOScript">$nome</text:text-input></text:p>

I would like to extract the fields as ${name} and $nome.
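
For reference, the manual route mentioned above (unzip, then read content.xml) can be sketched with the JDK alone. This is only a minimal sketch: the class name `OdtContentReader` and the helper `sampleOdt` are hypothetical names made up for illustration, and it returns the raw XML without parsing it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class OdtContentReader {

    /** Returns the raw text of content.xml from an ODT stream, or null if the entry is missing. */
    static String readContentXml(InputStream odt) {
        try (ZipInputStream zip = new ZipInputStream(odt)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("content.xml".equals(entry.getName())) {
                    // Reads only the current entry; assumes content.xml is UTF-8 encoded.
                    return new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper (hypothetical): builds an in-memory zip holding only content.xml. */
    static byte[] sampleOdt(String contentXml) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("content.xml"));
            zip.write(contentXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Pass a path to a real .odt file, or run without arguments to use the in-memory sample.
        InputStream in = args.length > 0
                ? Files.newInputStream(Path.of(args[0]))
                : new ByteArrayInputStream(sampleOdt("<text:p>Hi ${name}!</text:p>"));
        System.out.println(readContentXml(in));
    }
}
```

The returned string still contains all the markup, so a field that the editor has split across several text:span elements would not show up as one contiguous token.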

I know Apache Tika could be used for that, but I haven't spotted an example that actually shows field extraction. I believe this is because the fields I am using are unstructured text instead of input field tags.

Thanks in advance, Daniel

score 2 · Accepted Answer · edited Oct 20 '14 at 11:47

Well, just in case anyone is interested, we ended up using Apache Tika to obtain the content from the ODT, and we parsed that content with the following regular expression:

\$\{[\w\-\.]*\}
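
A minimal sketch of how that expression might be applied in Java, assuming the document text has already been obtained as a plain String (for example from Tika). The class name `PlaceholderExtractor` is hypothetical. Note that the expression as written only matches the braced form `${name}`; the second pattern, which also accepts a bare `$nome`, is my own assumption and not part of what we actually used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderExtractor {

    // The expression from the answer: "${", then word characters, '-' or '.', then "}".
    static final Pattern BRACED = Pattern.compile("\\$\\{[\\w\\-\\.]*\\}");

    // Assumption (not in the answer): additionally accept a bare "$word" such as $nome.
    static final Pattern BRACED_OR_BARE = Pattern.compile("\\$\\{[\\w\\-\\.]*\\}|\\$\\w+");

    /** Returns every match of the pattern in the text, in order of appearance. */
    static List<String> extract(Pattern pattern, String text) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            fields.add(matcher.group());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in for the text extracted from the sample document in the question.
        String text = "Hi ${name}!\n$nome";
        System.out.println(extract(BRACED, text));
        System.out.println(extract(BRACED_OR_BARE, text));
    }
}
```

Because the character class is followed by `*`, the original expression also accepts an empty placeholder `${}`; replacing `*` with `+` would exclude it if that matters.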

answered Apr 03 '12 at 02:00

dannyxyz22
