I use Apache Tika to parse RTF files and get the plain text as a string. I then remove some characters from this string, which works fine. Now I want to save the result as RTF again. (You can think of this process as modifying an RTF file by deleting a paragraph.) How is this possible? How can I export this string to RTF with Tika?
Why not use a proper RTF library to edit directly, rather than trying to turn it into plain text then back again? – Gagravarr Apr 25 '12 at 14:12
I have to search for certain keywords and paragraphs in this RTF and remove them. Can you name a proper RTF library for Java? – Christian Schiepe Apr 25 '12 at 14:19
I tried the standard Java Swing RTFEditorKit. It would work fine if RTFEditorKit supported multilingual RTFs; unfortunately it doesn't, and I need multilingual support. Parsing Chinese and Russian RTFs results in garbage because RTFEditorKit does not handle per-font encodings. I found this little comment in RTFReader: /* TODO: per-font font encodings ( \fcharset control word ) ? */, which is exactly what I would need. – Christian Schiepe Apr 27 '12 at 12:21
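For reference, the RTFEditorKit approach mentioned in this comment can be sketched roughly as follows. This is only a minimal sketch, not the commenter's actual code: the class `RtfKeywordRemover` and its methods are hypothetical names, and it uses nothing beyond `javax.swing.text.rtf.RTFEditorKit` from the JDK. As the comment points out, it should only be expected to work for RTF files that do not rely on per-font encodings, and the Swing RTF writer may not preserve all of the original formatting.

```java
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Hypothetical helper: loads RTF into a Swing Document, deletes text, writes RTF again.
class RtfKeywordRemover {

    // Parses RTF bytes into a styled Swing document.
    static Document load(byte[] rtf) throws Exception {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        kit.read(new ByteArrayInputStream(rtf), doc, 0);
        return doc;
    }

    // Returns the plain text of an RTF document (roughly what Tika would give you).
    static String toPlainText(byte[] rtf) throws Exception {
        Document doc = load(rtf);
        return doc.getText(0, doc.getLength());
    }

    // Removes every non-overlapping occurrence of keyword and returns the edited RTF.
    static byte[] removeKeyword(byte[] rtf, String keyword) throws Exception {
        if (keyword.isEmpty()) {
            return rtf; // nothing to remove
        }
        Document doc = load(rtf);
        String text = doc.getText(0, doc.getLength());

        // Work from the end of the text so earlier offsets stay valid after each removal.
        int idx = text.lastIndexOf(keyword);
        while (idx >= 0) {
            doc.remove(idx, keyword.length());
            idx = text.lastIndexOf(keyword, idx - keyword.length());
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new RTFEditorKit().write(out, doc, 0, doc.getLength());
        return out.toByteArray();
    }
}
```

Calling `RtfKeywordRemover.removeKeyword(rtfBytes, "some keyword")` on the bytes of an RTF file returns new RTF bytes that can be written back to disk. Removing a whole paragraph would work the same way, using the paragraph's start offset and length in the document text.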
1 Answer
There is a way to edit documents, but it is a little complex. You can use the OpenOffice API to open many types of documents and export them to other formats. Some time ago I used it to read data from a database and export it as ODT and XLS files.
I have never used it to edit a document, such as a Writer or MS Word file, but according to the OpenOffice documentation it is possible. It may be like using a cannon to kill a fly, but if you don't find any other way, it could solve your problem.
The API works with Java, C++, etc.