I'm trying to write a program that reads a docx file and checks whether some of the text is colored. For instance, imagine if all the words bolded in this sentence were actually written in some arbitrary color. I want my program to recognize that the words "words bolded in this sentence were actually written in some arbitrary color" are colored.
After recognizing the coloring, I want to be able to edit the recognized text based on its color. For instance, if the bolded text above were red, I want to add "<Red>" tags around that text while keeping the rest of the sentence, which isn't colored, intact.
I was originally using ZipInputStream and ZipEntry to get "word/document.xml", and I had planned on pulling the text and colors from there, but I feel like parsing the raw XML by hand would get too confusing after a while. I also tried using Apache POI, but I don't think it's able to recognize colors. Docx4j looks promising, though. Any thoughts, suggestions, or sample code to get me started?
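
To make the goal concrete, here is a rough sketch of the "word/document.xml" route I had in mind, using only the JDK's XML parser. It rests on several assumptions: that the XML has already been extracted into a string, that a run's color appears as a `w:color` element with a `w:val` hex attribute inside that run, and that colors inherited from styles or themes can be ignored. The names `ColorTagger`, `tagColoredRuns`, and the hex-to-name table are placeholders I made up, not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ColorTagger {
    // Namespace of the "w:" prefix in WordprocessingML.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical lookup table; extend it with whatever colors matter.
    static final Map<String, String> NAMES = Map.of("FF0000", "Red", "0000FF", "Blue");

    /** Returns the document text with each directly colored run wrapped in tags. */
    static String tagColoredRuns(String documentXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DTDs, since the file may come from an untrusted source.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(documentXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse document.xml", e);
        }

        StringBuilder out = new StringBuilder();
        NodeList runs = doc.getElementsByTagNameNS(W_NS, "r");
        for (int i = 0; i < runs.getLength(); i++) {
            Element run = (Element) runs.item(i);
            String text = textOf(run);
            String tag = tagFor(run);
            if (tag == null || text.isEmpty()) {
                out.append(text); // uncolored text passes through unchanged
            } else {
                out.append('<').append(tag).append('>')
                   .append(text)
                   .append("</").append(tag).append('>');
            }
        }
        return out.toString();
    }

    // Concatenates the contents of every w:t element inside the run.
    private static String textOf(Element run) {
        StringBuilder sb = new StringBuilder();
        NodeList texts = run.getElementsByTagNameNS(W_NS, "t");
        for (int i = 0; i < texts.getLength(); i++) {
            sb.append(texts.item(i).getTextContent());
        }
        return sb.toString();
    }

    // Returns a tag name for the run's w:color, or null if it has none or "auto".
    private static String tagFor(Element run) {
        NodeList colors = run.getElementsByTagNameNS(W_NS, "color");
        if (colors.getLength() == 0) {
            return null;
        }
        String val = ((Element) colors.item(0)).getAttributeNS(W_NS, "val");
        if (val.isEmpty() || val.equalsIgnoreCase("auto")) {
            return null;
        }
        String hex = val.toUpperCase(Locale.ROOT);
        return NAMES.getOrDefault(hex, "Color" + hex);
    }

    public static void main(String[] args) {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body><w:p>"
                + "<w:r><w:t xml:space=\"preserve\">plain </w:t></w:r>"
                + "<w:r><w:rPr><w:color w:val=\"FF0000\"/></w:rPr><w:t>red words</w:t></w:r>"
                + "<w:r><w:t xml:space=\"preserve\"> more</w:t></w:r>"
                + "</w:p></w:body></w:document>";
        System.out.println(tagColoredRuns(xml)); // plain <Red>red words</Red> more
    }
}
```

This already shows where it gets awkward: adjacent runs of the same color each get their own pair of tags, paragraph boundaries are lost, and the result is plain text rather than an edited docx. That is why I'm hoping a library can handle the details.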