I've been struggling with this for a few days and was wondering whether someone could help me with it.
What I am trying to accomplish is to process a Word document (.doc or .docx) that contains a set of questions and answers. The contents of the file look like this:
Document Name
1. Question one:
a. Answer one to question one
b. Answer two to question one
c. Answer three to question one
2. Question two:
a. Answer one to question two
c. Answer two to question two
e. Answer three to question two
What I have tried so far is:
Reading the contents of the document via Apache POI like this:
FileInputStream fis = new FileInputStream(new File(FilePath));
XWPFDocument doc = new XWPFDocument(fis);
XWPFWordExtractor extract = new XWPFWordExtractor(doc);
String extractorText = extract.getText();
At this point I have the contents of the document as a string. Next, I tried to create a regex pattern that matches the number and the dot at the start of a question (1., 12.) and continues up to the colon:
Pattern regexPattern = Pattern.compile("^(\\d|\\d\\d)+\\.[^:]+:\\s*$", Pattern.MULTILINE);
Matcher regexMatcher = regexPattern.matcher(extractorText);
However, when I loop through the matches, no question text is printed:
while (regexMatcher.find()) {
    System.out.println("Found");
    for (int i = 0; i < regexMatcher.groupCount() - 2; i += 2) {
        map.put(regexMatcher.group(i + 1), regexMatcher.group(i + 2));
        System.out.println("#" + regexMatcher.group(i + 1) + " >> " + regexMatcher.group(i + 2));
    }
}
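For reference, here is a stripped-down check that runs without the document. It is only a sketch: the sample text is hard-coded (an assumption about what the extractor returns), and the class and method names are made up for this test. It counts the matches and reports how many capturing groups the pattern has, since the bound of the inner loop depends on groupCount():

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountCheck {
    // Same pattern as in the question; it contains one capturing group: (\d|\d\d)
    static final Pattern QUESTION =
            Pattern.compile("^(\\d|\\d\\d)+\\.[^:]+:\\s*$", Pattern.MULTILINE);

    // Hard-coded stand-in for the extracted text (assumption: one item per line, "\n" separators).
    static final String SAMPLE =
            "Document Name\n"
            + "1. Question one:\n"
            + "a. Answer one to question one\n"
            + "2. Question two:\n"
            + "a. Answer one to question two\n";

    // Counts how many times the pattern is found in the text.
    static int countMatches(String text) {
        Matcher m = QUESTION.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Number of capturing groups in the pattern (group 0, the whole match, is not counted).
    static int groupCount() {
        return QUESTION.matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println("matches = " + countMatches(SAMPLE));
        System.out.println("groups  = " + groupCount());
    }
}
```

If the pattern really has only one capturing group, the bound groupCount() - 2 is negative, so the body of the for loop would never run even when find() succeeds.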
I am new to Java and not sure where I am going wrong, so I was hoping someone could help me out.
Also, if anyone has a better approach to building a map of the questions and their related answers, I would very much appreciate it.
Thank you in advance.
Edit: I am trying to obtain a Map whose keys are the question texts and whose values are lists of strings holding the answers to each question, something like:
Map<String, List<String>> desiredResult = new HashMap<>();
desiredResult.entrySet().forEach((entry) -> {
    String questionText = entry.getKey();
    List<String> answersList = entry.getValue();
    System.out.println("Now at question: " + questionText);
    answersList.forEach((answerText) -> {
        System.out.println("Now at answer: " + answerText);
    });
});
For the first question, this would generate the following output:
Now at question: 1. Question one:
Now at answer: a. Answer one to question one
Now at answer: b. Answer two to question one
Now at answer: c. Answer three to question one
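To make the goal concrete, this is roughly the kind of line-by-line parsing I have in mind, as a sketch only. It assumes the extracted text has one question or answer per line in the format shown above. QuestionParser, parse, and the two patterns are hypothetical names and guesses of mine, not anything provided by POI. A LinkedHashMap is used so that the questions keep their document order:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class QuestionParser {
    // Assumed formats: questions look like "12. some text:", answers like "a. some text".
    private static final Pattern QUESTION_LINE = Pattern.compile("^\\d{1,2}\\.[^:]+:\\s*$");
    private static final Pattern ANSWER_LINE = Pattern.compile("^[a-z]\\.\\s+.*$");

    // Groups each answer line under the most recent question line.
    static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        List<String> currentAnswers = null;
        for (String rawLine : text.split("\\R")) {   // \R matches any line break (Java 8+)
            String line = rawLine.trim();
            if (QUESTION_LINE.matcher(line).matches()) {
                currentAnswers = new ArrayList<>();
                result.put(line, currentAnswers);
            } else if (currentAnswers != null && ANSWER_LINE.matcher(line).matches()) {
                currentAnswers.add(line);
            }
            // Anything else (title, blank lines) is ignored.
        }
        return result;
    }

    public static void main(String[] args) {
        // Hard-coded sample standing in for the extracted document text.
        String sample = "Document Name\n"
                + "1. Question one:\n"
                + "a. Answer one to question one\n"
                + "b. Answer two to question one\n"
                + "c. Answer three to question one\n"
                + "2. Question two:\n"
                + "a. Answer one to question two\n";
        parse(sample).forEach((question, answers) -> {
            System.out.println("Now at question: " + question);
            answers.forEach(answer -> System.out.println("Now at answer: " + answer));
        });
    }
}
```

I don't know whether this holds up against the real extractor output (for example, if Word's automatic list numbering is not part of the extracted text), so corrections are welcome.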