I need to find the dates within a string, along with their positions. Consider the example string:
"The interesting date is 4 days from today and it is 20th july of this year, another date is 18th Feb 1997"
I need the following output (assuming today is 2013-07-14):
2013-07-18, position 25
2013-07-20, position 53
1997-02-18, position 93
I have managed to write code that extracts the parts of the string that are recognized as dates. I need to enhance or change it to produce the output above. Any hints or help are appreciated:
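import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.pipeline.*
import edu.stanford.nlp.time.*
import edu.stanford.nlp.util.CoreMap

Properties props = new Properties();
// Build a minimal pipeline: tokenize, split sentences, POS-tag, then run SUTime
AnnotationPipeline pipeline = new AnnotationPipeline();
pipeline.addAnnotator(new PTBTokenizerAnnotator(false));
pipeline.addAnnotator(new WordsToSentencesAnnotator(false));
pipeline.addAnnotator(new POSTaggerAnnotator(false));
pipeline.addAnnotator(new TimeAnnotator("sutime", props));
Annotation annotation = new Annotation("The interesting date is 4 days from today and it is 20th july of this year, another date is 18th Feb 1997");
// The document date is the reference point for relative expressions such as "4 days from today"
annotation.set(CoreAnnotations.DocDateAnnotation.class, "2013-07-14");
pipeline.annotate(annotation);
// Each CoreMap is one recognized temporal expression
List<CoreMap> timexAnnsAll = annotation.get(TimeAnnotations.TimexAnnotations.class);
timexAnnsAll.each {
    println it
}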
With the above code I get this output:
4 days from today
20th july of this year
18th Feb 1997
EDIT:
I managed to get the date part with the following change:
timexAnnsAll.each { it ->
    // Each CoreMap carries the resolved Timex for the matched expression
    Timex timex = it.get(TimeAnnotations.TimexAnnotation.class);
    println timex.val + " from : $it"
}
Now the output is:
2013-07-18 from : 4 days from today
2013-07-20 from : 20th july of this year
1997-02-18 from : 18th Feb 1997
All I need to solve now is to find the position of the date within the original string.
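One fallback I can think of is a plain string search: since the recognized fragments come back in document order, each one can be looked up with `String.indexOf`, starting from the end of the previous match. Below is a minimal sketch of that idea in plain Java (which Groovy also accepts). The names `DatePositions` and `positions` are hypothetical helpers of mine, not part of CoreNLP, and the sketch assumes each fragment appears verbatim in the original string and that positions are counted from 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: locates fragments by plain string search.
class DatePositions {

    /**
     * Returns the 1-based start position of each fragment in text.
     * Assumes the fragments occur in text in the given order; a fragment
     * that cannot be found verbatim is reported as -1.
     */
    static List<Integer> positions(String text, List<String> fragments) {
        List<Integer> result = new ArrayList<>();
        int from = 0; // search resumes after the previous match
        for (String fragment : fragments) {
            int idx = text.indexOf(fragment, from);
            if (idx < 0) {
                result.add(-1);
                continue;
            }
            result.add(idx + 1); // convert 0-based index to 1-based position
            from = idx + fragment.length();
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "The interesting date is 4 days from today and it is "
                + "20th july of this year, another date is 18th Feb 1997";
        List<String> fragments = Arrays.asList(
                "4 days from today", "20th july of this year", "18th Feb 1997");
        System.out.println(positions(text, fragments)); // prints [25, 53, 93]
    }
}
```

This would break if the recognizer normalizes whitespace or if the same expression occurs twice out of order. CoreNLP tokens also carry character offsets (`CoreAnnotations.CharacterOffsetBeginAnnotation`), which is probably the more robust route, but I have not verified how to read them from the timex annotations here.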