while learning Gate, I encountered the following problem:
Minipar throws exception when it sees uncommen characters like Ö, Ü, Ä.
For example in the sentence "Batten disease (also known as Spielmeyer-Vogt-Sjögren-Batten disease ) is a rare, fatal autosomal recessive neurodegenerative disorder that begins in childhood." (from a wiki article) The annotation Minipar got before it stopped working is "Batten disease (also known as Spielmeyer-Vogt-Sj" which is exactly before the character ö, so this makes me guessing that this is a case worth attention while using Gate. Because the same pipeline processed several other articles like a breeze.
In Messages Tab, it reprots:
gate.util.InvalidOffsetException
at gate.annotation.AnnotationSetImpl.getNodes(AnnotationSetImpl.java:773)
at gate.annotation.AnnotationSetImpl.add(AnnotationSetImpl.java:802)
at minipar.Minipar.runMinipar(Minipar.java:419)
at minipar.Minipar.execute(Minipar.java:527)
at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
at gate.creole.ConditionalSerialController.runComponent(ConditionalSerialController.java:154)
at gate.creole.SerialController.executeImpl(SerialController.java:153)
at gate.creole.ConditionalSerialAnalyserController.executeImpl(ConditionalSerialAnalyserController.java:129)
at gate.creole.AbstractController.execute(AbstractController.java:75)
at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
at gate.gui.SerialControllerEditor$RunAction$1.run(SerialControllerEditor.java:1619)
at java.lang.Thread.run(Unknown Source)
gate.creole.ExecutionException: gate.util.InvalidOffsetException
at minipar.Minipar.runMinipar(Minipar.java:491)
at minipar.Minipar.execute(Minipar.java:527)
at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
at gate.creole.ConditionalSerialController.runComponent(ConditionalSerialController.java:154)
at gate.creole.SerialController.executeImpl(SerialController.java:153)
at gate.creole.ConditionalSerialAnalyserController.executeImpl(ConditionalSerialAnalyserController.java:129)
at gate.creole.AbstractController.execute(AbstractController.java:75)
at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
at gate.gui.SerialControllerEditor$RunAction$1.run(SerialControllerEditor.java:1619)
at java.lang.Thread.run(Unknown Source)
Caused by: gate.util.InvalidOffsetException
at gate.annotation.AnnotationSetImpl.getNodes(AnnotationSetImpl.java:773)
at gate.annotation.AnnotationSetImpl.add(AnnotationSetImpl.java:802)
at minipar.Minipar.runMinipar(Minipar.java:419)
... 9 more
gate.creole.ExecutionException: Document doesn't have sentence annotations. please run tokenizer, sentence splitter and then Minipar
at minipar.Minipar.saveGateSentences(Minipar.java:194)
at minipar.Minipar.execute(Minipar.java:525)
at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
at gate.creole.ConditionalSerialController.runComponent(ConditionalSerialController.java:154)
at gate.creole.SerialController.executeImpl(SerialController.java:153)
at gate.creole.ConditionalSerialAnalyserController.executeImpl(ConditionalSerialAnalyserController.java:129)
at gate.creole.AbstractController.execute(AbstractController.java:75)
at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
at gate.gui.SerialControllerEditor$RunAction$1.run(SerialControllerEditor.java:1619)
at java.lang.Thread.run(Unknown Source)
I'd to thank Ian for his warm support once again.
Matt