
I would like to read multiline sentences that contain abbreviations from a text file, one sentence at a time. Example sentence:

"Mr. Roger

and

Ms. Roger

are my teachers."

How can I do this?

I saw that I can change the delimiter in Scanner, but that is not a good option in my case because the dot in an abbreviation would be treated as the end of a sentence.

A.Sidor
  • Possible duplicate of [Java library that finds sentence boundaries](https://stackoverflow.com/questions/483348/java-library-that-finds-sentence-boundaries) – tkruse Jul 01 '19 at 00:15

2 Answers


You cannot solve this with a simple delimiter. You would need a dictionary of known abbreviations and then "skip" the dots that follow a known abbreviation.

I think it would be simplest to first read the content of the whole file into a String or StringBuffer, then look for dots (.). For each dot, check whether one of the known abbreviations stands directly in front of it; if not, copy the part from the previous delimiter to the current one as a sentence. There is still the problem of recognizing a sentence that ends with an abbreviation, though...
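The approach described above could be sketched roughly as follows. This is only an illustration, not a complete solution: the class name `AbbreviationSplitter`, the `split` method and the contents of `ABBREVIATIONS` are assumptions made for the example, and it does not handle quotation marks, ellipses or sentences that end with an abbreviation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class AbbreviationSplitter {

    // Hypothetical dictionary; extend it with the abbreviations in your own text.
    private static final Set<String> ABBREVIATIONS =
            Set.of("Mr", "Mrs", "Ms", "Dr", "Prof");

    static List<String> split(String text) {
        // Collapse line breaks and repeated whitespace so a multiline
        // sentence becomes a single line.
        String flat = text.replaceAll("\\s+", " ").trim();
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < flat.length(); i++) {
            char c = flat.charAt(i);
            if (c != '.' && c != '!' && c != '?') {
                continue;
            }
            // The word directly in front of the punctuation mark.
            int wordStart = flat.lastIndexOf(' ', i) + 1;
            String word = flat.substring(wordStart, i);
            if (c == '.' && ABBREVIATIONS.contains(word)) {
                continue; // the dot belongs to an abbreviation
            }
            sentences.add(flat.substring(start, i + 1).trim());
            start = i + 1;
        }
        // Keep trailing text that has no closing punctuation.
        String rest = flat.substring(start).trim();
        if (!rest.isEmpty()) {
            sentences.add(rest);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Mr. Roger\n\nand\n\nMs. Roger\n\nare my teachers. They teach math.";
        split(text).forEach(System.out::println);
    }
}
```

In real use the text would come from the file, for example via `Files.readString(path)` (Java 11+), instead of a string literal.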

Turing85

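This will work, though you may want to generalize it to look for an uppercase letter followed by only a few lowercase letters and a dot.

Use this one for now: .+?(?:(?<![\s.]\p{Lu}|\b(?:Mr|Dr|Prof|Ms|Mrs|Miss))[.!?]|$)

The lookbehind names whole titles rather than only their trailing letters (r, rof, s, rs, iss), so ordinary words ending in r or s, such as "teachers.", still end a sentence. Because . does not match line breaks by default, either collapse the line breaks first or compile the pattern with Pattern.DOTALL.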
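A minimal sketch of applying a lookbehind pattern of this kind with `java.util.regex`. The pattern used here spells out complete titles (Mr, Dr, Prof, Ms, Mrs, Miss); that list, the class name `RegexSentenceSplitter` and the `split` method are assumptions for illustration. Line breaks are collapsed before matching so that a sentence spread over several lines is treated as one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexSentenceSplitter {

    // A dot, '!' or '?' ends a sentence unless it follows a single
    // uppercase initial or one of the listed titles (assumed list).
    private static final Pattern SENTENCE = Pattern.compile(
            ".+?(?:(?<![\\s.]\\p{Lu}|\\b(?:Mr|Dr|Prof|Ms|Mrs|Miss))[.!?]|$)");

    static List<String> split(String text) {
        // Collapse line breaks so '.' in the pattern can span the whole sentence.
        String flat = text.replaceAll("\\s+", " ").trim();
        List<String> sentences = new ArrayList<>();
        Matcher matcher = SENTENCE.matcher(flat);
        while (matcher.find()) {
            String sentence = matcher.group().trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Mr. Roger\n\nand\n\nMs. Roger\n\nare my teachers. They teach math.";
        split(text).forEach(System.out::println);
    }
}
```

Like any fixed list, this misses abbreviations it does not know about; for general text, a sentence-boundary library such as the one discussed in the linked duplicate question is likely to be more robust.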