I'm looking for ideas on how to remove the periods that end sentences without removing the periods in abbreviations. For instance:

"The N.J. turnpike is long. Today is a beautiful day."

Would be changed to:

"The N.J. turnpike is long Today is a beautiful day"

Sinan Ünür
chrislovecnm
  • Def. abbr. and how to tell apart from periods. – Tim Pietzcker May 04 '12 at 19:58
  • This is very tough. Consider "The N.J. Turnpike is long. Today is a beautiful day." (Note the capital "T" in "Turnpike".) I don't think this can be handled with a regex; some semantic analysis will be required. – Ted Hopp May 04 '12 at 19:59
  • And how would you handle `Today I am in N.Y. Tomorrow I will be in N.J.` – cjm May 04 '12 at 20:00
  • As Ted suggests, this is pretty much an [AI-complete](https://en.wikipedia.org/wiki/AI-complete) problem. – cjm May 04 '12 at 20:01
  • This is a good argument for using two spaces after the period at the end of a sentence. Semantically the same, displayed the same in HTML, but *much* easier to parse. – Barton Chittenden May 04 '12 at 20:11
  • Why would you want to remove periods at the end of a sentence in light of abbreviations? Go BIG: remove all dots, or go NONE! –  May 04 '12 at 20:35
  • Actually I think I like the idea to remove all. – chrislovecnm May 04 '12 at 21:13
  • sln if you want to add your comment as an answer I will accept it – chrislovecnm May 04 '12 at 21:21

3 Answers

8

This is a hard problem. `Lingua::EN::Sentence` makes a three-quarters-assed attempt to solve it. It knows about common abbreviations in American English and has hooks for you to add other abbreviations you know about.
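To illustrate the abbreviation-list idea (not the module's actual API or algorithm), here is a minimal Python sketch. The `KNOWN_ABBREVIATIONS` set and the `strip_sentence_periods` function are hypothetical names made up for this example, and the list is deliberately tiny. It assumes tokens are separated by single spaces and that periods are not followed by quotes or brackets.

```python
# Hypothetical, deliberately short abbreviation list for illustration only.
KNOWN_ABBREVIATIONS = {"N.J.", "N.Y.", "U.S.", "Dr.", "Mr.", "Mrs.", "etc."}


def strip_sentence_periods(text, abbreviations=KNOWN_ABBREVIATIONS):
    """Drop a trailing period from each space-separated token,
    unless the whole token is a known abbreviation."""
    cleaned = []
    for token in text.split(" "):
        if token.endswith(".") and token not in abbreviations:
            token = token[:-1]  # remove only the final period
        cleaned.append(token)
    return " ".join(cleaned)


print(strip_sentence_periods("The N.J. turnpike is long. Today is a beautiful day."))
# prints: The N.J. turnpike is long Today is a beautiful day
```

This handles the example from the question, but it cannot tell when an abbreviation also ends a sentence: cjm's `Today I am in N.Y. Tomorrow I will be in N.J.` comes back unchanged.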

Sinan Ünür
mob
4

As others have said, this is a very difficult task in the general case. If you want to learn more, start by reading about "sentence segmentation" or "sentence boundary disambiguation," which is the task of dividing a text into sentences. Here are a few links to get you started:

Edward Loper
0

Why would you want to remove periods at the end of a sentence in light of abbreviations? Go BIG: remove all dots, or go NONE!
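The "remove all dots" option needs no abbreviation handling at all. A minimal Python sketch, with `remove_all_periods` as a made-up name for this example:

```python
def remove_all_periods(text):
    """Remove every period, including those inside abbreviations."""
    return text.replace(".", "")


print(remove_all_periods("The N.J. turnpike is long. Today is a beautiful day."))
# prints: The NJ turnpike is long Today is a beautiful day
```

Note that this also strips the dot from decimal numbers and similar tokens (for instance, `3.14` becomes `314`), so it only suits text where that is acceptable.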