Even better, you can use the following syntax that will accept several spaces and other blank characters after the sentence ending character and the leading blank characters will not be part of the string that will be extracted!!!
[^\s].+?[?!.](?=\s+|$)
Limitations:
- for example
10 B.C.
and other abbreviations will be detected as sentence...
- strings like:
terkena musibah!Uang saya 90.000 dicuri maling.
will be detected as one sentence...
New version:
I have adapted the regex in the following way, to solve the limitations of the regex proposed so far:
[^\s.!?][a-zA-Z@#$%^&,;"':*()-_+=/\\|{}><()[\]\s\d]*?([?!]|((?<=[^A-Z])\.(?=[^0-9])))
and I have test it on the following text:
Apa ya{ng terjadi? Ak[u +10 B.C. ter,ke]na 10.3 mus}ibah.Uang say\a 90!000 dic&uri ma|ling.
Apa yang te*r(j)adi? Aku terkena mus%ibah! Uang sa^ya 90.000 dicuri maling.
ter;ke|na mus-ibah?uang saya 90..000 dicuri m"aling.
ter@kena mus+ibah!ua=ng say$a 90?000 dicuri ma'ling.
terk\ena mus#ibah.uang saya 90.000 dicuri maling.
Apa yang terjadi? Aku 10 B. C. terke\na mu/sibah.Uang saya 90!000 dicuri maling.
Apa yang terjadi? Aku -10 B. C. terke\na mu/sibah. Uang saya 90!000 dicuri maling.
Advantages:
Abbreviations are preserved: Ak[u +10 B.C. ter,ke]na 10.3 mus}ibah.
is seen as one sentence, preserving the B.C.
terkena musibah!Uang saya 90.000 dicuri maling.
would be separated in two sentences: terkena musibah!
and Uang saya 90.000 dicuri maling.
Good luck!