I am working on a sentence segmentation project and am looking for SRX (Segmentation Rules Exchange) files for sentence splitting in English, French, German, Spanish, and Italian. So far I have not found any.

Can anybody point me to existing files? I would rather not spend time writing them myself.

Here is an example of such a file:

<languagerule languagerulename="English">
<rule break="no">
<beforebreak>\b[nN]o\.\s</beforebreak>
<afterbreak>\p{N}</afterbreak>
</rule>
<rule break="no">
<beforebreak>\b(pp|[Vv]iz|i\.?\s*e|[Vvol]|[Rr]col|maj|Lt|[Ff]ig|[Ff]igs|[Vv]iz|[Vv]ols|[Aa]pprox|[Ii]ncl|Pres|[Dd]ept|min|max|[Gg]ovt|lb|ft|c\.?\s*f|vs)\.\s</beforebreak>
<afterbreak>[^\p{Lu}]|I</afterbreak>
</rule>
</languagerule>
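For readers unfamiliar with how such rules behave, here is a minimal Python sketch of the usual SRX matching idea: at each position the rules are tried in order, and the first rule whose `beforebreak` matches the text ending there and whose `afterbreak` matches the text starting there decides whether to break. It is only an illustration under a few assumptions: `\p{N}` is approximated by `\d` because Python's standard `re` module does not support `\p{..}` classes, the second (breaking) rule is hypothetical and not taken from the excerpt above, and `split_sentences` is a made-up helper name.

```python
import re

# Each rule: (is_break, beforebreak, afterbreak).
# Rule 1 mirrors the "No." exception from the excerpt, with \p{N} replaced
# by \d (assumption: the stdlib re module has no \p{..} support).
# Rule 2 is a hypothetical generic break rule, added only for illustration.
RULES = [
    (False, r"\b[nN]o\.\s", r"\d"),
    (True, r"[.?!]+\s", r""),
]


def split_sentences(text, rules=RULES):
    """Split text using ordered SRX-style rules (first matching rule wins)."""
    compiled = [
        (is_break, re.compile("(?:" + before + ")$"), re.compile(after))
        for is_break, before, after in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # beforebreak must match text ending at pos,
            # afterbreak must match text starting at pos.
            if before.search(text[:pos]) and after.match(text[pos:]):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # the first matching rule decides this position
    if text[start:]:
        segments.append(text[start:])
    return [s.strip() for s in segments]


if __name__ == "__main__":
    print(split_sentences("See section no. 5 for details. It is short."))
```

Slicing the string at every position is quadratic and only acceptable for a short demonstration. A real segmenter would also need Unicode property classes, which the third-party `regex` package provides.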
Wolfgang Fahl
dedo
    "_Is there any body [that] can help me because I don't want to spend my time to write this files ?_" So you want someone else to write these files for you? Post any relevant attempts at solving your problem – skamazin Aug 20 '14 at 13:52

1 Answer

LanguageTool has a file that covers those languages at https://github.com/languagetool-org/languagetool/blob/master/languagetool-core/src/main/resources/org/languagetool/resource/segment.srx (disclaimer: I'm the author of LanguageTool)

Daniel Naber
    Thank you, this file is useful to me. Do you know of other files like this one? I want to merge all of them. – dedo Aug 25 '14 at 08:57
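As a starting point for inspecting or merging SRX files, the rules first have to be read into some common structure. The sketch below does this with Python's standard `xml.etree.ElementTree`. It assumes the usual SRX layout of `languagerule` elements containing `rule` children with `beforebreak` and `afterbreak`, treats a missing `break` attribute as "yes", and ignores XML namespaces by comparing local tag names. `load_rules` and `SAMPLE` are hypothetical names, and the sample document is a small stand-in, not the LanguageTool file.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in document; a real SRX file would be read from disk instead.
SAMPLE = r"""<srx xmlns="http://www.lisa.org/srx20" version="2.0">
<body><languagerules>
<languagerule languagerulename="English">
<rule break="no">
<beforebreak>\b[nN]o\.\s</beforebreak>
<afterbreak>\p{N}</afterbreak>
</rule>
</languagerule>
</languagerules></body>
</srx>"""


def local(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def load_rules(srx_text):
    """Return {language rule name: [(is_break, beforebreak, afterbreak), ...]}."""
    root = ET.fromstring(srx_text)
    result = {}
    for elem in root.iter():
        if local(elem.tag) != "languagerule":
            continue
        rules = []
        for rule in elem:
            if local(rule.tag) != "rule":
                continue
            # Collect the text of beforebreak / afterbreak, if present.
            parts = {local(child.tag): (child.text or "") for child in rule}
            is_break = rule.get("break", "yes") == "yes"
            rules.append(
                (is_break, parts.get("beforebreak", ""), parts.get("afterbreak", ""))
            )
        result[elem.get("languagerulename")] = rules
    return result


if __name__ == "__main__":
    for name, rules in load_rules(SAMPLE).items():
        print(name, rules)
```

Once several files are loaded this way, rule lists with the same language rule name could be concatenated. Because the first matching rule wins in SRX, the order of the merged rules would need care.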