Questions tagged [breakiterator]

13 questions
8
votes
1 answer

How does BreakIterator work in Android?

I'm making my own text processor in Android (a custom vertical script TextView for Mongolian). I thought I would have to find all the line breaking locations myself so that I could implement line wrapping, but then I discovered BreakIterator. This…
Suragch
  • 484,302
  • 314
  • 1,365
  • 1,393
4
votes
0 answers

BreakIterator doesn't find correct sentence boundary with parenthesized "i.e." or "e.g."

In the example below, BreakIterator appears to be failing on a fairly straightforward example. Am I using BreakIterator incorrectly, or is this just a bug? Example class: import java.text.BreakIterator; import java.util.Locale; public class…
Archie
  • 4,959
  • 1
  • 30
  • 36
4
votes
1 answer

C# Equivalent for Java's BreakIterator

I'm working on a conversion project from Java to C#. Is there a C# equivalent of BreakIterator? I tried IEnumerator, but I cannot find an equivalent of the iterator.SetText() usage below. Can anyone suggest equivalent C# code for the lines below: String…
Pratik J
  • 109
  • 2
  • 9
3
votes
1 answer

BreakIterator not working correctly with Chinese text

I used BreakIterator.getWordInstance to split a Chinese text into words. Here is my example: import java.text.BreakIterator; import java.util.Locale; public class Sample { public static void main(String[] args) { String stringToExamine =…
srgsanky
  • 671
  • 1
  • 11
  • 16
2
votes
2 answers

Maximum number of codepoints in a grapheme cluster

I am using the C++ ICU library. I wish to split a UTF-8 string into approximately equal chunks. However, I want the chunks to be demarcated at grapheme cluster boundaries. I do not wish to convert my entire string into UTF-16 to do this for both…
2
votes
1 answer

Separating a sentence word by word with JavaScript (client)

I'm trying to separate a sentence word by word, but it seems to be a very hard task with JavaScript. I can't simply split the sentence on whitespace, because there are languages (Thai, Chinese, Japanese, etc.) that don't use…
batatop
  • 979
  • 2
  • 14
  • 31
1
vote
0 answers

Java Break Iterator With Parentheses

Using a Java BreakIterator, I am able to extract words from a string. However, given the following string that uses parentheses to indicate that a word could be plural, the parentheses are recognized as their own word. String test = "Please enter…
Dynamic
  • 497
  • 1
  • 10
  • 17
1
vote
0 answers

Break strings into sentences in Java: BreakIterator fails on second occurrence of "Dr."

I would like to split a string into sentences. As this is not straightforward (many "." characters do not mark the end of a sentence), I am using a BreakIterator as follows: public static List textToSentences(String text) { BreakIterator iterator…
lordy
  • 610
  • 15
  • 30
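
For readers landing on the sentence-splitting questions above, here is a minimal sketch of the usual `java.text.BreakIterator` loop. It is not the code from any of the listed questions: the class name `SentenceSplitSketch` and the method `splitSentences` are hypothetical, and trimming each sentence is an assumption made for readability. Abbreviations such as "Dr." may still produce unwanted breaks, which is what several of these questions are about.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration; not taken from any question in this list.
class SentenceSplitSketch {

    // Splits text at the boundaries reported by the sentence iterator.
    static List<String> splitSentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);

        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // next() returns BreakIterator.DONE once the end of the text is passed.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Assumption: callers want sentences without surrounding whitespace.
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        for (String s : splitSentences("Hello world. How are you?", Locale.US)) {
            System.out.println(s);
        }
    }
}
```

The loop keeps the previous boundary in `start`, so each substring covers exactly one span between two consecutive boundaries.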
0
votes
1 answer

How to substring graphemes from String?

I am trying to take a substring of 5 graphemes from a String and can't get it to work properly. I have a String like this: My last attempt was with BreakIterator: public String truncate(String input) { BreakIterator it = BreakIterator.getCharacterInstance(); …
Cherik
  • 31
  • 1
  • 5
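
As a rough illustration of the grapheme question above, the following sketch keeps at most a given number of user-perceived characters by walking the boundaries of a character-instance `BreakIterator`. The class `GraphemeTruncateSketch` and the method `truncateGraphemes` are hypothetical names, not the asker's code. It assumes that the boundaries reported by `java.text.BreakIterator` are acceptable; how complex clusters (for example some emoji or Indic sequences) are segmented can differ between JDK and Android versions.

```java
import java.text.BreakIterator;

// Hypothetical illustration; not taken from any question in this list.
class GraphemeTruncateSketch {

    // Returns the prefix of input containing at most max character boundaries' worth of text.
    static String truncateGraphemes(String input, int max) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(input);

        int end = it.first(); // always 0
        for (int i = 0; i < max; i++) {
            int next = it.next();
            if (next == BreakIterator.DONE) {
                break; // input has fewer than max clusters
            }
            end = next;
        }
        return input.substring(0, end);
    }

    public static void main(String[] args) {
        // "e" followed by a combining acute accent counts as one cluster.
        System.out.println(truncateGraphemes("e\u0301abc", 2));
    }
}
```

Cutting at a boundary index rather than at a fixed `char` offset avoids splitting a base character from its combining marks.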
0
votes
2 answers

BreakIterator behaving differently in Android API 29 and API 30

I have written the function below to break a String into Hindi characters, but it behaves differently on Android API 29 and API 30. On Android 29 the Hindi word चक्की is broken into च क् की, but on Android 30 it is correctly broken into च क्की. public List
Prashanth
  • 993
  • 8
  • 18
0
votes
1 answer

Resolving an Edge-Case while using Java's BreakIterator

I'm working on a side project to apply NLP to clinical data, and I'm using Java's BreakIterator to divide text into sentences for further analysis. When using BreakIterator, I'm coming across a problem where BreakIterator doesn't recognize sentences…
chethanjjj
  • 53
  • 5
0
votes
1 answer

Splitting Japanese text into words in java using BreakIterator

We are trying to break Japanese sentences into words using BreakIterator by following the code in this question. This code works only for the text given in that question; when we try a different text, e.g…
antnewbee
  • 1,779
  • 4
  • 25
  • 38
0
votes
1 answer

Android's BreakIterator considers line breaks as sentence delimiters

I have a Unix text file that I want to read in my Android app and split into sentences. However, I noticed that BreakIterator considers some line break characters as sentence delimiters. I use the following code to read the file and split it into…
ka3ak
  • 2,435
  • 2
  • 30
  • 57