Questions tagged [breakiterator]
13 questions
8 votes, 1 answer
How does BreakIterator work in Android?
I'm making my own text processor in Android (a custom vertical script TextView for Mongolian). I thought I would have to find all the line breaking locations myself so that I could implement line wrapping, but then I discovered BreakIterator. This…

Suragch
4 votes, 0 answers
BreakIterator doesn't find correct sentence boundary with parenthesized "i.e." or "e.g."
In the example below, BreakIterator appears to be failing on a fairly straightforward example.
Am I using BreakIterator incorrectly, or is this just a bug?
Example class:
import java.text.BreakIterator;
import java.util.Locale;
public class…

Archie
4 votes, 1 answer
C# Equivalent for Java's BreakIterator
I'm working on a conversion project from Java to C#. Is there a C# equivalent of BreakIterator? I tried IEnumerator, but I cannot find a counterpart for the iterator.SetText() usage below. Can anyone suggest equivalent C# code for the lines below?
String…

Pratik J
3 votes, 1 answer
BreakIterator not working correctly with Chinese text
I used BreakIterator.getWordInstance to split a Chinese text into words. Here is my example:
import java.text.BreakIterator;
import java.util.Locale;
public class Sample {
public static void main(String[] args) {
String stringToExamine =…

srgsanky
2 votes, 2 answers
Maximum number of codepoints in a grapheme cluster
I am using the C++ ICU library. I wish to split a UTF-8 string into approximately equal chunks. However, I want the chunks to be demarcated at grapheme cluster boundaries. I do not wish to convert my entire string into UTF-16 to do this for both…

Nick Deguillaume
2 votes, 1 answer
Separating a sentence word by word with JavaScript (client)
I'm trying to split a sentence into words, but this seems to be a very hard task in JavaScript. I can't simply split on whitespace, because there are languages (Thai, Chinese, Japanese, etc.) that don't use…

batatop
1 vote, 0 answers
Java Break Iterator With Parentheses
Using a Java BreakIterator, I am able to extract words from a string. However, given the following string, which uses parentheses to indicate that a word could be plural, the parentheses are recognized as their own words.
String test = "Please enter…

Dynamic
1 vote, 0 answers
Break strings into sentences in Java: BreakIterator fails on second occurrence of "Dr."
I would like to split a string into sentences. As this is not straightforward (because many "." characters do not end a sentence), I am using a BreakIterator as follows:
public static List<String> textToSentences(String text) {
BreakIterator iterator…

lordy
0 votes, 1 answer
How to substring graphemes from a String?
I am trying to take a substring of 5 graphemes from a String and can't get it to work.
I have a String like this:
My last attempt used BreakIterator:
public String truncate(String input) {
BreakIterator it = BreakIterator.getCharacterInstance();
…

Cherik
0 votes, 2 answers
BreakIterator behaving differently in Android API 29 and API 30
I wrote the function below to break a String into Hindi characters, but it behaves differently on Android API 29 and API 30. On API 29 the Hindi word चक्की is broken into च क् की, but on API 30 it is correctly broken into च क्की.
public List…

Prashanth
0 votes, 1 answer
Resolving an Edge-Case while using Java's BreakIterator
I'm working on a side project to apply NLP to clinical data, and I'm using Java's BreakIterator to divide text into sentences for further analysis. When using BreakIterator, I'm coming across a problem where BreakIterator doesn't recognize sentences…

chethanjjj
0 votes, 1 answer
Splitting Japanese text into words in Java using BreakIterator
We are trying to break Japanese sentences into words using BreakIterator by following the code in this question. The code works only for the text given in that question; when we try a different text, e.g…

antnewbee
0 votes, 1 answer
Android's BreakIterator considers line breaks as sentence delimiters
I have a Unix text file that I want to read in my Android app and split into sentences. However, I noticed that BreakIterator considers some line break characters as sentence delimiters.
I use the following code to read the file and split it into…

ka3ak