Questions tagged [text-segmentation]

Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics.

197 questions
0
votes
2 answers

Matching sentences with regex in Java

I'm using the Scanner class in Java to go through a text file and extract each sentence. I set my Scanner's delimiter to the regex: Pattern.compile("[\\w]*[\\.|?|!][\\s]") This currently seems to work, but it leaves the…
Gary
  • 926
  • 3
  • 12
  • 24
0
votes
4 answers

javascript: select sentence in a paragraph

I want to create a text annotation tool. Suppose we have some text displayed as in the picture below. The desired effect is: after the user clicks somewhere in the text, the whole sentence is automatically selected and highlighted. I have no…
user3352464
  • 315
  • 1
  • 3
  • 14
0
votes
1 answer

how to run the example of uima-text-segmenter?

I want to call the API of uima-text-segmenter https://code.google.com/p/uima-text-segmenter/source/browse/trunk/INSTALL?r=22 to run an example, but I don't know how to call the API... The readme says: With the DocumentAnalyzer, run the following…
little mike
  • 23
  • 1
  • 1
  • 5
0
votes
1 answer

Sentence segmentation with Regex in Python

I am writing a script to split the text into sentences with Python. However I am quite bad with writing more complex regular expressions. There are 5 rules according to which I wish to split the sentences. I want to split sentences if they: * end…
Helena
  • 921
  • 1
  • 15
  • 24
0
votes
3 answers

Python extracting sentence containing 2 words

I have the same problem that was discussed in this link Python extract sentence containing word, but the difference is that I want to find 2 words in the same sentence. I need to extract sentences from a corpus that contain 2 specific words. Does…
Marcelo
  • 438
  • 5
  • 16
0
votes
1 answer

Capitalize the first letter of sentences in paragraphs

I am using WordPress and WP-O-Matic to automatically pull content from different feeds. The content is in all caps, making the posts in the WordPress blog look crappy. I tried using different techniques, but none of them seem to work…
Abhik
  • 664
  • 3
  • 22
  • 40
0
votes
2 answers

Word Segmentation using ICU

I am using ICU4C to transliterate CJK. I am wondering whether it is possible to have word segmentation in ICU, to split Chinese text into a sequence of words, defined according to some word segmentation standard. When I try transliterating for…
mrz
  • 1,802
  • 2
  • 21
  • 32
-1
votes
2 answers

How to convert plain text in segmented chunks (Bytes) in python?

Is there a simple way to convert plain text into a segmented array of chunks in Python? Each chunk should be, for example, 16 bytes. If the last part of the plain text is smaller than 16 bytes, it should go into a smaller final chunk.
Pm740
  • 339
  • 2
  • 12
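
A minimal sketch of the chunking described in this question, assuming the text is encoded as UTF-8 before splitting. The function name `chunk_bytes` and its parameters are illustrative, not taken from the original post.

```python
def chunk_bytes(text, size=16, encoding="utf-8"):
    """Encode text and split the resulting bytes into chunks of at most `size` bytes."""
    data = text.encode(encoding)
    # Step through the byte string in strides of `size`; the last slice
    # is simply shorter when the length is not a multiple of `size`.
    return [data[i:i + size] for i in range(0, len(data), size)]


sample = "The quick brown fox jumps over the lazy dog"
chunks = chunk_bytes(sample)
for chunk in chunks:
    print(len(chunk), chunk)
```

Because the split happens on bytes rather than characters, a multi-byte UTF-8 character can end up divided across two chunks; joining the chunks back together before decoding avoids that problem.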
-1
votes
1 answer

String Segmentation

Solved I have a string which has a conversation between two people along with their speaker tag. I want to split the string into two sub strings containing speaker 1 and speaker 2 conversation only. This is the code I am using to obtain the…
-1
votes
1 answer

How to fix this "segmentation fault 11" for this switch function?

I'm new to C and am writing a switch function that, whenever the passed-in string is ), }, or ], returns false when the popped-out expression isn't the matching opening bracket. (yes, it's the balanced parentheses problem...) I can be sure that the…
-1
votes
1 answer

opencv - IndexError: index 26 is out of bounds for axis 0 with size 17

I am trying to segment text images, but I have a problem with one of the ROI (region of interest) images, whose dimensions are (24, 3) and (44, 3): it gives me IndexError: index 26 is out of bounds for axis 0 with size 17 for this particular…
Kaleab Woldemariam
  • 2,567
  • 4
  • 22
  • 43
-1
votes
1 answer

java sentence splitting error

I want to split a paragraph into sentences using Java. Consider the following text: we decided to go to u.s.a, canada,africa etc... from our office. I have only rs.1 lakh. So i called my dad and asked some money. he said "No.I…
-1
votes
1 answer

Split sentence into words (with special word list)

I have sentence: $text = "word word, dr. word: a.sh. word a.k word?!.."; special words are: "dr." , "a.sh" and "a.k" this : $text = "word word, dr. word: a.sh. word a.k word?!.."; $split = preg_split("/[^\w]([\s]+[^\w]|$)/", $text, -1,…
Guno
  • 31
  • 1
  • 1
  • 6
-1
votes
3 answers

dynamic programming word segmentation

Suppose I have a string like 'meetateight' and I need to segment it into meaningful words like 'meet' 'at' 'eight' using dynamic programming. To judge how “good” a block/segment "x = x1x2x3" is, I am given a black box that, on input x, returns a …
parth
  • 272
  • 1
  • 9
  • 20
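
A hedged sketch of the dynamic-programming approach this question asks about. The excerpt is truncated before it says what the black box returns, so this assumes it returns a numeric score where higher is better and that a segmentation's score is the sum of its segments' scores. `segment` and `toy_quality` are hypothetical names, and `toy_quality` is only a stand-in for the black box.

```python
def segment(s, quality):
    """Split s into the segments that maximise the summed quality score."""
    n = len(s)
    neg_inf = float("-inf")
    best = [neg_inf] * (n + 1)   # best[i]: best total score for s[:i]
    back = [0] * (n + 1)         # back[i]: start index of the last segment in s[:i]
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            if best[start] == neg_inf:
                continue
            score = best[start] + quality(s[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers from the end of the string to recover the segments.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(s[start:end])
        end = start
    return words[::-1]


# Stand-in for the black box (assumption): reward known words, penalise the rest.
VOCAB = {"meet", "at", "eight"}

def toy_quality(x):
    return 1.0 if x in VOCAB else -float(len(x))


result = segment("meetateight", toy_quality)
print(result)
```

The table has one entry per prefix and each entry considers every possible start of the last segment, so the black box is called O(n²) times for a string of length n.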
-2
votes
1 answer

What are the libraries in R to tokenise any language text(e.g. : Chinese, Japanese, Arabic, etc)

I have to tokenize a text into words, but I don't know the language of the text. It could be any language. So I have to build a tokenizer which will detect the text's language and tokenize it. If the tokenizer is not able to tokenize, then I will return some flag…
jay_phate
  • 439
  • 3
  • 14