Questions tagged [text-segmentation]

Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics.

197 questions
1
vote
1 answer

Why does Solr not index some segmented words?

I'm trying to index some Chinese documents with Solr, but it looks like Solr doesn't index some segmented words. The analyzer I use is IK Analyzer (http://code.google.com/p/ik-analyzer/). The field to be indexed:
emlaggr
  • 41
  • 4
1
vote
2 answers

Objective C Enumerate Sentences in a paragraph

I would like to write an enumerator that goes through a paragraph of text and gives me one sentence at a time. I tried using stringEnumerate with NSStringEnumerationBySentences, but that simply looks at the periods and fails. For example,…
Faz Ya
  • 1,480
  • 2
  • 15
  • 22
1
vote
1 answer

CodeIgniter's URL segmentation not working with my JSON

It's my first post here and I haven't yet figured out how to format my post properly, but here it goes. Basically, I can only get my code to work if I point directly to a PHP file. If I try to call a method within my controller, nothing seems…
user1390322
  • 61
  • 1
  • 3
1
vote
3 answers

Finding first sentence in a paragraph

I have a string which contains a paragraph. There might be line breaks. I want to get only the first sentence in the string. I thought I would try indexOf(". "), that is, a dot followed by a space. The problem is that this won't work…
Geek
  • 3,187
  • 15
  • 70
  • 115
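
The idea in the excerpt above (look for a sentence terminator followed by whitespace) can be sketched as below. This is only a minimal illustration in Python, not the asker's language or an accepted answer; `first_sentence` is a hypothetical helper name, and the heuristic still fails on abbreviations such as "Dr. Smith", which is why dedicated sentence segmenters exist.

```python
import re


def first_sentence(paragraph: str) -> str:
    """Return the text up to the first '.', '!' or '?' that is
    followed by whitespace or the end of the string (heuristic only)."""
    # Collapse line breaks and repeated whitespace so that a sentence
    # spanning several lines is treated as one run of text.
    text = " ".join(paragraph.split())
    # The lookahead keeps dots inside tokens such as "1.2" from matching.
    match = re.search(r"[.!?](?=\s|$)", text)
    # If no terminator is found, treat the whole text as one sentence.
    return text if match is None else text[: match.end()]
```

For example, `first_sentence("Version 1.2 is out. More later.")` stops at the period after "out" rather than at the one inside "1.2".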
0
votes
1 answer

Resources for text boundary analysis

I need to do "text boundary analysis" in my project. I remember there is a resource from Google that might help with this job, but I don't quite remember its name or where to download it. I remember this resource is a collection of statistical data…
KenC
0
votes
3 answers

Convert a paragraph into sentences with dynamic memory

How can I convert a paragraph into sentences? I have a function signature as follows: char **makeSentences(char *paragraph); where paragraph is a string containing several sentences. Each sentence in the paragraph is guaranteed to end with a period (.)…
antiopengl
  • 119
  • 3
  • 11
0
votes
1 answer

Regex to differentiate between sentences and chapter text

I have a (running) text with many sentences. I have a regular expression that is able to extract the sentences that are terminated by a period, question mark, or exclamation mark. The end of a sentence must be followed by the beginning of the next sentence…
andreSmol
  • 1,028
  • 2
  • 18
  • 30
0
votes
2 answers

How to count the number of "words" in Chinese/Japanese content in JavaScript

I'm trying to write a method to count the number of words when the content is in Chinese or Japanese. This should exclude special characters, punctuation, and whitespace. I tried creating a regex for each locale and finding the words based on it.…
Sherlock
  • 15
  • 4
0
votes
0 answers

Converting a string of words to a single string without spaces?

I'm trying to convert a string of words to a single string using C++. I want to take something like the following: string str:"Leetcode is cool" and then convert it to something like this: string str:"Leetcodeiscool"
jpsxlr8
  • 9
  • 2
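
The transformation described in the excerpt above can be sketched as follows. The question asks about C++; this is only a Python illustration of the same idea, and `remove_spaces` is a hypothetical helper name.

```python
def remove_spaces(text: str) -> str:
    """Join the whitespace-separated words of `text` into one string."""
    # str.split() with no arguments splits on any run of whitespace,
    # so tabs and repeated spaces are dropped as well.
    return "".join(text.split())
```

Called on the string from the excerpt, `remove_spaces("Leetcode is cool")` yields "Leetcodeiscool".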
0
votes
0 answers

How to fix sentence with missing spaces and misspelt words

I am working on an OCR project, and sometimes there are missing spaces and spelling mistakes in the recognized text. The good thing is that the possible recognized words are limited (~25 possible words). I can think of fuzzy search for misspelt…
0
votes
1 answer

How to split connected characters on image for further OCR?

I'm preparing an image for OCR with Tesseract (pre-trained for this custom font) in Java (using the OpenCV library). There is an image…
0
votes
1 answer

How to get the best merger from symspellpy word segmentation of many languages in Python?

The following code uses SymSpell in Python; see the symspellpy guide on word_segmentation. It uses the "de-100k.txt" and "en-80k.txt" frequency dictionaries from a GitHub repo, which you need to save in your working directory. As long as you do not want…
questionto42
  • 7,175
  • 4
  • 57
  • 90
0
votes
1 answer

How do I replace multiple consecutive parts of an array?

The question revolves around character segmentation. My problem is the following: I want to segment characters based on y-axis pixel numbers, following this (in Python): source. What I have already done to get here: read image io.imread swap axis…
0
votes
1 answer

How to extract a whole word from a sentence by a specific fragment in C#?

How can I obtain a whole word within a string-type sentence? For instance, if the given string was: The app has been updated to 88.0.1234.141 which contains a number of fixes and improvements. And I want to get the word 88.0.1234.141 by a…
Bihui Jin
  • 13
  • 4
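
One way to read the task in the excerpt above is "return the whitespace-delimited word that contains a given fragment". A minimal sketch under that assumption follows. The question targets C#; this Python version only illustrates the idea, `word_containing` is a hypothetical helper name, and it does not strip punctuation attached to a word.

```python
from typing import Optional


def word_containing(sentence: str, fragment: str) -> Optional[str]:
    """Return the first whitespace-delimited word that contains
    `fragment`, or None if no word contains it."""
    for word in sentence.split():
        if fragment in word:
            return word
    return None
```

With the sentence from the excerpt and the fragment "1234", this returns "88.0.1234.141".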
0
votes
1 answer

How do I split a paragraph between customer and customer service agent based on rules?

I have a paragraph that records the conversation between a customer and a customer service agent. How do I separate the conversation and create two lists (or any other format, like a dictionary), one of which contains only the customer's text…
LY1
  • 35
  • 5