Questions tagged [text-segmentation]

Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics.

197 questions
2
votes
3 answers

How to Split a Paragraph into Sentences

I've been trying to use: $string="The Dr. is here!!! I am glad I'm in the U.S.A. for the Dr. quality is great!!!!!!"; preg_match_all('~.*?[?.!]~s',$string,$sentences); print_r($sentences); But it doesn't work on Dr., U.S.A., etc. Does anyone have…
Scott Tyler
  • 67
  • 2
  • 6
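
The problem in the question above (periods inside "Dr." and "U.S.A." being treated as sentence ends) can be illustrated with a minimal Python sketch. It is not the asker's PHP code or an accepted answer; the `ABBREVIATIONS` set and the `split_sentences` helper are hypothetical names introduced here, and the abbreviation list is an assumption that would need extending for real text.

```python
import re

# Hypothetical abbreviation list (an assumption for this sketch); extend as needed.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "u.s.a."}

def split_sentences(text):
    """Split text into sentences, skipping periods that end a known abbreviation."""
    sentences = []
    start = 0
    # Candidate boundary: a run of '.', '!' or '?' followed by whitespace or end of text.
    for match in re.finditer(r"[.!?]+(?=\s|$)", text):
        end = match.end()
        # The whitespace-delimited token that ends at this candidate boundary.
        last_token = text[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, not a sentence end
        sentences.append(text[start:end].strip())
        start = end
    # Keep any trailing text that has no closing punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

sample = "The Dr. is here!!! I am glad I'm in the U.S.A. for the Dr. quality is great!!!!!!"
print(split_sentences(sample))
```

One known limitation of this approach: a sentence that genuinely ends with a listed abbreviation (for example "I live in the U.S.A.") is merged with the following sentence.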
2
votes
7 answers

Counting the number of sentences in a paragraph in C

As part of my course, I have to learn C using Turbo C (unfortunately). Our teacher asked us to make a piece of code that counts the number of characters, words and sentences in a paragraph (only using printf, getch() and a while loop.. he doesn't…
dnclem
  • 2,818
  • 15
  • 46
  • 64
2
votes
3 answers

Split sentence into words

For example, I have sentences like this: $text = "word, word w.d. word!.."; I need an array like this: Array ( [0] => word [1] => word [2] => w.d [3] => word". ) I am very new to regular expressions. Here is what I tried: function…
Guno
  • 31
  • 1
  • 1
  • 6
2
votes
3 answers

Anyone know an example algorithm for word segmentation using dynamic programming?

If you search Google for word segmentation, there are no really good descriptions of it, and I'm just trying to fully understand the process a dynamic programming algorithm takes to find a segmentation of a string into individual words. Does…
Bill
  • 709
  • 1
  • 6
  • 4
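
As a rough illustration of the dynamic programming idea asked about above, here is a minimal Python sketch. It assumes a plain set of known words (the `dictionary` argument) and returns just one valid segmentation; the function name `segment` and the sample word list are hypothetical, and real systems typically score candidate words by frequency rather than accepting the first match.

```python
def segment(text, dictionary):
    """Return one segmentation of text into dictionary words, or None if none exists."""
    n = len(text)
    # best[i] holds a list of words that exactly covers text[:i], or None.
    best = [None] * (n + 1)
    best[0] = []  # the empty prefix is covered by zero words
    for i in range(1, n + 1):
        for j in range(i):
            # Extend a segmentable prefix text[:j] with the word text[j:i].
            if best[j] is not None and text[j:i] in dictionary:
                best[i] = best[j] + [text[j:i]]
                break
    return best[n]

words = {"this", "is", "a", "test"}  # hypothetical dictionary for the example
print(segment("thisisatest", words))  # → ['this', 'is', 'a', 'test']
print(segment("xyz", words))          # → None
```

Each table entry depends only on entries for shorter prefixes, which is what makes this a dynamic programming solution: every prefix is solved once and reused.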
2
votes
2 answers

regex split text document into sentences

I have a big text string and I am trying to split it into the sentences based on ". ? !". But my regex is not working somehow, can somebody guide me to detect the error? String str = "When my friend said he likes deep dish pizza one day, I…
voidMainReturn
  • 3,339
  • 6
  • 38
  • 66
2
votes
1 answer

Splitting HTML Content Into Sentences, But Keeping Subtags Intact

I'm using the code below to separate all text within a paragraph tag into sentences. It is working okay with a few exceptions. However, tags within paragraphs are chewed up and spit out. Example:

This is a sample of a link getting…

freedomflyer
  • 2,431
  • 3
  • 26
  • 38
2
votes
5 answers

how to extract a whole sentence by a single word match in a string?

So I have a whole string (about 10k chars) and I am searching for a word (or several words) in that string with regex(word).Matches(scrappedstring). But how can I extract the whole sentence that contains that word? I was thinking of taking a…
Milkncookiez
  • 6,817
  • 10
  • 57
  • 96
2
votes
0 answers

How search engines handle word segmentation and indexing

I'm thinking of implementing a small search engine. However, I'm not sure how search engines do word segmentation. My thoughts are like this: Build a word dictionary containing popular words. For each sentence in the HTML document, break the words…
NSF
  • 2,499
  • 6
  • 31
  • 55
1
vote
2 answers

How to uppercase the first letter in a sentence in PHP?

Possible Duplicate: How do I display the first letter as uppercase? / PHP capitalize first letter of first word in a sentence. I want to uppercase the first letter in a sentence and after a period. Can anyone suggest how to do this? For example, //I…
shin
  • 31,901
  • 69
  • 184
  • 271
1
vote
1 answer

Remove all but the first word from a sentence

I need to find a way to take a sentence and remove all its words except the first. If the sentence is "Hi my name is dingo", I need to get only the word "Hi".
Dan Naim
  • 161
  • 1
  • 3
  • 14
1
vote
0 answers

Solving Imbalance Classification on Video Transcript dataset

I am currently working on a problem that requires segmenting a video lecture transcript based on the topics present within the video. My dataset consists of sentence-wise labels where 1 indicates the beginning of a new segment (i.e., topic) and 0…
1
vote
0 answers

segmenting bs4.element.Tag

Is it possible to segment a bs4.element.Tag into several bs4.element.Tag? You can think of an application as the following: 1- The original bs4.element.Tag contains a paragraph. 2- We want to segment the paragraph in the original bs4.element.Tag…
A.M.
  • 1,757
  • 5
  • 22
  • 41
1
vote
1 answer

How to do character segmentation on an image (see description)?

I wanted to segment the characters from the background. So far I have been able to detect the image and generate bounding boxes around the image. (see image) Some people also consider generating the bounding boxes around the text to be segmentation…
1
vote
0 answers

Text segmentation in image receipt

I have been looking for a way to segment text from an image, specifically an image of a receipt. The problem I'm facing is that receipts have different layouts, except they always contain a table containing product names, product prices and the…
1
vote
2 answers

Split user input string into a list with every character

I'm trying to write a program for the micro:bit which displays text as Morse code. I've looked at multiple websites and Stack Overflow posts for a way to split a string into characters. E.g. string = "hello" to chars = ["h","e","l","l","o"] I…
edapm
  • 45
  • 1
  • 9
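
The conversion described in the question above can be sketched in plain Python. This assumes standard Python string behavior, which MicroPython on the micro:bit is expected to share; the helper name `to_chars` is hypothetical.

```python
def to_chars(text):
    """Return the characters of text as a list, one character per element."""
    # A string is already a sequence of characters, so list() is enough.
    return list(text)

chars = to_chars("hello")
print(chars)  # → ['h', 'e', 'l', 'l', 'o']
```

Because strings are iterable, a loop such as `for ch in text:` also visits each character directly without building a list first.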