Questions tagged [text-segmentation]

Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics.

Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics.

References:

Related Tags:

197 questions
0
votes
1 answer

TypeError: '<' not supported between instances of 'dict' and 'int'

I just began studying Python and I am currently trying to count the frequency of character sequences in a segmentation (segment into words). I have a problem with my count_seq function. In this function I want in the first loop to go fetch each…
user11317276
0
votes
0 answers

Sentence segmentation rule not working as expected

I have created my own simple sentence segmentation rule to sentencize on a new line (and keep the default segmentation rules as well): import spacy nlp = spacy.load('en_core_web_sm') def set_custom_boundaries(doc): for token in doc[:-1]: …
0
votes
1 answer

text segmentation based on punctuation marks, especially at clause level

I want to segment the text when we encounter the punctuation mark in a sentence or paragraph. If I use comma(,) in my regex it is also chunking the individual nouns verbs or adjectives separated by comma. Suppose we have "dogs, cats, rats and other…
0
votes
0 answers

Training a computer for word auto-segmentation (non-english language)

I have been given a set of 80 non-english words in an excel file..the first column contains the resulting word after a crude automatic segmentation has been applied to it and the second column contains the resulting word after being segmented…
Georgy90
  • 155
  • 2
  • 12
0
votes
0 answers

Hashtag segmentation in Python

I am trying to split a term which contains a hashtag of multiple words such as: #goodmorning #everythingIsGood The problem that I am facing is raised in the cases in which the individual words are not capitalized. I am using a list of common words…
patri
  • 333
  • 1
  • 4
  • 16
0
votes
1 answer

How to implement parallel process on huge dataframe

Now, I have one huge dataframe "all_in_one", all_in_one.info() RangeIndex: 8271066 entries, 0 to 8271065 Data columns (total 3 columns): label int64 text object type int64 dtypes: int64(2),…
Ivan Lee
  • 3,420
  • 4
  • 30
  • 45
0
votes
0 answers

Python extract sentences containing list of spesific words

I am trying to extract sentences from large text set that contain list of words. For example searching for "noodl", "vege" and "meat". str1 = "My new noodles are great\n vegetables. Not \nthis noodle sentence though.\n Nor this vege…
vahvero
  • 525
  • 11
  • 24
0
votes
1 answer

Use Python to extract three sentences based on word finding

I'm working on a text-mining use case in python. These are the sentences of interest: As a result may continue to be adversely impacted, by fluctuations in foreign currency exchange rates. Certain events such as the threat of additional tariffs on…
PratikSharma
  • 321
  • 2
  • 17
0
votes
1 answer

Segmentation of images with characters that are joined

I'm trying to draw a boundary box around the different characters in the image below. However, as some of the characters are joined, I can't have the box drawn. I have tried a few things such as; dilating, eroding and trying different blurs but I…
poofit
  • 13
  • 4
0
votes
2 answers

Text Segmentation using Python package of wordsegment

Folks, I am using python library of wordsegment by Grant Jenks for the past couple of hours. The library works fine for any incomplete words or separating combined words such as e nd ==> end and thisisacat ==> this is a cat. I am working on the…
Saurabh Gokhale
  • 53,625
  • 36
  • 139
  • 164
0
votes
1 answer

segment each character from noisy number plate

I am doing a project on Nepali Number Plate Detection where I have detected my number plate from the vehicle ani skewed the number plate but the result is a noisy image of number plate. I want to know how to segment every character out of it so it…
0
votes
0 answers

Detect Speech Activity (2 Speakers) in an audio recording

Goal: I want to extract the segments (timecodes) of speech activity in an audio recording containing two speakers. Optimally the solution should assign a label to the segments "Speaker 1" or "Speaker 2". Problem: I found VAD (Voice Activity…
Björn
  • 1,610
  • 2
  • 17
  • 37
0
votes
1 answer

Text Segmentation Based On Direct Sentence

Suppose I have a docx file like this : When I was a young boy my father took me into the city to see a marching band. He said, "Son when you grow up would you be the savior of the broken?". My father sat beside me, hugging my…
0
votes
0 answers

Handwritten line segmentation (cursive text)

I am working on line segmentation of cursive text Arabic, Urdu. Text lines are detected properly, by computing density of dark pixels in a row. Consecutive rows having more than threshold pixels are cropped, by using the following code: %line…
saud00
  • 465
  • 1
  • 4
  • 13
0
votes
1 answer

IOB tagging scheme, usage of BI scheme

I often see variants of IOB tagging scheme such as IOB, BIO, IOBES mentioned in the literature for chunking, NER etc. I tried using only BI tags for detecting morpheme boundaries (segmentation) in a binary classification setting and got high F1…
math
  • 49
  • 1
  • 7