Highest Voted 'text-segmentation' Questions

2

votes

3 answers

Remove timestamp in the bracket from text Python

I'd like to remove all the timestamps in the parentheses in the below sample text data. Input: Agent: Can I help you? ( 3s ) Customer: Thank you( 40s ) Customer: I have a question about X. ( 8m 1s ) Agent: I can help here. Log in this website…

python regex timestamp python-re text-segmentation

asked Jan 21 '21 at 06:12

LY1

35
5
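For the timestamp question above, one possible approach is a single regular expression that matches a parenthesised run of number-plus-unit pairs. This is only a sketch based on the three samples shown in the excerpt; the name `strip_timestamps` and the accepted units (`h`, `m`, `s`) are assumptions, since the full input is truncated.

```python
import re

# Assumed shape of a timestamp: "(" + one or more "<digits><unit>" pairs + ")",
# with optional spaces, e.g. "( 3s )", "( 40s )", "( 8m 1s )".
# The unit set h/m/s is a guess; only m and s appear in the excerpt.
TIMESTAMP = re.compile(r"\s*\(\s*(?:\d+\s*[hms]\s*)+\)")


def strip_timestamps(text):
    """Remove parenthesised durations along with any whitespace before them."""
    return TIMESTAMP.sub("", text)


sample = (
    "Agent: Can I help you? ( 3s ) Customer: Thank you( 40s ) "
    "Customer: I have a question about X. ( 8m 1s ) Agent: I can help here."
)
print(strip_timestamps(sample))
```

Parentheses that do not contain a duration, such as "(see below)", are left untouched because the pattern requires digits followed by a unit.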

2

votes

0 answers

How to remove noise in the background of an old document image

How do I remove the background of an image that contains a lot of noise, lines, etc.? [sample image][1] import cv2 from PIL import Image image = cv2.imread("1.jpg") #input image image = cv2.fastNlMeansDenoisingColored(image,None,10,10,7,21) gray =…

python ocr text-segmentation

asked Apr 18 '20 at 21:32

Milan KD

21
1

2

votes

2 answers

Perform line segmentation (cropping) serially with OpenCV

I am performing full-page offline handwriting recognition with deep learning. The main idea is to build a model that can take an image of one line of text and give its corresponding text. For this, the main task is to do line segmentation of every line in a…

python opencv text-segmentation handwriting-recognition

asked Dec 28 '18 at 17:28

susan097

3,500
1
23
30

2

votes

2 answers

R: consider punctuation to do word segmentation

I use NGramTokenizer() to do 1~3 gram segmentation, but it doesn't seem to take punctuation into account and removes it, so the segmented words aren't ideal for me. (like the result: oxidant amino, oxidant amino acid, pellet oxidant and so…

r tm text-segmentation

asked Sep 21 '17 at 05:05

Eva

483
1
4
13

2

votes

1 answer

Python - How to Extract sentences that contains Citation mark?

text = "Trondheim is a small city with a university and 140000 inhabitants. Its central bus systems has 42 bus lines, serving 590 stations, with 1900 (departures per) day in average. T h a t gives approximately 60000 scheduled bus station passings…

python regex text-segmentation citations

asked Aug 13 '17 at 14:10

gameon67

3,981
5
35
61

2

votes

1 answer

NLP: Within Sentence Segmentation / Boundary Detection

I am interested if there are libraries that break a sentence into small pieces based on content. E.g. input: sentence: "During our stay at the hotel we had a clean room, very nice bathroom, breathtaking view out the window and a delicious …

nlp nltk sentence text-segmentation

asked Jul 14 '17 at 22:33

Uther Pendragon

302
2
14

2

votes

1 answer

Using Tesseract OCR for Character Segmentation Only

I want to do text segmentation on a printed document. I have already segmented the document down to the character level, but it fails when I meet touching characters. I want to use Tesseract OCR only to segment the words. I know Tesseract can do…

python tesseract text-segmentation

asked Apr 13 '17 at 10:08

Christopher Wiraatmaja

23
3

2

votes

1 answer

How to count occurrences of substrings in string from text file - python

I want to count the number of lines in a .txt file where a string contains two substrings. I tried the following: with open(filename, 'r') as file: for line in file: wordsList = line.split() if any("leads" and "show" in s for s…

python string substring contains text-segmentation

asked Apr 08 '17 at 18:29

ignasibm

25
6
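A note on the substring-counting question above: in Python, `"leads" and "show" in s` is parsed as `"leads" and ("show" in s)`, and a non-empty string literal is always truthy, so only `"show"` is actually tested. A minimal sketch of one way to test both substrings per line follows. The helper name `count_lines_with_all` is hypothetical, and it assumes the goal is to count lines that contain every substring somewhere in the line (the excerpt is cut off, so the exact intent is a guess).

```python
def count_lines_with_all(lines, needles=("leads", "show")):
    """Count the lines that contain every substring in `needles`."""
    return sum(1 for line in lines if all(needle in line for needle in needles))


# Works on any iterable of strings, including an open file object:
#     with open(filename, "r") as file:
#         total = count_lines_with_all(file)
example = ["this leads to a show", "show only", "leads only"]
print(count_lines_with_all(example))
```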

2

votes

0 answers

Chinese Segmentation : ICTCLAS Training Corpora

I am using the ICTCLAS segmentation tool for Chinese. We can read in "Automatic Recognition of Chinese Unknown Words Based on Roles Tagging" (Zhang, Liu, 2002) that it has been trained on the Peking University Corpus (PKU) : "The training corpus…

corpus text-segmentation

asked Feb 23 '17 at 08:31

Starckman

145
6

2

votes

2 answers

Getting the least amount of sub words

Solution by Dávid Horváth adapted to return the biggest smallest word: import java.util.*; public class SubWordsFinder { private Set words; public SubWordsFinder(Set words) { this.words = words; } …

java nlp text-segmentation

asked Apr 09 '16 at 18:38

BullyWiiPlaza

17,329
10
113
185

2

votes

3 answers

Parsing data from a file

I have been provided with a file containing data on recorded sightings of species, which is laid out in the format; "Species", "\t", "Latitude", "\t", "Longitude" I need to define a function that will load the data from the file into a list, whilst…

python list split text-segmentation

asked Mar 20 '16 at 20:43

NKing

23
4
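For the species-sightings question above, the described layout (species, tab, latitude, tab, longitude) can be read with a plain `split("\t")`. The sketch below makes several assumptions that the truncated excerpt does not confirm: there is no header row, the coordinates are decimal numbers, and each record is wanted as a `(species, latitude, longitude)` tuple. The function name `parse_sightings` is hypothetical.

```python
def parse_sightings(lines):
    """Turn tab-separated 'Species<TAB>Latitude<TAB>Longitude' lines into tuples."""
    sightings = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        species, latitude, longitude = line.split("\t")
        sightings.append((species, float(latitude), float(longitude)))
    return sightings


# With a real file (the path is a placeholder):
#     with open("sightings.txt", encoding="utf-8") as handle:
#         records = parse_sightings(handle)
print(parse_sightings(["Red Fox\t51.5\t-0.12\n", "\n", "Badger\t52.0\t1.25\n"]))
```

A line with more or fewer than three fields raises `ValueError` here, which is usually preferable to silently loading malformed rows.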

2

votes

2 answers

Non-reducible grapheme clusters in Unicode

I'm of the opinion that a "user perceived character" (henceforth UPC) iterator would be very useful in a Unicode library. By UPC I mean the sense discussed in Unicode Standard Annex #29, which is what a user perceives as a character, but might be…

unicode text-segmentation

asked Aug 13 '15 at 10:06

Spacemoose

3,856
1
27
48

2

votes

1 answer

How can I fix this memory issue in my maximum matching algorithm with RealmSwift?

I wrote my own maximum matching function in Swift to divide Chinese sentences into words. It works fine, except that with abnormally long sentences the memory usage goes up over 1 GB. I need help figuring out how to modify my code so that there isn't…

algorithm swift realm text-segmentation

asked May 24 '15 at 11:15

webmagnets

2,266
3
33
60

2

votes

2 answers

Extract a Sentence Containing a Word Using Python... As well as the sentences around it?

There are a bunch of questions that get at extracting a particular sentence that contains a word (like extract a sentence using python and Python extract sentence containing word), and I have enough beginner experience with NLTK and SciPy to be able…

python regex nlp nltk text-segmentation

asked May 22 '14 at 06:08

alxlvt

675
2
10
18

2

votes

1 answer

segment paragraph to sentences

I'm trying to segment a paragraph into sentences. I selected '.', '?' and '!' as the segmentation symbols. I tried: format = r'((! )|(. )|(? ))' delimiter = re.compile(format) s = delimiter.split(line) but it gives me sre_constants.error: unexpected…

python regex python-2.7 text-segmentation

asked Apr 17 '14 at 14:48

ChuNan

1,131
2
11
27
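On the paragraph-splitting question above: in the pattern shown, `?` directly after `(` is read as the start of regex extension syntax rather than a literal question mark, and the unescaped `.` matches any character. One possible alternative is to put the three terminators in a character class and split on the whitespace that follows them. This is a sketch, not a full sentence tokenizer; `split_sentences` is a hypothetical name, and abbreviations such as "e.g." will be split incorrectly.

```python
import re

# Split on whitespace that comes right after '.', '!' or '?'.
# The lookbehind keeps the terminator attached to its sentence.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(paragraph):
    """Split a paragraph into sentences at '.', '!' or '?' followed by whitespace."""
    return [part for part in SENTENCE_END.split(paragraph.strip()) if part]


print(split_sentences("Hello there. How are you? Fine!"))
```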
