Questions tagged [text-segmentation]

Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics.

197 questions
1
vote
0 answers

How can I segment curved text lines?

I am looking for a way to segment an image that has curved text lines. I need to segment the image in a way that I can access and manipulate each of the lines individually. I don't want to deskew the lines yet, but have already performed Gaussian…
learner
  • 11
  • 2
1
vote
2 answers

How to accurately acquire line segments from the projection plot?

So this is basically something very simple, as in just get the horizontal projection plot and from that get the location of the lines on the image. But the problem is that the threshold that is applied is very variable. If I stay at a safe level,…
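A minimal sketch of the projection approach this question describes, assuming a binarized image given as rows of 0/1 values (1 = ink). The function names and the `fraction` parameter are hypothetical, and setting the threshold relative to the profile's peak is only one possible way to make it less sensitive to the image; it is not taken from the question or its answers.

```python
def horizontal_projection(image):
    """Sum the ink pixels in every row of a 0/1 image."""
    return [sum(row) for row in image]


def find_line_bands(profile, fraction=0.25, min_height=1):
    """Return (start, end) row ranges whose profile exceeds the threshold.

    The threshold is `fraction` of the profile's maximum (an assumption),
    and `end` is exclusive.
    """
    if not profile:
        return []
    threshold = fraction * max(profile)
    bands = []
    start = None
    for row, value in enumerate(profile):
        if value > threshold and start is None:
            start = row  # entering a text line
        elif value <= threshold and start is not None:
            if row - start >= min_height:
                bands.append((start, row))  # leaving a text line
            start = None
    # Close a band that runs to the bottom edge of the image.
    if start is not None and len(profile) - start >= min_height:
        bands.append((start, len(profile)))
    return bands


image = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
]
bands = find_line_bands(horizontal_projection(image))
```

Raising `min_height` discards thin bands caused by noise, at the risk of dropping short lines.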
1
vote
2 answers

Text Segmentation Dataset

I wonder if someone can help me get a dataset to test a text segmentation approach that I developed. I looked for Freddy Choi's dataset and I couldn't find it. I need this dataset specifically. If someone has it or knows where I…
mbayomi
  • 71
  • 1
  • 8
1
vote
1 answer

What is the fastest way to search for patterns through 20-30 GB of multiple logfiles

I am performing log analysis, which I want to automate so that it runs daily and reports findings. The analysis runs on standard workstations with 8 cores and up to 32 GB of free RAM. The prototyping is based on GNU grep (--mmap), SQLite (on a RAM disk)…
wishi
  • 7,188
  • 17
  • 64
  • 103
1
vote
3 answers

Removing a sentence from a paragraph

I am attempting to write code to remove a whole sentence from a paragraph. It doesn't matter which sentence it is, but it needs to be at least one. String edit = "The cow goes moo. The cow goes boo. The cow goes roo. The cow goes jew."; int…
1
vote
1 answer

Sentence extraction from paragraph

Using strtok one can get each token in the paragraph individually. I want to capture all sentences in the page individually to process them separately. One solution is to keep a for loop and check each character; if it is . then I consider the sentence is…
user123
  • 5,269
  • 16
  • 73
  • 121
1
vote
1 answer

JS/Jquery: string to words text-segmentation script using dictionary and longest match?

Given a string such as: var str = "thisisinsane"; assisted by a list of words from a dictionary such as: var dic = [ "insane", "i", "is", "sin", "in", "this", "totally" ]; How to split str into words? For this string, there are 3 words to identify.…
Hugolpz
  • 17,296
  • 26
  • 100
  • 187
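A rough sketch of dictionary-based longest-match segmentation for the question above, written in Python rather than the asker's JavaScript. The `segment` function is hypothetical. It tries the longest dictionary word first at each position and backtracks when the rest of the string cannot be covered, which is an assumption about the desired behavior rather than the accepted answer.

```python
def segment(text, dictionary):
    """Split `text` into dictionary words, preferring the longest match.

    Returns a list of words, or None if no full segmentation exists.
    """
    # Longest words first; empty strings are dropped to avoid looping forever.
    words = sorted({w for w in dictionary if w}, key=len, reverse=True)
    dead_ends = set()  # positions already known to be unsegmentable

    def solve(pos):
        if pos == len(text):
            return []
        if pos in dead_ends:
            return None
        for word in words:
            if text.startswith(word, pos):
                rest = solve(pos + len(word))
                if rest is not None:
                    return [word] + rest
        dead_ends.add(pos)
        return None

    return solve(0)


dic = ["insane", "i", "is", "sin", "in", "this", "totally"]
result = segment("thisisinsane", dic)  # ['this', 'is', 'insane']
```

Remembering dead-end positions keeps the backtracking from re-exploring the same suffix. Very long inputs may still hit Python's recursion limit, so an iterative version would be safer there.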
1
vote
0 answers

Brute-force transposition decryption - word segmentation

I'm a 2nd year B. Comp. Sci. student and have a cryptography assignment that's really giving me grief. We've been given a text file of transposition-encrypted English phrases and an English dictionary file, then asked to write a program that…
1
vote
1 answer

Word-Counter in some hieroglyphics languages?

Is there any available library for word-counting in hieroglyphic languages (e.g. Chinese, Japanese, Korean...)? I found that MS Word counts text in these languages effectively. Can I add a reference to MS Word libraries in my .NET application to…
Jin Ho
  • 3,565
  • 5
  • 23
  • 25
1
vote
1 answer

Splitting paragraphs into sentences

Given a paragraph, I want to split it into sentences. At the moment I'm simply doing this: var sentences = paragraph.split('.'); It works for the most part, but it starts failing when it's given a sentence like this: Alaska is the largest state…
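The failing example in this excerpt is cut off, so the exact problem case is unknown. As a hedged illustration of one common refinement over splitting on every period, the sketch below (in Python, with a hypothetical `split_sentences` helper) splits only where sentence-ending punctuation is followed by whitespace and an uppercase letter. It assumes English text with capitalized sentence starts, and it still splits after abbreviations such as "Dr." that precede a capitalized word.

```python
import re

# Split after ., ! or ? only when whitespace and an uppercase letter follow.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(paragraph):
    """Return the sentences of `paragraph`, keeping their end punctuation."""
    stripped = paragraph.strip()
    if not stripped:
        return []
    return _SENTENCE_BOUNDARY.split(stripped)


sentences = split_sentences("It costs 3.5 dollars. That is cheap! Is it?")
```

Because the decimal point in "3.5" is not followed by whitespace, it does not count as a boundary here.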
1
vote
2 answers

VBA for MS Word not looping through all sentences in a paragraph

I am trying to loop through all the sentences in a Word document and parse them into semi-HTML code. During testing, I ran into an interesting situation where any sentence followed by a non-closed sentence would be skipped. For example, if I have…
Michael
  • 2,158
  • 1
  • 21
  • 26
1
vote
3 answers

Find vowels in each word from a sentence entered by the user (java)

I have a program that gives the following output: Enter a Sentence: I am new to java I am new to java Number of vowels in: I am new to java = 6 My problem is that I need to get the vowels in each word of the sentence entered by the user. For example…
user2348633
  • 13
  • 1
  • 4
1
vote
3 answers

convert paragraph into sentence using Perl

I'm doing Perl programming. I need to read a paragraph and print out each sentence on its own line. Does anyone know how to do it? Below is my code: #! /C:/Perl64/bin/perl.exe use utf8; if (! open(INPUT, '< text1.txt')){ die "cannot open input file:…
new
  • 77
  • 2
  • 16
1
vote
2 answers

OCR word separation

I'm developing an OCR system, and need some help with word segmentation. Currently the OCR system detects blobs in a line (using a connected-components labeling algorithm). Each blob represents a separate letter, and has a bounding box around it. Some…
iRadium
  • 255
  • 2
  • 10
1
vote
3 answers

split paragraph by the first sentence

I have this div, and I want to add some style to the first sentence.
dfgdfg.asdhasd
I am trying this code, but it is not working as expected. var text =…
daniel__
  • 11,633
  • 15
  • 64
  • 91