Questions tagged [text-parsing]

Text parsing is a form of parsing: breaking a stream of text into components and capturing the relationships between those components.

When the stream of text is arbitrary, parsing is often used to mean breaking the stream into constituent atoms (words or lexemes).

When the stream of text corresponds to natural language, parsing is used to mean breaking the stream into natural language elements (words and punctuation) and discovering the structure of the text as phrases or sentences.

When the stream of text corresponds to a computer source language (or another formal language), parsing consists of applying one of a variety of parsing algorithms (ad hoc, recursive descent, LL, LR, Packrat, Earley, or others) to the source text, which is often first broken into lexemes by a separate, lower-level pass called a "lexer". The parser verifies that the text is valid in the source language and often constructs a parse tree representing the grammar productions that cover the text.
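
As an illustration of the lexer-plus-parser split described above, here is a minimal Python sketch of a recursive descent parser for arithmetic expressions. The grammar, the token kinds (`NUM`, `OP`), and the names `lex`, `Parser`, and `parse` are assumptions chosen for this example; they do not come from any particular question under this tag.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text)

# One token per match: a number or a single-character operator/parenthesis.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUM>\d+(?:\.\d+)?)|(?P<OP>[-+*/()]))")


def lex(text: str) -> List[Token]:
    """Break the source text into (kind, text) lexemes."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent: one method per grammar rule."""

    def __init__(self, tokens: List[Token]):
        self.tokens, self.pos = tokens, 0

    def peek(self) -> Token:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def take(self) -> Token:
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.take()[1]
            node = (op, node, self.term())
        return node

    def term(self):
        # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.take()[1]
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor := NUM | '(' expr ')'
        kind, text = self.take()
        if kind == "NUM":
            return float(text)
        if (kind, text) == ("OP", "("):
            node = self.expr()
            if self.take() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")


def parse(text: str):
    """Validate the text and return a parse tree of nested tuples."""
    parser = Parser(lex(text))
    tree = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError("trailing input")
    return tree
```

Calling `parse("1 + 2 * (3 - 4)")` yields a nested tuple whose shape mirrors the productions applied, and invalid input such as `"1 +"` raises `SyntaxError`, which corresponds to the validity check mentioned above.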

1268 questions

4 answers

PDF Text Extraction Approach Using OCR

Has anybody attempted to extract text from a PDF using an OCR library and Java? What did you find to be the most reliable library for text extraction? Most of the approaches I've seen (Tesseract, GOCR) are C libraries that would require some JNI…

java pdf text-parsing

asked Apr 22 '09 at 16:38

Jonathan Holloway

5 answers

Retrieve definition for parenthesized abbreviation, based on letter count

I need to retrieve the definition of an acronym based on the number of letters enclosed in parentheses. For the data I'm dealing with, the number of letters in parentheses corresponds to the number of words to retrieve. I know this isn't a reliable…

python regex text text-parsing abbreviation

asked Jun 02 '19 at 02:45

tenebris silentio
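
One possible reading of the question above can be sketched in Python as follows. It assumes the definition is simply the N words immediately preceding an N-letter parenthesized abbreviation, as the excerpt describes; the function name `expand_abbreviations` and the sample sentences are hypothetical, and the approach inherits the unreliability the asker already acknowledges.

```python
import re
from typing import Dict


def expand_abbreviations(text: str) -> Dict[str, str]:
    """Map each parenthesized abbreviation to the words that precede it.

    Assumption: the number of letters in the abbreviation equals the
    number of words in its definition.
    """
    result = {}
    for match in re.finditer(r"\(([A-Za-z]+)\)", text):
        abbreviation = match.group(1)
        # Words that appear before the opening parenthesis.
        preceding_words = text[:match.start()].split()
        # Keep as many words as the abbreviation has letters.
        definition = preceding_words[-len(abbreviation):]
        result[abbreviation] = " ".join(definition)
    return result
```

For a hypothetical input such as `"Patients with chronic obstructive pulmonary disease (COPD) were treated."`, the four letters of `COPD` select the four preceding words.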

3 answers

PowerShell: Read Text file line by line and split on "|"

I am having trouble splitting a line into an array using the "|" in a text file and reassembling it in a certain order. There are multiple lines like the original line in the text file. This is the original line:…

powershell csv text-parsing

asked Dec 13 '18 at 15:14

Dennis

2 answers

How can I extract/parse tabular data from a text file in Perl?

I am looking for something like HTML::TableExtract, just not for HTML input, but for plain text input that contains "tables" formatted with indentation and spacing. Data could look like this: Here is some header text. Column One Column Two …

perl parsing text-parsing data-extraction

asked Oct 14 '10 at 03:10

Thilo

4 answers

Parse string into a tree structure?

I'm trying to figure out how to parse a string in this format into a tree-like data structure of arbitrary depth. "{{Hello big|Hi|Hey} {world|earth}|{Goodbye|farewell} {planet|rock|globe{.|!}}}" [[["Hello big" "Hi" "Hey"] ["world" "earth"]] …

parsing clojure tree text-processing text-parsing

asked Sep 29 '10 at 22:35

erikcw

3 answers

Randomizing text between delimiters

I have this simple input: "I have {red;green;orange} fruit and cup of {tea;coffee;juice}". I use Perl to identify patterns between two external brace delimiters { and }, and randomize the fields inside with the internal delimiter ;. I'm getting this…

perl shell text-processing text-parsing

asked Dec 24 '15 at 13:02

kempinski
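
The question above uses Perl; the following is a rough Python sketch of the same idea, not the asker's code. It assumes that "randomize" means shuffling the order of the `;`-separated fields inside each `{...}` group and that groups are not nested. The function name `shuffle_fields` and the optional `rng` parameter are hypothetical.

```python
import random
import re
from typing import Optional


def shuffle_fields(text: str, rng: Optional[random.Random] = None) -> str:
    """Shuffle the ';'-separated fields inside each non-nested {...} group."""
    if rng is None:
        rng = random.Random()

    def shuffle_group(match: "re.Match[str]") -> str:
        fields = match.group(1).split(";")
        rng.shuffle(fields)  # in-place shuffle of this group's fields
        return "{" + ";".join(fields) + "}"

    # [^{}]* keeps each match inside a single pair of braces.
    return re.sub(r"\{([^{}]*)\}", shuffle_group, text)
```

Passing a seeded `random.Random` makes the result repeatable, which can help when comparing against expected output.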

2 answers

List files on HTTP/FTP server in R

I'm trying to get a list of files on an HTTP/FTP server from R, so that in the next step I will be able to download them (or select some of the files which meet my criteria to download). I know that it is possible to use an external program in a web browser…

regex r html-parsing text-parsing

asked Aug 25 '15 at 20:37

matandked

2 answers

Parse values from a string

How would you parse the values in a string, such as the one below? 12:40:11 8 5 87 The gap between numbers varies, and the first value is a time. The following regular expression does not separate the time…

java regex text-parsing

asked Jun 22 '10 at 01:18

jgg

2 answers

Regex pattern isn't matching certain show titles

Using C# regex to match and return data parsed from a string is returning unreliable results. The pattern I am using is as follows : Regex r=new Regex( @"(.*?)S?(\d{1,2})E?(\d{1,2})(.*)|(.*?)S?(\d{1,2})E?(\d{1,2})", …

c# regex text-parsing

asked May 23 '15 at 09:40

Kraang Prime

7 answers

How to do a circular shift of strings in bash?

I have a homework assignment where I need to take input from a file and continuously remove the first word in a line and append it to the end of the line until all combinations have been done. I really don't know where to begin and would be thankful…

bash shell text-parsing

asked Apr 05 '10 at 12:23

Kyle Van Koevering
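
The question above asks about bash; as a language-neutral illustration of the rotation itself, here is a small Python sketch. It assumes words are separated by whitespace and that the unshifted line counts as the first rotation. The function name `circular_shifts` is hypothetical.

```python
from typing import List


def circular_shifts(line: str) -> List[str]:
    """Return every rotation of the words in a line, starting with the original."""
    words = line.split()
    # Rotation i moves the first i words to the end of the line.
    return [" ".join(words[i:] + words[:i]) for i in range(len(words))]
```

Each rotation is built by slicing rather than by repeatedly popping and appending, so the input list is never mutated.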

1 answer

Can I control the way the CountVectorizer vectorizes the corpus in scikit learn?

I am working with a CountVectorizer from scikit-learn, and I'm possibly attempting to do some things that the object was not made for... but I'm not sure. In terms of getting counts for occurrence: vocabulary = ['hi', 'bye', 'run away!'] corpus =…

python nlp scikit-learn text-parsing corpus

asked Jun 03 '14 at 05:36

tumultous_rooster

0 answers

Parsing Expression Grammar for syntax highlighting

First... Would it be possible to accomplish simple syntax highlighting using a PEG? I'm only looking for it to be able to recognize and highlight basic things that are common to C-style languages. Second... If there are any examples of this or…

syntax-highlighting text-parsing peg

asked Oct 08 '13 at 18:42

Matt Zera

5 answers

What do people mean when they say “Perl is very good at parsing”?

What do people mean when they say "Perl is very good at parsing"? How is Perl any better or more powerful than other scripting languages such as Python or Ruby?

perl parsing scripting text-parsing scripting-language

asked Dec 11 '09 at 14:53

Quintin Par

3 answers

Parse string in JavaScript

How can I parse this string in JavaScript? var string = "http://www.facebook.com/photo.php?fbid=322916384419110&set=a.265956512115091.68575.100001022542275&type=1"; I just want to get the "265956512115091" from the string. I somehow parse this…

javascript string parsing text-parsing string-parsing

asked Jan 28 '12 at 13:21

Robin Carlo Catacutan

6 answers

String parsing, extracting numbers and letters

What's the easiest way to parse a string and extract a number and a letter? I have a string that can be in either of the following formats (number|letter or letter|number), e.g. "10A", "B5", "C10", "1G", etc. I need to extract the 2 parts, i.e. "10A" -> "10" and…

c# regex string text-parsing

asked Apr 09 '09 at 16:23

Matt Warren
