Questions tagged [text-parsing]

Text parsing is a form of parsing: the action of breaking a stream of text into different components and capturing the relationships between those components.

When the stream of text is arbitrary, parsing is often used to mean breaking the stream into constituent atoms (words or lexemes).

When the stream of text corresponds to natural language, parsing is used to mean breaking the stream into natural language elements (words and punctuation) and discovering the structure of the text as phrases or sentences.

When the stream of text corresponds to a computer source language (or other formal language), parsing consists of applying one of a variety of parsing algorithms (ad hoc, recursive descent, LL, LR, Packrat, Earley, or others) to the source text, which is often first broken into lexemes by a separate lower-level parser called a "lexer". The parser verifies that the source text is valid and often constructs a parse tree representing the grammar productions used to tile the text.
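As a rough illustration of the lexer-plus-parser split described above, here is a minimal sketch of a recursive descent parser for arithmetic expressions. The grammar, the token kinds (`NUM`, `OP`), and the names `lex`, `Parser`, and `parse` are assumptions made up for this example; they are not taken from any particular library or question under this tag.

```python
import re


def lex(text):
    """Lexer: break the source text into (kind, value) lexemes."""
    tokens = []
    # Match either a run of digits or any single non-space character.
    for match in re.finditer(r"\d+|\S", text):
        lexeme = match.group()
        if lexeme.isdecimal():
            tokens.append(("NUM", int(lexeme)))
        elif lexeme in "+-*/()":
            tokens.append(("OP", lexeme))
        else:
            raise SyntaxError(f"unexpected character {lexeme!r}")
    return tokens


class Parser:
    """Recursive descent parser for this (hypothetical) grammar:

    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := NUM | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", None)

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.advance()[1]
            node = (op, node, self.term())  # left-associative tree node
        return node

    def term(self):
        node = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.advance()[1]
            node = (op, node, self.factor())
        return node

    def factor(self):
        kind, value = self.advance()
        if kind == "NUM":
            return value
        if (kind, value) == ("OP", "("):
            node = self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")


def parse(text):
    """Validate the text and return its parse tree as nested tuples."""
    parser = Parser(lex(text))
    tree = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError("unexpected trailing input")
    return tree


print(parse("1 + 2 * (3 - 4)"))
```

Each grammar rule becomes one method, so the call structure mirrors the productions. Invalid input such as `1 +` raises `SyntaxError`, which is the "verify the validity" half of the job; the nested tuples are the parse tree.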

1268 questions
5
votes
2 answers

fault-tolerant python based parser for WikiLeaks cables

Some time ago I started writing a BNF-based grammar for the cables which WikiLeaks released. However, I have now realized that my approach may not be the best, and I'm looking for some improvement. A cable consists of three parts. The head has some…
qbi
  • 2,104
  • 1
  • 23
  • 35
5
votes
1 answer

PyParsing Optional() hanging

When using only Optional or ZeroOrMore, pyparsing seems to enter an infinite loop. The following code works, but the part "# Should work with pp.Optional()" should indeed be Optional and not OneOrMore. Should I put some sort of stopOn in this…
Raphael
  • 959
  • 7
  • 21
5
votes
4 answers

How to parse a string and create several columns from it?

I have a varchar(max) field containing Name Value pairs; on every line I have Name UnderScore Value. I need to run a query against it so that it returns the Name, Value pairs in two columns (so by parsing the text, removing the underscore and the…
UnDiUdin
  • 14,924
  • 39
  • 151
  • 249
5
votes
5 answers

Is there a way to convert tables of text into a PowerShell Object

There are many tools that output their data in a table format. One such example is diskpart. Shaving off some extraneous output, you would get something like this. Disk ### Status Size Free Dyn Gpt -------- ------------- …
Andy Schneider
  • 8,516
  • 6
  • 36
  • 52
5
votes
4 answers

C# Dependency Injection

I'm trying to see if I understand dependency injection. I have a project that is used as a parser. It can parse delimited text, key-value and will also regex. The first way this was done was in one function with a switch. The next way was to put it…
Aur
  • 191
  • 9
5
votes
4 answers

Flexible text parsing strategies

Problem I'm trying to find a flexible way to parse email content. Below is an example of dummy email text I'm working with. I'd also like to avoid regular expressions if at all possible. However, at this point of my problem solving process I'm…
Mike
  • 4,257
  • 3
  • 33
  • 47
5
votes
2 answers

How to parse and evaluate a math expression with Pandas Dataframe columns?

What I would like to do is to parse an expression such as this one: result = A + B + sqrt(B + 4) Where A and B are columns of a dataframe. So I would have to parse the expression like this in order to get the result: new_col = df.B + 4 result = df.A +…
ChesuCR
  • 9,352
  • 5
  • 51
  • 114
5
votes
1 answer

How to parse text using TextFSM with an option (like an OR condition)

I need to parse the output of 'show env all' from a switch/router, but there are different text structures. Switch A : FAN is OK SYSTEM TEMPERATURE is OK System Temperature Value: 38 Degree Celsius System Temperature State: GREEN Yellow Threshold : 58 Degree…
Nedy Suprianto
  • 201
  • 1
  • 6
  • 14
5
votes
3 answers

How to parse text over multiple lines with textfsm?

I understand that TextFSM is a good way to parse text files; however, I see that it parses data on single lines. My question is how to parse text spread over multiple lines. CUSIP No. 123456 13G …
Trinadh Gupta
  • 306
  • 5
  • 18
5
votes
1 answer

Extracting specific tags from arbitrary plain text

I want to parse plain text comments and look for certain tags within them. The types of tags I'm looking for look like: Where "name" is a [a-z] string (from a fixed list) and "1234" represents a [0-9]+ number. These tags can occur…
Nick B.
  • 79
  • 6
5
votes
4 answers

how to read text files and create a data frame in R

I need to read the txt file at https://raw.githubusercontent.com/fonnesbeck/Bios6301/master/datasets/addr.txt and convert it into an R data frame with the columns: LastName, FirstName, streetno, streetname, city, state, and zip... Tried to use…
Sheldon
  • 315
  • 2
  • 5
  • 13
5
votes
2 answers

Help with parsing a log file (ANTLR3)

I need a little guidance in writing a grammar to parse the log file of the game Aion. I've decided upon using Antlr3 (because it seems to be a tool that can do the job and I figured it's good for me to learn to use it). However, I've run into…
Unknown
  • 5,722
  • 5
  • 43
  • 64
5
votes
9 answers

Python tokenize sentence with optional key/val pairs

I'm trying to parse a sentence (or line of text) that is optionally followed by some key/val pairs on the same line. Not only are the key/value pairs optional, they are dynamic. I'm looking for a result to be something…
tazzytazzy
  • 65
  • 6
5
votes
3 answers

Java: How to read a file line by line while ignoring "\n"

I'm trying to read a tab-separated text file line by line. The lines are separated by a carriage return plus line feed ("\r\n"), and a line feed ("\n") is allowed within tab-separated text fields. Since I want to read the file line by line, I want my program…
Del
  • 119
  • 1
  • 1
  • 8
5
votes
3 answers

Get non-numeric characters then number on each line of a block of text

I have some strings which can be in the following format: sometext moretext 01 text text sometext moretext 002 text text 1 (somemoretext) etc I want to split these strings into the following: the text before the number, and the number. For example: text…
user1981823
  • 75
  • 1
  • 1
  • 4