Questions tagged [parsing]

Parsing refers to breaking an artifact into its constituent elements and capturing the relationship between those elements. This tag isn't for questions about the self-hosted Parse Platform (use the [parse-platform] tag) or parse errors in a particular programming language (use the appropriate language tag instead).

Parsing refers to the action by software of breaking an artifact into its constituent elements and capturing the relationship between those elements.

When the artifact is a stream of arbitrary text, parsing is often used to mean breaking the stream into constituent atoms (called words, tokens or lexemes).
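As a rough illustration of this sense of the word, the sketch below splits a string into tokens with a regular expression. The token pattern and the `tokenize` name are assumptions made for this example; real tokenizers choose their rules to fit the input.

```python
import re

# Hypothetical token rule, chosen only for illustration: a run of word
# characters, or any single character that is neither word nor whitespace.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split arbitrary text into a flat list of tokens, dropping whitespace."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("total = price*2;"))
```

Whitespace is discarded here because it only separates tokens; a format where whitespace is significant would need a different rule.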

When the artifact is a stream of natural language text, parsing is used to mean breaking the stream into natural language elements (words and punctuation) and discovering the structure of the text as phrases or sentences.

When the artifact is a stream of text in a computer language (or other formal language), parsing consists of applying one of a variety of parsing algorithms (ad hoc, recursive descent, LL, LR, Packrat, Earley or other) to the source text, which is often first broken into lexemes by a separate lower-level parser called a "lexer". The parser verifies that the text is valid in the source language, and often constructs a parse tree representing the grammar productions used to tile the text.
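To make the lexer/parser split concrete, here is a minimal recursive descent sketch for a small arithmetic grammar. The grammar, the `lex` and `parse` names, and the nested-tuple tree shape are all assumptions made for this example, and the tree is a simplified one that keeps only operators and operands rather than every production.

```python
import re

# Hypothetical grammar, for illustration only:
#   expr   -> term (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | '(' expr ')'

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def lex(source):
    """Lexer: break the source text into a list of lexemes."""
    source = source.rstrip()
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive descent parser: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


def parse(source):
    """Validate the source and return its tree as nested tuples."""
    parser = Parser(lex(source))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse("1 + 2 * (3 - 4)"))
```

Invalid input such as `1 +` raises `SyntaxError`, which corresponds to the validity check described above; the returned tuples play the role of the tree.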

The term can be applied more generally to analyzing any complex structure such as a binary data file or a graph.
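For the binary case, a small sketch using Python's standard `struct` module is shown below. The header layout (a 4-byte magic value `DEMO`, a 16-bit version, a 32-bit record count, little-endian) is invented for this example and does not correspond to any real file format.

```python
import struct

# Hypothetical header layout, for illustration only:
# 4-byte magic, unsigned 16-bit version, unsigned 32-bit record count,
# little-endian with no padding.
HEADER = struct.Struct("<4sHI")


def parse_header(data):
    """Decode the fixed-size header at the start of a bytes object."""
    if len(data) < HEADER.size:
        raise ValueError("data too short for header")
    magic, version, count = HEADER.unpack_from(data)
    if magic != b"DEMO":
        raise ValueError("bad magic value")
    return {"version": version, "count": count}


if __name__ == "__main__":
    sample = HEADER.pack(b"DEMO", 2, 7) + b"payload bytes"
    print(parse_header(sample))
```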

57220 questions
12
votes
5 answers

Find all nodes that have an attribute that matches a certain value with scala

I saw the following example on Nabble, where the goal was to return all nodes that contain an attribute with an id of X that contains a value Y: //find all nodes with an attribute "class" that contains the value "test" val xml = XML.loadString(…
ed.
  • 2,696
  • 3
  • 22
  • 25
12
votes
1 answer

SQL Parsing library for Python

We need a SQL parsing or decomposing library for Python. We would like to be able to input a SQL text query and then get the query parts back as a result. It doesn't need to be fancy, or anything, but we would like to avoid doing the parsing…
Juan Carlos Coto
  • 11,900
  • 22
  • 62
  • 102
12
votes
1 answer

Subset based on list of strings using grepl()?

I'm looking to do something seemingly very simple. I would like to subset a data frame in R using the grepl() command -- or something like it -- on several different phrases without constructing a loop. For example, I'd like to pull out all the…
baha-kev
  • 3,029
  • 9
  • 33
  • 31
12
votes
1 answer

How can I lazily parse big XHTML file in Clojure?

I have a valid XHTML file (100 megabytes of data) with one large table. The first tr holds the columns (for database), all other tr's are data. It is the only table in the whole document and it is in the structure html->body->div->table. How can I parse it in a lazy way in…
Jiri Knesl
  • 225
  • 1
  • 2
  • 7
12
votes
4 answers

PHP Simple HTML DOM Parser: Select only DIVs with multiple classes

I was searching like mad and found no solution. The problem is simple. Let's say I have 3 DIVs:
TEXT1
TEXT2
Chris
  • 123
  • 1
  • 1
  • 5
12
votes
4 answers

Yacc/Jay grammar file for JavaScript?

Possible Duplicate: Where can I find a yacc grammar for ECMAscript/Actionscript/Javascript I'm trying to find a grammar file for JavaScript for Yacc (preferably for Jay, but since Jay is a Yacc clone I should be fine, since I need to implement it…
thr
  • 19,160
  • 23
  • 93
  • 130
12
votes
4 answers

Prolog - DCG parser with input from file

As part of a project I need to write a parser that can read a file and parse into facts I can use in my program. The file structure looks as follows: property = { el1 , el2 , ... }. What I want in the end…
Floris Devriendt
  • 2,044
  • 4
  • 24
  • 34
12
votes
3 answers

How to read a csv file one line at a time and replace/edit certain lines as you go?

I have a 60GB csv file I need to make some modifications to. The customer wants some changes to the file's data, but I don't want to regenerate the data in that file because it took 4 days to do. How can I read the file, line by line (not loading it…
richard
  • 12,263
  • 23
  • 95
  • 151
12
votes
5 answers

XML parsing in Python

I'd like to parse a simple, small XML file using Python; however, work on PyXML seems to have ceased. I'd like to use Python 2.6 if possible. Can anyone recommend an XML parser that will work with 2.6? Thanks
Alex
  • 143
  • 2
  • 3
  • 8
12
votes
1 answer

Parsing CDATA in xml with python

I need to parse an XML file with a number of blocks of CDATA that I need to retain for later plotting:
Jen
  • 265
  • 1
  • 4
  • 8
12
votes
1 answer

Using Gson and JsonObject to format and parse data

I am using JsonObject and Gson to format the data I need to send in the form of a String and then retrieve and parse it somewhere else. This is my simple code which is not working: public static void main(String[] args) { Gson g = new Gson(); …
Jazib
  • 1,200
  • 1
  • 16
  • 39
12
votes
1 answer

scala: split string by commas, ignoring commas between quotes

Possible Duplicate: Java: splitting a comma-separated string but ignoring commas in quotes It's easier to show some code I have the following: scala> val a = """op1,"op2.1,op2.2",,op4""".split(",") a: Array[java.lang.String] = Array(op1, "op2.1,…
opensas
  • 60,462
  • 79
  • 252
  • 386
12
votes
2 answers

How to handle unicode values in JSON strings?

I'm writing a JSON parser in C++ and am facing a problem when parsing JSON strings: The JSON specification states that JSON strings can contain unicode characters in the form of: "here comes a unicode character: \u05d9 !" My JSON parser tries to…
ereOn
  • 53,676
  • 39
  • 161
  • 238
12
votes
2 answers

JavaScript datetime parsing

Possible Duplicate: How can I convert string to datetime with format specification in JavaScript? I have a json response which contains a hashmap like; {"map":{"2012-10-10 03:47:00.0":23.400000000000002,"2012-10-10 03:52:00.0":23.3,"2012-10-10…
vtokmak
  • 1,496
  • 6
  • 35
  • 66
12
votes
3 answers

Parse email addresses for "from" and "to" fields in Ruby

In an email, it looks like a "from" or "to" field can contain one or more addresses, each address can be like "john@test.com" or "John D Jr " So a "from" field can look like any of the following: "a@a.com" "a@a.com, Bob Blue…
foobar
  • 10,854
  • 18
  • 58
  • 66