
Parsing is something I come across a lot in development, but as a junior it is one of those things I assume I will get the hang of at some point, when it is needed. In my current project I've been told to find and use an HTML parser for a certain function, and I have found a couple on the web.

But what does an HTML parser actually do? And what does it mean to parse an object?

Grace
    I think [this wikipedia article](http://en.wikipedia.org/wiki/Parsing) is a good starting point. – KB22 Nov 24 '09 at 09:04

8 Answers


Parsing usually applies to text - the act of reading text and converting it into a more useful in-memory format, "understanding" what it means to some extent. So for example, an XML parser will take the sequence of characters (or bytes) and convert them into elements, attributes etc.
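A minimal sketch of that idea using Python's standard-library `xml.etree.ElementTree` module. The sample document and its `order`/`item` names are invented for illustration. The input starts as a plain string, and after parsing the elements and attributes can be addressed directly:

```python
import xml.etree.ElementTree as ET

# Raw input: to Python this is just a sequence of characters.
raw = '<order id="42"><item sku="A1" qty="2">Widget</item><item sku="B7" qty="1">Gadget</item></order>'

# fromstring() parses the text and returns the root Element of an in-memory tree.
root = ET.fromstring(raw)

print(root.tag, root.attrib)          # order {'id': '42'}
for item in root.findall("item"):     # child elements, now addressable by name
    # Attribute values are still strings; converting qty is up to the caller.
    print(item.get("sku"), int(item.get("qty")), item.text)
# A1 2 Widget
# B7 1 Gadget
```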

In some cases (particularly compilers) there's a separation between lexical analysis and syntactic analysis, so the real "understanding" part of the parser works on a sequence of tokens (identifiers, operators, etc.) rather than on the raw characters.
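A small Python sketch of that two-stage split. The token kinds (`NUMBER`, `IDENT`, `OP`) and the tiny assignment grammar are hypothetical, chosen only to show that the second stage never looks at raw characters, only at tokens:

```python
import re

# Lexical analysis: one regex alternative per (hypothetical) token kind.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<OP>[=+\-*/()]))")

def tokenize(text):
    """Turn raw characters into a list of (kind, text) tokens."""
    tokens, pos = [], 0
    text = text.rstrip()  # trailing whitespace would otherwise fail to match a token
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"cannot tokenize input at position {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def parse_assignment(tokens):
    """Syntactic analysis over tokens only.

    Hypothetical grammar: IDENT '=' operand (('+' | '-') operand)*
    """
    if len(tokens) < 3 or tokens[0][0] != "IDENT" or tokens[1] != ("OP", "="):
        raise SyntaxError("expected: IDENT '=' expression")
    operands, operators = [], []
    expect_operand = True
    for kind, text in tokens[2:]:
        if expect_operand:
            if kind not in ("NUMBER", "IDENT"):
                raise SyntaxError(f"expected operand, got {text!r}")
            operands.append(text)
        else:
            if (kind, text) not in (("OP", "+"), ("OP", "-")):
                raise SyntaxError(f"expected '+' or '-', got {text!r}")
            operators.append(text)
        expect_operand = not expect_operand
    if expect_operand:
        raise SyntaxError("expression ends with an operator")
    return {"target": tokens[0][1], "operands": operands, "operators": operators}

tokens = tokenize("total = price + 42")
print(tokens)
# [('IDENT', 'total'), ('OP', '='), ('IDENT', 'price'), ('OP', '+'), ('NUMBER', '42')]
print(parse_assignment(tokens))
# {'target': 'total', 'operands': ['price', '42'], 'operators': ['+']}
```

Keeping the stages separate means the grammar rules can be written in terms of "an identifier" or "a number" without worrying about whitespace or how many digits a number has.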

Jon Skeet

Parsing is taking a set of data and extracting the meaningful information from it. With HTML parsing, you're looking to read some HTML and return a structured set of tags and text.
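As a rough illustration, here is a sketch built on Python's standard-library `html.parser.HTMLParser`, which reports start tags, end tags and text as events. The `TreeBuilder` class, the dict-based node shape and the short void-element list are assumptions made for this example. A production HTML parser also has to recover from badly malformed markup, which this sketch does not attempt.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (a partial list, for illustration).
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

class TreeBuilder(HTMLParser):
    """Builds a nested dict tree of tags and text from HTML events."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "attrs": {}, "children": []}
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:  # void elements are never pushed as "open"
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text between tags
            self.stack[-1]["children"].append(data.strip())

parser = TreeBuilder()
parser.feed('<ul class="menu"><li><a href="/home">Home</a></li><li>About</li></ul>')
parser.close()
tree = parser.root["children"][0]
print(tree)
```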

Adam Hopkinson

You can start here: http://en.wikipedia.org/wiki/Parsing. Short excerpt:

Parsing or syntactic analysis is the process of analysing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part (of speech).
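To show what "conforming to the rules of a formal grammar" means in code, here is a minimal recursive-descent sketch for a toy grammar picked purely for illustration, balanced parentheses: S → "(" S ")" S | ε. Each alternative of the rule becomes a branch of one function, and the function only answers whether the string belongs to the language (a recognizer, with no tree built).

```python
def parses_balanced(text: str) -> bool:
    """Return True if text is derivable from S -> '(' S ')' S | (empty)."""
    pos = 0

    def parse_s() -> bool:
        nonlocal pos
        # Alternative 1: '(' S ')' S
        if pos < len(text) and text[pos] == "(":
            pos += 1
            if not parse_s():
                return False
            if pos >= len(text) or text[pos] != ")":
                return False
            pos += 1
            return parse_s()
        # Alternative 2: the empty string, which succeeds without consuming input
        return True

    # The whole input must be consumed for the string to belong to the language.
    return parse_s() and pos == len(text)

for s in ["(()())", "(()", "())", ""]:
    print(repr(s), parses_balanced(s))
# '(()())' True
# '(()' False
# '())' False
# '' True
```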

Konamiman

Parse (computers), by Dictionary.com:

To analyze (a string of characters) in order to associate groups of characters with the syntactic units of the underlying grammar.

Igor
  • Are parsing and syntactic analysis the same? – Ini Mar 22 '18 at 20:46
  • Taken from the Dragon book: The second phase of the compiler is syntax analysis or parsing. The parser uses the first components of the tokens produced by the lexical analyzer to create a tree-like intermediate representation that depicts the grammatical structure of the token stream. A typical representation is a syntax tree in which each interior node represents an operation and the children of the node represent the arguments of the operation. A syntax tree for the token stream (1.2) is shown – oskar132 May 19 '18 at 09:12
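A sketch of the kind of syntax tree that comment describes, for simple arithmetic expressions. The grammar and the tuple representation `(operator, left, right)` are assumptions chosen for illustration; the tokenizer is deliberately simplified and silently skips any character that is not a digit, operator or parenthesis.

```python
import re

def tokenize(text):
    # Simplified lexical analysis: integers and single-character operators only.
    return re.findall(r"\d+|[-+*/()]", text)

class Parser:
    """Recursive descent for:
    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := NUMBER | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())  # interior node: operation; children: arguments
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return int(tok)  # leaf
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse("1 + 2 * 3"))    # ('+', 1, ('*', 2, 3))
print(parse("(1 + 2) * 3"))  # ('*', ('+', 1, 2), 3)
```

Operator precedence falls out of the grammar: `*` binds tighter than `+` because `term` sits below `expr`, so the multiplication ends up deeper in the tree.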

A parser is a compiler / interpreter component that breaks data into smaller elements for easy translation into another language. A parser takes input in the form of a sequence of tokens or program instructions and usually builds a data structure in the form of a parse tree or an abstract syntax tree.
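Python exposes its own parser through the standard-library `ast` module, which makes the abstract syntax tree easy to inspect. The source line below is an arbitrary example, and the `indent` argument of `ast.dump` needs Python 3.9 or later.

```python
import ast

# Python's parser turns source text into an abstract syntax tree.
tree = ast.parse("total = price * (1 + rate)")
assign = tree.body[0]  # the single statement in the module

print(ast.dump(assign, indent=2))

# The tree can be navigated like any other data structure.
print(type(assign).__name__)           # Assign
print(assign.targets[0].id)            # total
print(type(assign.value.op).__name__)  # Mult
```

The parentheses do not appear as nodes of their own: the nesting of the `BinOp` nodes already encodes the grouping, which is part of what makes the tree *abstract* rather than a full parse tree.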

Gajendra K Chauhan

In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar.

:0)

Wikipedia

Mongus Pong

It is the process of identifying the tokens [tags, attributes] inside an HTML document.
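A short sketch of that tag-and-attribute identification using Python's standard-library `html.parser.HTMLParser`; the `TagLister` class and the sample markup are made up for the example. Note that the parser does more than split on `<` and `>`: tag and attribute names are lower-cased, unquoted attribute values are accepted, and character references such as `&amp;` in attribute values are decoded.

```python
from html.parser import HTMLParser

class TagLister(HTMLParser):
    """Records every start/end tag and its attributes, in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag, {}))

lister = TagLister()
lister.feed('<P CLASS=intro>See <a href="/docs?a=1&amp;b=2">the docs</a>.</p>')
lister.close()
for event in lister.events:
    print(event)
# ('start', 'p', {'class': 'intro'})
# ('start', 'a', {'href': '/docs?a=1&b=2'})
# ('end', 'a', {})
# ('end', 'p', {})
```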

rahul

Don't attempt to write anything but a trivial parser yourself. There are good tools for this: ANTLR and Bison are two I can think of.

If you use the tools you'll be able to ask for help when you hit a problem.

cheers, Martin.

martsbradley