Questions tagged [sax]

SAX stands for Simple API for XML, and is an event-based way of reading XML data from a document.

SAX (Simple API for XML) is an event-based, sequential-access parser API for XML documents, developed by the XML-DEV mailing list.
SAX provides a mechanism for reading data from an XML document that is an alternative to that provided by the Document Object Model (DOM). Where the DOM operates on the document as a whole, SAX parsers operate on each piece of the XML document sequentially.

XML processing with SAX

A parser that implements SAX (i.e., a SAX parser) functions as a stream parser with an event-driven API. The user defines a number of callback methods that are called when events occur during parsing. The SAX events include (among others) the start and end of the document, the start and end of each element, character data, and processing instructions.
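
A few of these callbacks (document start/end, element start/end, character data) can be sketched with Python's standard-library `xml.sax` module. This is only a minimal illustration of the callback model, not a reference implementation: the `EventLogger` class, the `collect_events` helper, and the sample document are hypothetical names made up for this example.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records each SAX event in arrival order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("startDocument",))

    def startElement(self, name, attrs):
        # attrs is an Attributes object; copy it into a plain dict.
        self.events.append(("startElement", name, dict(attrs.items())))

    def characters(self, content):
        # Text may arrive in several chunks; skip whitespace-only ones.
        if content.strip():
            self.events.append(("characters", content.strip()))

    def endElement(self, name):
        self.events.append(("endElement", name))

    def endDocument(self):
        self.events.append(("endDocument",))


def collect_events(xml_bytes):
    """Parse xml_bytes and return the list of recorded events."""
    handler = EventLogger()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


# Made-up sample document, used only for illustration.
SAMPLE = b'<library><book id="1">SAX basics</book></library>'

for event in collect_events(SAMPLE):
    print(event)
```

The handler never sees the document as a tree. It only receives one event at a time, which is why any state needed across events (here, the `events` list) has to be kept by the handler itself.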

1784 questions
10
votes
2 answers

Java XML Parsing and original byte offsets

I'd like to parse some well-formed XML into a DOM, but I'd like to know the offset of each node's tag in the original media. For example, if I had an XML document with content something like:
text
I'd like…
Bill Dwyer
  • 156
  • 1
  • 4
10
votes
2 answers

What is the advantage of using JAXP instead of DOM / SAX directly in Java?

Being new to XML parsing, I'm trying to understand the different technologies. There is a confusing number of different technologies for different…
hamena314
  • 2,969
  • 5
  • 30
  • 57
10
votes
4 answers

Parsing of badly formatted HTML in PHP

In my code I convert a styled XLS document to HTML using OpenOffice. I then parse the tables using xml_parser_create. The problem is that OpenOffice creates old-school HTML with unclosed tags, it doesn't create doctypes and doesn't…
Thomas Ahle
  • 30,774
  • 21
  • 92
  • 114
10
votes
2 answers

SAX IncrementalParser in Jython

The Python standard library provides the xml.sax.xmlreader.IncrementalParser interface, which has a feed() method. Jython also provides an xml.sax package that uses a Java SAX parser implementation under the hood, but it does not seem to provide IncrementalParser. Is…
minhee
  • 5,688
  • 5
  • 43
  • 81
10
votes
3 answers

porting to Android: why am I getting "Can't create default XMLReader; is system property org.xml.sax.driver set?"?

I am porting some Java code that worked fine on my desktop to Android. I have the following code segment: import org.xml.sax.InputSource; import org.xml.sax.XMLReader; import org.xml.sax.helpers.XMLReaderFactory; // ... XMLReader p =…
I Z
  • 5,719
  • 19
  • 53
  • 100
9
votes
3 answers

Use CSS selectors to collect HTML elements from a streaming parser (e.g. SAX stream)

How can I parse a CSS (CSS3) selector and use it (in a jQuery-like way) to collect HTML elements not from a DOM (a tree structure) but from a stream (e.g. SAX), i.e. using a sequential-access, event-based parser? By the way, are there any CSS selectors (or…
Jakub Narębski
  • 309,089
  • 65
  • 217
  • 230
9
votes
2 answers

XML / Java: Precise line and character positions whilst parsing tags and attributes?

I’m trying to find a way to precisely determine the line number and character position of both tags and attributes whilst parsing an XML document. I want to do this so that I can report accurately to the author of the XML document (via a web…
Paul
  • 3,009
  • 16
  • 33
9
votes
3 answers

Why is Moose code so slow?

I'm trying to parse a large XML file. I read it using XML::SAX (using Expat, not the Perl implementation) and put all the second-level and lower nodes into my "Node" class: package Node; use Moose; has "name" => ( isa => "Str", reader =>…
Paul Tomblin
  • 179,021
  • 58
  • 319
  • 408
9
votes
3 answers

Java. Sax parser. How to manually break parsing?

Is it possible to stop the parsing process early, i.e. to exit the parsing loop without reaching the end of the document and the corresponding "endDocument" event?
Andrey Khataev
  • 1,303
  • 6
  • 20
  • 46
9
votes
3 answers

SAXParseException: Content is not allowed in prolog

I need to add the following file to my Tomcat's '/conf' directory: After adding this file, I get…
Dónal
  • 185,044
  • 174
  • 569
  • 824
9
votes
3 answers

cvc-complex-type.2.4.a: Invalid content was found starting with element 'MarkupListURI'. One of '{MarkupDeleteURI}' is expected

I have been attempting to resolve this final issue with validating the XML returned from the API against the XSD. In almost all similar cases the solution is to add the following line: elementFormDefault="qualified" however this line is…
MinosMythos
  • 93
  • 1
  • 1
  • 5
9
votes
3 answers

Effective way of creating a String from char[],start,length in Java

We are using Java SAX to parse really big XML files. Our characters implementation looks like the following: @Override public void characters(char ch[], int start, int length) throws SAXException { String value = String.copyValueOf(ch, start,…
Piotr Sobczyk
  • 6,443
  • 7
  • 47
  • 70
9
votes
1 answer

Parse a list of XML fragments with no root element from a stream input

Is it feasible in Java, using the SAX API, to parse a list of XML fragments with no root element from a stream input? I tried parsing such XML but got an org.xml.sax.SAXParseException: The markup in the document following the root element must be…
yannisf
  • 6,016
  • 9
  • 39
  • 61
8
votes
3 answers

Can SAX Parsers use XPath in Java?

I'm trying to migrate one of my classes, which uses DOM parsing with lots of XPath expressions, to SAX parsing. DOM parsing was good for me, but some of the files I try to parse are too big and they cause server timeouts. I want to reuse the XPath with…
Nikola Dichev
  • 73
  • 1
  • 1
  • 6
8
votes
4 answers

Parsing html with SAX parser

I am trying to parse a normal HTML file using a SAX parser. SAXBuilder builder2 = new SAXBuilder(); try { Document sdoc = (Document)builder2.build(readFile); NodeList nl=sdoc.getElementsByTagName("body"); …
user972590
  • 251
  • 1
  • 5
  • 13