Questions tagged [sax]

SAX stands for Simple API for XML, and is an event-based way of reading XML data from a document.

SAX (Simple API for XML) is an event-based sequential access parser API developed by the XML-DEV mailing list for XML documents.
SAX provides a mechanism for reading data from an XML document that is an alternative to that provided by the Document Object Model (DOM). Where the DOM operates on the document as a whole, SAX parsers operate on each piece of the XML document sequentially.

XML processing with SAX

A parser that implements SAX (i.e., a SAX parser) functions as a stream parser with an event-driven API. The user defines a number of callback methods that are called when events occur during parsing. The SAX events include (among others) the start and end of elements, text nodes, processing instructions, and comments.
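The callback model described above can be sketched with Python's standard `xml.sax` module. This is only an illustrative sketch: the `EventRecorder` class, the `record_events` helper, and the sample document are hypothetical names made up for this example, while `ContentHandler` and `parseString` are the standard library's own.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Hypothetical handler that records each SAX event as a tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called when the parser reaches an opening tag.
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # The parser may deliver one text node in several chunks,
        # so whitespace-only chunks are skipped and the rest kept as they arrive.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        # Called when the parser reaches a closing tag.
        self.events.append(("end", name))

    def processingInstruction(self, target, data):
        # Called for constructs such as <?target data?>.
        self.events.append(("pi", target, data))


def record_events(xml_bytes):
    """Parse xml_bytes and return the list of recorded events."""
    handler = EventRecorder()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


# Made-up sample document for illustration only.
SAMPLE = b'<feed><item id="1">Hello</item><?render fast?></feed>'

for event in record_events(SAMPLE):
    print(event)
```

Because the handler sees one event at a time, nothing forces the whole document into memory, which is the main reason SAX is often suggested for very large files. Note that `characters` is not guaranteed to receive a complete text node in a single call, so real handlers usually buffer text until the matching `endElement`.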

1784 questions
21 votes, 3 answers

Android SAX parser not getting full text from between tags

I've created my own DefaultHandler to parse RSS feeds, and for most feeds it's working fine. However, for ESPN, it is cutting off part of the article URL due to the way ESPN formats its URLs. An example of a full article URL from…
brockoli
20 votes, 6 answers

Parse large RDF in Python

I'd like to parse a very large (about 200MB) RDF file in Python. Should I be using SAX or some other library? I'd appreciate some very basic code that I can build on, say to retrieve a tag. Thanks in advance.
usertest
20 votes, 3 answers

How to tell Java SAX Parser to ignore invalid character references?

When trying to parse incorrect XML with a character reference such as , Java's SAX Parser dies a horrible death with a fatal error such as org.xml.sax.SAXParseException: Character reference "" is an…
Epaga
20 votes, 3 answers

Parsing broken XML with lxml.etree.iterparse

I'm trying to parse a huge XML file with lxml in a memory-efficient manner (i.e., streaming lazily from disk instead of loading the whole file into memory). Unfortunately, the file contains some bad ASCII characters that break the default parser. The…
erikcw
20 votes, 3 answers

Efficient XSLT pipeline in Java (or redirecting Results to Sources)

I have a series of XSL 2.0 stylesheets that feed into each other, i.e. the output of stylesheet A feeds B feeds C. What is the most efficient way of doing this? The question rephrased is: how can one efficiently route the output of one…
Chris Scott
19 votes, 6 answers

Generating XML using SAX and Java

Anyone know of a good tutorial (or have a good example) for writing XML using the SAX framework (or something similar) and Java? Searching has yielded very little in terms of useful results. I'm trying to export from an Android app and am looking to…
Lunchbox
19 votes, 4 answers

How to get error's line number while validating a XML file against a XML schema

I'm trying to validate an XML file against a W3C XML Schema. The following code does the job and reports when an error occurs, but I'm unable to get the line number of the error. It always returns -1. Is there an easy way to get the line number? import…
pablosaraiva
19 votes, 6 answers

Parsing very large XML documents (and a bit more) in java

(All of the following is to be written in Java) I have to build an application that will take as input XML documents that are, potentially, very large. The document is encrypted -- not with XMLsec, but with my client's preexisting encryption…
Chris R
17 votes, 6 answers

Parsing an XML stream with no root element

I need to parse a continuous stream of well-formed XML elements, to which I am only given an already constructed java.io.Reader object. These elements are not enclosed in a root element, nor are they prepended with an XML header like
PNS
15 votes, 3 answers

Python sax to lxml for 80+GB XML

How would you read an XML file using SAX and convert it to an lxml etree.iterparse element? To provide an overview of the problem, I have built an XML ingestion tool using lxml for an XML feed that will range in size from 25 to 500MB that needs…
Nick
15 votes, 5 answers

Java SAX Parsing

There's an XML stream that I need to parse. Since I only need to do it once and build my Java objects, SAX looks like the natural choice. I'm extending DefaultHandler and implementing the startElement, endElement and characters methods, having…
Haji
14 votes, 2 answers

How to use SAXParseException effectively in Java

I'm validating against XMLSchema in Java, and getting SAXParseExceptions thrown when I have non-valid content models. I'm going to be using these exceptions to highlight where the validation has failed - but the SAXParseExceptions seem to be a…
brabster
14 votes, 2 answers

Handling change in newlines by XML transformation for CDATA from Java 8 to Java 11

With Java 9 there was a change in the way javax.xml.transform.Transformer with OutputKeys.INDENT handles CDATA tags. In short, in Java 8 a tag named 'test' containing some character data would result in: But with Java…
Rick
14 votes, 2 answers

Java: Saving StreamResult to a file

I am converting some data (such as CSV) to XML with SAX and then using a Transformer in Java. The result is in a StreamResult, and I am trying to save this result to file.xml, but I can't find a way to save a StreamResult to a file. Am I doing this all…
Todd
13 votes, 4 answers

Java SAXParser: Difference between `localName` and `qName`

In Java, the Handler class contains a method named startElement. This method has the prototype: public void startElement(String uri, String localName, String qName, Attributes attributes) I have read the Oracle Java website, but I still do not understand…
hqt