0

I am trying to extract data from an RSS feed. RSS link - http://www.thehindu.com/sport/?service=rss?

Here is my DefaultHandler's characters method.

public void characters(char[] ch, int start, int length) {
    String text = "";
    for (int i=0; i<length; i++)
        text += ch[start+i];

}

When I try to print `text` for the description tag, it comes out empty. Is there an error in the above code, or is the RSS data format causing the problem?

anuragneo
  • 1
    `text` is a local variable. It’s lost when the method returns. By the way, that’s inefficient. You are creating a new temporary `String` instance for every character. Consider replacing the loop with something like `text=new StringBuilder(text).append(ch, start, length).toString()` which does the entire job. Even better would be keeping a `StringBuilder` for the entire parsing and creating a `String` only when needed. – Holger Jul 31 '14 at 17:53

2 Answers

2

The characters method might be invoked multiple times for a single text node, so it is better to use something like this:

private StringBuilder stringBuilder; // or Deque<StringBuilder> for nested elements

public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
  // Start collecting text when the element of interest opens.
  if ("...".equals(qName)) {
      stringBuilder = new StringBuilder();
  }
}

public void characters(char[] ch, int start, int length) {
  // May be called several times per text node, so append instead of assigning.
  if (stringBuilder != null) {
     stringBuilder.append(ch, start, length);
  }
}

public void endElement(String uri, String localName, String qName) {
  if ("...".equals(qName) && stringBuilder != null) {
    String s = stringBuilder.toString(); // complete text of the element; use it here
    stringBuilder = null; // stop collecting until the next matching start tag
  }
}

The ... stands for the name of the element containing the text node. Depending on your namespace use, you might have to use localName as opposed to qName.
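As a rough end-to-end sketch of the idea above, the handler below collects the text of every `description` element from a small inline RSS string. The class name `DescriptionCollector`, the `parse` helper, and the sample XML are made up for illustration; the element name `description` is taken from the question, and the default (non-namespace-aware) `SAXParserFactory` is assumed, so `qName` is compared.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: gathers the text of each <description> element.
class DescriptionCollector extends DefaultHandler {
    // Assumed target element name (the answer leaves it as "...").
    private static final String TARGET = "description";

    private final List<String> descriptions = new ArrayList<>();
    private StringBuilder buffer; // non-null only while inside TARGET

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (TARGET.equals(qName)) {
            buffer = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several chunks.
        if (buffer != null) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (TARGET.equals(qName) && buffer != null) {
            descriptions.add(buffer.toString());
            buffer = null;
        }
    }

    // Convenience helper for the sketch: parses an XML string and returns the collected texts.
    static List<String> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            DescriptionCollector handler = new DescriptionCollector();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.descriptions;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rss><channel><item><title>T</title>"
                + "<description>Score &amp; report</description></item></channel></rss>";
        System.out.println(parse(xml)); // prints [Score & report]
    }
}
```

Because the text is stored in a field and handed over in `endElement`, it survives across `characters` calls, which is what the local variable in the question could not do.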

raphaëλ
  • For clarity, I would replace `if ("...".equals(qName))` with a call like `if(hasSignificantTextContent(uri, localName, qName) )` where the method is a place holder method that the questioner has to fill with life. In general, putting the condition into a method on its own is preferred over repeating it in the `startElement` and `endElement` method. – Holger Aug 01 '14 at 08:22
  • Let's say you want the text for an element `<foo>abc</foo>`; then you would use `"foo".equals(qName)`. Note that if you need nested text elements, you need a stack of StringBuilders as opposed to a single field. – raphaëλ Aug 01 '14 at 10:35
  • It’s quite clear what the condition might look like. I only suggest encapsulating it into its own method. The requirement on the `StringBuilder` usage depends on what you are actually doing. E.g. if you build a (DOM-like) tree, `String` nodes are created and inserted when descending or encountering a sibling. Then you don’t need a stack, but you might have to create a `String` node even on `startElement`. – Holger Aug 01 '14 at 11:54
  • Strings are immutable. You need to build into a buffer (or any other "stream" like data structure). You need a stack to build XML structures like `abcabc`. Anyway, it's just an idiom I have been using for a long (long) time. Only once did I have to use another streaming method, as the text nodes were over 10 GB :) – raphaëλ Aug 01 '14 at 15:17
  • It seems you didn’t understand my statement. If you create the `String` before descending, e.g. to add it to a (DOM-like) tree or to perform whatever committing operation, you don’t need a stack of buffers as the buffer is ready for re-use after the creation of the `String`. Usually you use a SAX parser instead of a DOM tree because you do not need all data in memory at a time… – Holger Aug 01 '14 at 15:51
  • With SAX it is up to you what you keep in memory. If you need all of it for processing, then all of it remains in memory. If you need to write a file, then your streams will not be in memory, etc. If you need the whole string, then build it (in memory); if you can't keep it in memory (which I did not see in your original post), then write to another stream. BTW, DOM parsers have filters, so you don't have to put everything in memory; that is a common misconception about SAX vs DOM. – raphaëλ Aug 01 '14 at 19:33
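The stack of `StringBuilder`s discussed in these comments might look roughly like the sketch below. It is only one possible reading: every element gets its own buffer, text goes to the innermost open element, and each element's own text is recorded when it closes. The class name `NestedTextHandler`, the `parse` helper, and the `name=text` result format are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: one buffer per open element, kept on a stack.
class NestedTextHandler extends DefaultHandler {
    private final Deque<StringBuilder> buffers = new ArrayDeque<>();
    private final List<String> results = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        buffers.push(new StringBuilder()); // fresh buffer for the element just opened
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        StringBuilder current = buffers.peek(); // innermost open element, or null
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The popped buffer holds only this element's direct text.
        results.add(qName + "=" + buffers.pop());
    }

    // Convenience helper for the sketch: returns "name=text" entries in closing order.
    static List<String> parse(String xml) {
        try {
            NestedTextHandler handler = new NestedTextHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.results;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<a>x<b>y</b>z</a>")); // prints [b=y, a=xz]
    }
}
```

With a single field instead of the stack, the text of `a` before and after `b` could not be kept apart from the text of `b`. Whether that matters depends, as the comments note, on what is done with the strings.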
0

It isn't clear how we get here from the SAX representation of the RSS, or, for that matter, what you have done to verify that you reached the URL, then fetched and parsed some RSS.

But this method seems to do what the Java API can do in a String constructor: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#String%28char[],%20int,%20int%29

  • Are you aware of the `start` and `length` parameter and the fact that the handler method might be called multiple times for parts of a single text node? – Holger Jul 31 '14 at 17:58
  • The link I gave was for the char[] constructor, but the same page has the constructor that takes a position and a length. As for HOW this string, once created, is to be used, that's really the question here, isn't it? –  Jul 31 '14 at 18:31
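To make the point about the constructor concrete, here is a small sketch comparing the loop from the question with `new String(char[], int, int)`. The class and method names and the sample buffer are hypothetical. It also simulates a text node arriving in two chunks, which is the case the first comment warns about: each call yields only its own piece, so the pieces still have to be accumulated somewhere.

```java
// Hypothetical demo class comparing the two ways of copying a char range.
class CharRangeDemo {
    // The loop from the question, returning the result instead of discarding it.
    static String viaLoop(char[] ch, int start, int length) {
        String text = "";
        for (int i = 0; i < length; i++) {
            text += ch[start + i];
        }
        return text;
    }

    // Same result in one call using the String(char[], int, int) constructor.
    static String viaConstructor(char[] ch, int start, int length) {
        return new String(ch, start, length);
    }

    public static void main(String[] args) {
        char[] buf = "<description>Match report</description>".toCharArray();

        // The text starts at offset 13 and is 12 characters long.
        System.out.println(viaLoop(buf, 13, 12));        // prints Match report
        System.out.println(viaConstructor(buf, 13, 12)); // prints Match report

        // Simulated split delivery: two calls, each covering part of the text.
        StringBuilder whole = new StringBuilder();
        whole.append(buf, 13, 6); // "Match "
        whole.append(buf, 19, 6); // "report"
        System.out.println(whole); // prints Match report
    }
}
```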