21

I've created my own DefaultHandler to parse RSS feeds, and for most feeds it works fine. For ESPN, however, it cuts off part of the article URL because of the way ESPN formats its URLs. An example of a full article URL from ESPN:

http://sports.espn.go.com/nba/news/story?id=5189101&campaign=rss&source=ESPNHeadlines

The problem is that, for some reason, the DefaultHandler characters method only gets this from the tag that contains the above URL:

http://sports.espn.go.com/nba/news/story?id=5189101

As you can see, it's cutting everything off the url from the ampersand escape code and after. How can I get the SAX parser to stop cutting my string off at this escape code? For reference, here is my characters method:

 public void characters(char ch[], int start, int length) {

  String chars = (new String(ch).substring(start, start + length));

  try {
   // If not in item, then title/link refers to feed
   if (!inItem) {
    if (inTitle)
     currentFeed.title = chars;
   } else {
    if (inLink)
     currentArticle.url = new URL(chars);
    if (inTitle)
     currentArticle.title = chars;
    if (inDescription)
     currentArticle.description = chars;
    if (inPubDate)
     currentArticle.pubDate = chars;
    if (inEnclosure) {
    }
   }
  } catch (MalformedURLException e) {
   Log.e("RSSReader", e.toString());
  }
 }

Rob W.

brockoli

3 Answers

46

As you can see, it's cutting everything off the url from the ampersand escape code and after.

From the documentation of the characters() method:

The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.

When I write SAX parsers, I use a StringBuilder to append everything passed to characters():

public void characters (char ch[], int start, int length) {
    if (buf!=null) {
        for (int i=start; i<start+length; i++) {
            buf.append(ch[i]);
        }
    }
}

Then in endElement(), I take the contents of the StringBuilder and do something with it. That way, if the parser calls characters() several times, I don't miss anything.
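
As a rough illustration of the approach described above, here is a minimal, self-contained sketch. The class name `LinkCollector`, the `parseLinks` helper, and the tiny inline feed are assumptions made up for this example and are not part of the original question's code. It matches on `qName` because the default JDK parser is not namespace-aware; with other parser configurations the element name may arrive in `localName` instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <link> element.
class LinkCollector extends DefaultHandler {
    private final List<String> links = new ArrayList<>();
    private StringBuilder buf; // non-null only while inside a <link> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("link".equals(qName)) {
            buf = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one element's text in several chunks,
        // so append each chunk instead of overwriting the previous one.
        if (buf != null) {
            buf.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only at the end of the element is the text known to be complete.
        if ("link".equals(qName) && buf != null) {
            links.add(buf.toString().trim());
            buf = null;
        }
    }

    // Hypothetical convenience method: parse an XML string and return the collected links.
    static List<String> parseLinks(String xml) throws Exception {
        LinkCollector handler = new LinkCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.links;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss><channel><item><link>"
                + "http://sports.espn.go.com/nba/news/story?id=5189101&amp;campaign=rss&amp;source=ESPNHeadlines"
                + "</link></item></channel></rss>";
        System.out.println(parseLinks(xml));
    }
}
```

However many times the parser calls characters() for the link text, the list ends up holding the whole URL, including the part after each `&amp;`.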

CommonsWare
  • Ok, I didn't really take the time to fully understand how the parser was working. After reading your answer I went back and researched further to get a better understanding. Your suggestion was the problem of course, I've since updated my code to handle the char data properly. TY – brockoli May 17 '10 at 18:19
  • @CommonsWare: does it miss some characters? I am facing that in my case. – Ankit Jul 19 '13 at 11:28
  • I have image1:title in my xml and sometime I get full value and sometimes I got only "itle" or "Title". I have tried to print values but it has never printed "image1:" for partial values. – Ankit Jul 19 '13 at 11:34
  • @Ankit: Please open a fresh StackOverflow question, show your input, your parsing code, and your results. – CommonsWare Jul 19 '13 at 11:49
  • With your solution my problem got resolved; even so, I will post it as a question for future readers. – Ankit Jul 19 '13 at 12:30
  • Thank you, your answers are always short, descriptive, provide actual reasoning behind the answer and of course on the spot! – Nemanja Mar 28 '14 at 12:41
  • @CommonsWare I am using SAX parser which contains the following text inside as tag as shown below Hi this book is selected for IIFA award. When I parse, and get the text from the tag book, I am getting the below content 'Hi this book is selected for IIFA award.' But I want this text 'Hi this book is selected for IIFA award.' Why the is missing in the text, how to get that while parsing ?? Please let me know – KK_07k11A0585 Jun 10 '15 at 14:11
  • @KK_07k11A0585: That is a separate XML element. You are already getting it while parsing, in your `startElement()` and `endElement()` methods. – CommonsWare Jun 10 '15 at 14:16
  • @CommonsWare Thanks, I have parsed that by adding that tag name in **startElement** and **endElement()**. But is there any other way to get the **complete text** inside the **tag** as **plain text** ?? In the above example, how can I get this text **'Hi this book is selected for IIFA'** as is from the tag **book** ?? – KK_07k11A0585 Jun 10 '15 at 14:43
  • @KK_07k11A0585: You would have to reassemble that yourself, using string concatenation. This has nothing to do with Android specifically. If you have further questions in this area, ask a fresh Stack Overflow question, tagged `java`, where you explain your input and what you are trying to achieve. – CommonsWare Jun 10 '15 at 14:49
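
One possible reading of "reassemble that yourself" in the comment above is sketched here. The inline tag from the comment did not survive, so the `<i>` tag, the class name `MarkupCollector`, and the `collect` helper are all hypothetical. The sketch rebuilds nested tags by hand while inside the target element; it ignores attributes and does not re-escape characters such as `&`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: rebuilds the inner content of one element, nested tags included.
class MarkupCollector extends DefaultHandler {
    private final String target;
    private final StringBuilder text = new StringBuilder();
    private int depth = 0; // 0 = outside the target, 1 = directly inside it, >1 = nested

    MarkupCollector(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (depth > 0) {
            // A nested element: write its opening tag back into the text by hand.
            text.append('<').append(qName).append('>');
            depth++;
        } else if (target.equals(qName)) {
            depth = 1;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth > 1) {
            // Closing a nested element: write its closing tag back as well.
            text.append("</").append(qName).append('>');
        }
        if (depth > 0) {
            depth--;
        }
    }

    // Hypothetical helper: return the reassembled content of the target element.
    static String collect(String xml, String target) throws Exception {
        MarkupCollector handler = new MarkupCollector(target);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book>Hi this book is <i>selected</i> for IIFA award.</book></books>";
        System.out.println(collect(xml, "book"));
    }
}
```

If the input contains several target elements, this sketch concatenates their contents; a real handler would probably reset the buffer at each target start.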
6
@Override
public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
    // Start a fresh buffer for every element so text from earlier elements never leaks in.
    sb = new StringBuilder();
    if (localName.equals("icon")) {
        iconflag = true;
    }
}

@Override
public void characters(char[] ch, int start, int length) {
    // May be called several times per element, so append rather than assign.
    if (sb != null && iconflag) {
        sb.append(ch, start, length);
    }
}

@Override
public void endElement(String uri, String localName, String qName)
        throws SAXException {
    // The element is complete here, so the buffer holds its full text.
    if (iconflag) {
        info.setIcon(sb.toString().trim());
        iconflag = false;
    }
}

So I figured it out, the code above is the solution.

anonymous123
0

I ran into this problem the other day. It turns out the characters method can be called multiple times when the value contains any of these characters:

"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;

Also be careful about line breaks within the value! If the XML is line-wrapped outside your control, the characters method may also be called once per line, and it will include the line break, which you then need to strip out manually.

A sample Handler taking care of all these problems is this one:

 DefaultHandler handler = new DefaultHandler() {
   private boolean isInMyTag = false;
   private String localName;
   private StringBuilder elementContent;

   @Override
   public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
     if (qName.equalsIgnoreCase("myfield")) {
       isInMyTag = true;
       this.localName = localName;
       this.elementContent = new StringBuilder();
     }
   }

   @Override
   public void characters(char[] ch, int start, int length) {
      if (isInMyTag) {
         String content = new String(ch, start, length);
         if (content.startsWith("\n")) {
              // remove leading newline
              elementContent.append(content.substring(1));
         } else {
              elementContent.append(content);
         }
      }
   }

   @Override
   public void endElement(String uri, String localName, String qName) throws SAXException {
     if (qName.equalsIgnoreCase("myfield")) {
       isInMyTag = false;
       // do something with elementContent.toString()
       System.out.println(elementContent.toString());
       this.localName = "";
     }
   }
 };

I hope this helps.
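
A possible variation on the handler above, offered only as a sketch: rather than stripping a leading newline from each chunk, buffer everything and clean up whitespace once in endElement(). The class name `NormalizingHandler`, the `normalize` and `parse` helpers, and the choice to collapse each whitespace run into a single space are assumptions for this example. A wrapped URL, for instance, would need the whitespace removed rather than collapsed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: buffers the text of <myfield> and normalizes whitespace at the end.
class NormalizingHandler extends DefaultHandler {
    private final List<String> values = new ArrayList<>();
    private StringBuilder elementContent; // non-null only while inside <myfield>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("myfield".equalsIgnoreCase(qName)) {
            elementContent = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Keep every chunk untouched, line breaks included.
        if (elementContent != null) {
            elementContent.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("myfield".equalsIgnoreCase(qName) && elementContent != null) {
            values.add(normalize(elementContent.toString()));
            elementContent = null;
        }
    }

    // Assumed policy: trim the ends and collapse each run of whitespace into one space.
    static String normalize(String raw) {
        return raw.trim().replaceAll("\\s+", " ");
    }

    // Hypothetical helper: parse an XML string and return the normalized values.
    static List<String> parse(String xml) throws Exception {
        NormalizingHandler handler = new NormalizingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><myfield>\n  Tom &amp; Jerry\n  &lt;live&gt;\n</myfield></root>";
        System.out.println(parse(xml));
    }
}
```

Doing the cleanup once on the complete string means the result does not depend on where the parser happens to split the chunks.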

fl0w