
I want to parse the following HTML with HTMLParser. I wrote code to extract the title and it works fine, but nothing I have tried works for the tags below. Please help; this is my first time doing this kind of programming.

1) I want to retrieve the src URL from the img tag.

<div id="images">
<img src="../images/abc.jpg" align="right" style="padding-right:5px;"> 

2) I want to retrieve text content between <li> tags.

<ul>
    <li>hello</li>
    <li>how r u?</li>
    <li>bye</li>
   </ul>

I tried the following code to retrieve the src URL of the img tag, but it throws a NullPointerException.

 Parser parser=new Parser();
 HasAttributeFilter imgfil=new HasAttributeFilter("align","right");
 NodeList img=parser.parse(imgfil);
 Node node1=img.elementAt(0);
 ImageTag tg=(ImageTag) node1;
 String url=tg.getText();
 System.out.println(url);

I also tried the following snippet, but it does not work either.

 NodeList img=parser.extractAllNodesThatMatch(new AndFilter(new TagNameFilter("img"),new HasAttributeFilter("align","right")));
 SimpleNodeIterator iterate=img.elements();
 while (iterate.hasMoreNodes())
 {
     Node node1 = iterate.nextNode();
     ImageTag tag = (ImageTag)node1;
     System.out.println(tag.getImageURL());
 }
Anish

1 Answer


The second snippet you tried should work once it is corrected. The problem is in its first line:

NodeList img=parser.extractAllNodesThatMatch(new AndFilter(new TagNameFilter("img"),new HasAttributeFilter("align","right")));

I think the fix is to pass the filter to parser.parse() instead of calling parser.extractAllNodesThatMatch(). Try that and see if it helps.

Here's an example of what I mean:

NodeFilter filter1 = new AndFilter(new TagNameFilter("IMG"), new HasParentFilter(new HasAttributeFilter("id", "featured_story_1"), true));
NodeList list = parser.parse(filter1);

for(int i = 0; i < list.size(); i++)
{
    Node node = list.elementAt(i);
    ImageTag image = (ImageTag)node;
    System.out.println(image.getImageURL());
}
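Applied to the markup in the question, the same approach might look like the sketch below. It is only an illustration under a few assumptions: the class name HtmlParserSketch and its two helper methods are made up for this example, the HTML is fed in as a string through Parser.createParser() rather than fetched from a URL, and the raw src value is read with getAttribute("src") instead of getImageURL(). It also checks each node before casting, since a parser with no input yields an empty list and elementAt(0) on an empty list is a plausible source of the NullPointerException in the first snippet.

```java
import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.tags.ImageTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

import java.util.ArrayList;
import java.util.List;

public class HtmlParserSketch {

    // Sample markup based on the question (closing </div> added for completeness).
    static final String HTML =
          "<div id=\"images\">"
        + "<img src=\"../images/abc.jpg\" align=\"right\" style=\"padding-right:5px;\">"
        + "</div>"
        + "<ul><li>hello</li><li>how r u?</li><li>bye</li></ul>";

    // Hypothetical helper: collects the src attribute of every <img align="right">.
    static List<String> imageSources(String html) throws ParserException {
        // Give the parser some input; a bare "new Parser()" has nothing to parse.
        Parser parser = Parser.createParser(html, "UTF-8");
        NodeFilter filter = new AndFilter(
            new TagNameFilter("img"),
            new HasAttributeFilter("align", "right"));
        NodeList nodes = parser.parse(filter);

        List<String> sources = new ArrayList<String>();
        for (int i = 0; i < nodes.size(); i++) {
            Node node = nodes.elementAt(i);
            // Guard the cast so an unexpected node type is skipped rather than thrown on.
            if (node instanceof ImageTag) {
                sources.add(((ImageTag) node).getAttribute("src"));
            }
        }
        return sources;
    }

    // Hypothetical helper: collects the text inside every <li> element.
    static List<String> listItemTexts(String html) throws ParserException {
        // A fresh parser is created because each parse consumes the input.
        Parser parser = Parser.createParser(html, "UTF-8");
        NodeList nodes = parser.parse(new TagNameFilter("li"));

        List<String> texts = new ArrayList<String>();
        for (int i = 0; i < nodes.size(); i++) {
            // toPlainTextString() returns the text of the tag's children without markup.
            texts.add(nodes.elementAt(i).toPlainTextString().trim());
        }
        return texts;
    }

    public static void main(String[] args) throws ParserException {
        for (String src : imageSources(HTML)) {
            System.out.println(src);
        }
        for (String text : listItemTexts(HTML)) {
            System.out.println(text);
        }
    }
}
```

If you parse a live page instead, you would construct the parser from the page URL, and getImageURL() may then be the better choice because it can resolve a relative src against that URL.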

Hope this helps!

chaz