I am using hpple to parse an HTML document. I followed Ray Wenderlich’s tutorial and have everything working fine for its example file. However, I need to adapt it to read an HTML file from my friend’s blog, which is more complex than the example I have used so far. The relevant part of the file (the full file is uploaded as a gist) is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<!-- snip -->
<div id="content" class="hfeed">
            <div class="post-21443 post type-post status-publish format-standard hentry category-about-catherine">

      <div class="postdate">
      Apr          <br />
      6            <br />
      2013         
      </div>
    <h2 class="entry-title"><a href="http://catherinepooler.com/2013/04/stampnation-live-retreat-updates/" title="StampNation LIVE Retreat Updates" rel="bookmark">StampNation LIVE Retreat Updates</a></h2>

    <div class="post-info"></div>       <div class="entry-content">
        <p><a href="http://catherinepooler.com/wp-content/uploads/2013/04/IMG_0560.jpg" ><img class="aligncenter size-large wp-image-21444" alt="StampNation LIVE" src="http://catherinepooler.com/wp-content/uploads/2013/04/IMG_0560-450x337.jpg" width="450" height="337" /></a></p> <p>StampNation LIVE is in full swing!  We are having a wonderful time.  I am taking a quick break from stamping and chatting to share a few photos with you.</p> <p>I think my favorite thing in getting ready for the retreat was setting up the Accessory Bar.  Each attendee received a small galvanized bucket with their fully glittered initial on it to fill up at the bar.  Awesome!</p>
<!-- snip -->

There are several of these sections within the file, and I need to place the title of each

<h2 class = "entry-title">

(title="StampNation LIVE Retreat Updates") in an array. I have successfully placed the contents of each

<div class = "entry-content">

into an array by using the XPath query //div[@class = 'entry-content']/p. However, I can’t seem to get the titles without the code crashing due to an empty array. Obviously my XPath query is incorrect. This is what I tried:

//h2[@class = 'entry-title']  (: this crashed :)

//div[@class = 'post-21443.....']//h2[@class = 'entry-title']  (: this crashed too :)

Along with a slew of other attempts!

Does anyone have any advice for me? I looked into many SO answers and the examples that came with hpple, but I cannot piece it together.

UPDATE: With Jens's help I have changed the query to:
NSString *postsXpathQueryString = @"//h2[@class = 'entry-title']/a";

This gets me an array, but I get this error as well now.

2013-04-08 10:26:30.604 HTML[12408:11303] * Terminating app due to uncaught exception 'NSRangeException', reason: '* -[__NSArrayM objectAtIndex:]: index 4 beyond bounds [0 .. 3]' * First throw call stack: (0x210a012 0x1203e7e 0x20ac0b4 0x3852 0x2028fb 0x2029cf 0x1eb1bb 0x1fbb4b 0x1982dd 0x12176b0 0x2706fc0 0x26fb33c 0x2706eaf 0x2372bd 0x17fb56 0x17e66f 0x17e589 0x17d7e4 0x17d61e 0x17e3d9 0x1812d2 0x22b99c 0x178574 0x17876f 0x178905 0x9733ab6 0x181917 0x14596c 0x14694b 0x157cb5 0x158beb 0x14a698 0x2065df9 0x2065ad0 0x207fbf5 0x207f962 0x20b0bb6 0x20aff44 0x20afe1b 0x14617a 0x147ffc 0x1d2d 0x1c55) libc++abi.dylib: terminate called throwing an exception
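
For reference, here is a minimal sketch of how the query from the first update could be run through hpple to collect the post titles. It assumes hpple's `TFHpple` and `TFHppleElement` classes as used in the tutorial; the helper name `PostTitlesFromHTMLData` is made up for this illustration, and the titles are read from each link's `title` attribute.

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"  // from the hpple project; must be added to the target

// Hypothetical helper (not part of hpple): returns the title attribute of
// every <a> inside an <h2 class="entry-title"> element.
static NSArray<NSString *> *PostTitlesFromHTMLData(NSData *htmlData) {
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];
    NSArray *anchors =
        [parser searchWithXPathQuery:@"//h2[@class = 'entry-title']/a"];

    NSMutableArray<NSString *> *titles =
        [NSMutableArray arrayWithCapacity:anchors.count];
    for (TFHppleElement *anchor in anchors) {
        // objectForKey: looks up an attribute of the element by name.
        NSString *title = [anchor objectForKey:@"title"];
        if (title != nil) {
            [titles addObject:title];
        }
    }
    return titles;
}
```

The returned array holds exactly as many entries as the query matched, so anything that indexes into it has to respect its `count`.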

UPDATE 2

Fixed the index-beyond-bounds error by adding an if statement where I call reloadData. I get an array in my NSLog output, but it is not being put into my table view. The table view comes up empty!! But no more crash!!!

FINAL UPDATE

It is now working. Jens helped me get the query correct, and then I just had to fill in the table view. I had set the array count to 20 because Ray's tut had a zillion entries. My friend's blog only had four! Thanks for all the help.
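
The cause described above (a row count fixed at 20 while the parsed array held four items) can be reproduced and avoided without any UI code. The following is a Foundation-only sketch under that assumption; `TitleForRow` and the placeholder titles are invented for the illustration and are not taken from the actual project.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the title for a row, or nil when the row
// index is past the end of the parsed array.
static NSString *TitleForRow(NSArray<NSString *> *titles, NSUInteger row) {
    return row < titles.count ? titles[row] : nil;
}

int main(void) {
    @autoreleasepool {
        // Stand-in for the four titles parsed from the blog.
        NSArray<NSString *> *titles =
            @[@"Post A", @"Post B", @"Post C", @"Post D"];

        // What tableView:numberOfRowsInSection: should return: the real
        // count of parsed items, not a hard-coded 20.
        NSUInteger rowCount = titles.count;

        for (NSUInteger row = 0; row < rowCount; row++) {
            NSLog(@"row %lu: %@", (unsigned long)row, TitleForRow(titles, row));
        }

        // Asking for row 4 now yields nil instead of raising NSRangeException.
        NSLog(@"row 4: %@", TitleForRow(titles, 4));
    }
    return 0;
}
```

Deriving the row count from the array keeps the table in step with whatever the query returns, whether that is four posts or forty.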

Douglas
  • Are you really using the quotes you posted in your examples (the smaller HTML snippets and your queries)? If so, replace them with the correct ones. Your queries look fine otherwise. – Jens Erat Apr 08 '13 at 13:22
  • @JensErat, which quotes are you referring to? The single quotes in the query, ‘entry-title’? If so, what should they be? Doubles? – Douglas Apr 08 '13 at 13:28
  • Both - you're not using the standard ASCII quotes (`'` and `"`), but typographic ones (`‘`/`’` and `“`/`”`). WordPress is known to replace these quotes (with some plugin?); if you copied that code out of some tutorial, maybe WordPress messed up your quotes. – Jens Erat Apr 08 '13 at 14:17
  • @JensErat, I see what you mean. I actually wrote up my question in Pages to get it just right, then copied and pasted it here on SO. Pages must have changed the quotes, because in Xcode they are the ASCII quotes! Thanks for pointing that out. – Douglas Apr 08 '13 at 14:21
  • You should correct them in your post, then; others will be misled, too. This isn't your _full_ XML. Is it possible that your input contains a namespace? Maybe post it to [gist](http://gist.github.com) or somewhere else if possible. – Jens Erat Apr 08 '13 at 14:24
  • @JensErat, I changed the quotes so they are just like they are in the Xcode code. Also, I tried to post the full file on gist. This is the link. https://gist.github.com/anonymous/5337252 – Douglas Apr 08 '13 at 14:37
  • Your document is not valid XML. This _could_ be a problem, but I guess the problem was the missing namespace declaration; see my answer. – Jens Erat Apr 08 '13 at 14:53

1 Answer

Problem:

Your document contains namespaces:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">

Solution:

I'm not familiar with hpple or Objective-C, so I can't validate the code below, which I adapted from an hpple GitHub issue, but it looks reasonable. I guess all you have to do is change the first parameter to your XPath context variable.

xmlXPathRegisterNs(xpathCtx, (const xmlChar *)"xhtml", (const xmlChar *)"http://www.w3.org/1999/xhtml");

Then, prefix this namespace every time you access an element:

//xhtml:h2[@class = 'entry-title']
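
To show where the registration call fits, here is a rough sketch that uses libxml2 directly rather than hpple. It assumes the input is parsed as XML (so that elements actually carry the XHTML namespace) and that libxml2 is linked into the target; `CountEntryTitles` is a hypothetical name, and recovery mode is enabled only because the document may not be well-formed.

```objective-c
#import <Foundation/Foundation.h>
#import <libxml/parser.h>
#import <libxml/xpath.h>
#import <libxml/xpathInternals.h>  // declares xmlXPathRegisterNs

// Hypothetical helper: counts <h2 class="entry-title"> elements in the
// XHTML namespace. Returns 0 if parsing or evaluation fails.
static NSUInteger CountEntryTitles(NSData *xhtmlData) {
    xmlDocPtr doc = xmlReadMemory(xhtmlData.bytes, (int)xhtmlData.length,
                                  NULL, NULL,
                                  XML_PARSE_RECOVER | XML_PARSE_NOERROR |
                                  XML_PARSE_NOWARNING);
    if (doc == NULL) {
        return 0;
    }

    xmlXPathContextPtr ctx = xmlXPathNewContext(doc);
    if (ctx == NULL) {
        xmlFreeDoc(doc);
        return 0;
    }

    // Bind the prefix "xhtml" to the namespace declared on <html>.
    xmlXPathRegisterNs(ctx, BAD_CAST "xhtml",
                       BAD_CAST "http://www.w3.org/1999/xhtml");

    xmlXPathObjectPtr result = xmlXPathEvalExpression(
        BAD_CAST "//xhtml:h2[@class = 'entry-title']", ctx);

    NSUInteger count = 0;
    if (result != NULL && result->nodesetval != NULL) {
        count = (NSUInteger)result->nodesetval->nodeNr;
    }

    if (result != NULL) {
        xmlXPathFreeObject(result);
    }
    xmlXPathFreeContext(ctx);
    xmlFreeDoc(doc);
    return count;
}
```

The prefix chosen at registration time is arbitrary; only the namespace URI has to match the one in the document.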

If you do not want to deal with namespace prefixes (and you do not need to, since the document does not mix different namespaces), you could use the wildcard namespace instead:

//*:h2[@class = 'entry-title']
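
Note that `*:h2` is XPath 2.0 syntax, while libxml2 (which hpple wraps) implements XPath 1.0, so an XPath 1.0 engine may refuse to parse it. A namespace-agnostic query in XPath 1.0 can be written with `local-name()` instead. The sketch below assumes hpple's `TFHpple` class; `EntryTitleLinks` is a hypothetical helper name.

```objective-c
#import <Foundation/Foundation.h>
#import "TFHpple.h"  // from the hpple project; must be added to the target

// Hypothetical helper: matches the title links by local element name only,
// so the result does not depend on whether a namespace is in effect.
static NSArray *EntryTitleLinks(TFHpple *parser) {
    NSString *query =
        @"//*[local-name() = 'h2'][@class = 'entry-title']"
        @"/*[local-name() = 'a']";
    return [parser searchWithXPathQuery:query];
}
```

This form is more verbose than a prefixed query, but it needs no namespace registration.
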
Jens Erat
  • Jens, I tried both things, but neither worked. The xmlXpathReg... line is no longer available, and when I place the *: in front of h2, it says it is unable to parse. Thanks so much for taking the time to look into this for me; I will keep working on it. – Douglas Apr 08 '13 at 15:07
  • Try removing the namespace manually (save the document and remove the xmlns attribute). Do the problems persist? – Jens Erat Apr 08 '13 at 15:13
  • Jens, I saved the document and removed the xmlns attribute, but now it will not let me parse anything. Instead of getting my data via the url, I am getting it via a file that I added to Xcode. – Douglas Apr 08 '13 at 15:24
  • Jens, thanks so much for all your help. Your advice sent me in the correct direction!! I got it working. The query was correct; I was just setting up the table view incorrectly. But I wouldn't have gotten the query correct without your help, so thanks!! – Douglas Apr 08 '13 at 19:34