
I need to parse the contents of a website into a table view in my application. I tried hpple, and it works in some test cases, but I can't get it to work in my specific case. The HTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html>
   <head>
      <link rel="stylesheet" type="text/css" href="willi.css">
      </link><script src="style.js" type="text/javascript"></script>
      <title>Homepage</title>
   </head>
   <body>
      <a name="oben"/>
         <h1>Date</h1>
         <br />
      <a href="#07.07.2015">07.07.2015</a><br />
      <a href="#07.08.2015">07.08.2015</a><br />
      <a name="07.07.2015">
         <hr />
      </a>
      <p class="page" style="text-align:left">
      <h2>Date Tue, 7.7.2015</h2>
      created: 7.7. 16:35 </p>
      <p class="page" style="text-align:left">
      <table class="F" border-width="3">
         <colgroup>
            <col width="899"/>
         </colgroup>
         <tr class="F">
            <th rowspan="1" class="F">
               ***&nbsp;&nbsp; Version 1&nbsp;&nbsp; ***
            </th>
         </tr>
         <tr class="F">
            <th rowspan="1" class="F"></th>
         </tr>
         <tr class="F">
            <th rowspan="1" class="F">
               Testmessage 1
            </th>
         </tr>
         <tr class="F">
            <th rowspan="1" class="F">
               Testmessage 2
            </th>
         </tr>
         <tr class="F">
            <th rowspan="1" class="F">
               Testmessage 3
            </th>
         </tr>
         <tr class="F">
            <th rowspan="1" class="F"></th>
         </tr>
         <tr class="F">
            <th rowspan="1" class="F">
               Testmessage 4
            </th>
         </tr>
      </table>
      </p>
      <p class="seite" style="text-align:left">
      <h4>List:</h4>
      <table class="k" border-width="3">
         <tr>
            <th width="50">
               Team
            </th>
            <th width="50">
               &nbsp;Name
            </th>
            <th width="50">
               Nr.
            </th>
            <th width="50">
               &nbsp;Mate
            </th>
            <th width="50">
               Spot
            </th>
            <th width="50">
               &nbsp;Map
            </th>
            <th width="150"></th>
         </tr>
         <tr class="k">
            <th rowspan="5" class="k">
               A
            </th>
            <td>
               &nbsp;First
            </td>
            <td>
               3
            </td>
            <td>
               &nbsp;
            </td>
            <td></td>
            <td>
               &nbsp;
            </td>
            <td>
               &nbsp;Test
            </td>
         </tr>
         <tr>
            <td>
               &nbsp;Second
            </td>
            <td>
               4
            </td>
            <td>
               &nbsp;
            </td>
            <td></td>
            <td>
               &nbsp;
            </td>
            <td>
               &nbsp;Test
            </td>
         </tr>
         <tr>
            <td>
               &nbsp;Sie
            </td>
            <td>
               8
            </td>
            <td>
               &nbsp;
            </td>
            <td></td>
            <td>
               &nbsp;
            </td>
            <td>
               &nbsp;Test
            </td>
         </tr>
         <tr>
            <td>
               &nbsp;Sie
            </td>
            <td>
               9
            </td>
            <td>
               &nbsp;
            </td>
            <td></td>
            <td>
               &nbsp;
            </td>
            <td>
               &nbsp;Test
            </td>
         </tr>
         <tr>
            <td>
               &nbsp;Es
            </td>
            <td>
               10
            </td>
            <td>
               &nbsp;
            </td>
            <td></td>
            <td>
               &nbsp;
            </td>
            <td>
               &nbsp;Test
            </td>
         </tr>
         <tr class="k">
            <th rowspan="1" class="k">
               B
            </th>
            <td>
               &nbsp;Red
            </td>
            <td>
               11
            </td>
            <td>
               &nbsp;
            </td>
            <td></td>
            <td>
               &nbsp;
            </td>
            <td>
               &nbsp;Test
            </td>
         </tr>
      </table>
      </p>
      <hr />
      <a name="07.08.2015">
         <hr />
      </a>
      <p class="page" style="text-align:left">
      <h2>Date Thu, 8.7.2015</h2>
      created: 7.7. 16:35 </p>
      <p class="page" style="text-align:left">
      <table class="F" border-width="3">
         <colgroup>
            <col width="899"/>
         </colgroup>
         <tr class="F">
            <th rowspan="1" class="F">
               ***&nbsp;&nbsp; Version 1&nbsp;&nbsp; ***
            </th>
         </tr>
      </table>
      </p>
      <p class="page" style="text-align:left">
      <h4>List:</h4>
      <table class="k" border-width="3">
         <tr>
            <th width="50">
               Team
            </th>
            <th width="50">
               &nbsp;Name
            </th>
            <th width="50">
               Nr.
            </th>
            <th width="50">
               &nbsp;Mate
            </th>
            <th width="50">
               Spot
            </th>
            <th width="50">
               &nbsp;Map
            </th>
            <th width="150"></th>
         </tr>
         <tr class="k">
            <th rowspan="5" class="k">
               C
            </th>
            <td>
               &nbsp;Dnk
            </td>
            <td>
               1
            </td>
            <td>
               &nbsp;
            </td>
            <td></td>
            <td>
               &nbsp;
            </td>
            <td>
               &nbsp;Test
            </td>
         </tr>
         <tr>
            <td>
               &nbsp;Es
            </td>
            <td>
               1
            </td>
            <td>
               &nbsp;
            </td>
            <td></td>
            <td>
               &nbsp;
            </td>
            <td>
               &nbsp;Test
            </td>
         </tr>
         <tr>
            <td>
               &nbsp;Dnk
            </td>
            <td>
               2
            </td>
            <td>
               &nbsp;
            </td>
            <td></td>
            <td>
               &nbsp;
            </td>
            <td>
               &nbsp;Test
            </td>
         </tr>
         <tr>
            <td>
               &nbsp;Esta
            </td>
            <td>
               2
            </td>
            <td>
               &nbsp;
            </td>
            <td></td>
            <td>
               &nbsp;
            </td>
            <td>
               &nbsp;Test
            </td>
         </tr>
         <tr>
            <td>
               &nbsp;SWB
            </td>
            <td>
               6
            </td>
            <td>
               &nbsp;Naau
            </td>
            <td>
               F
            </td>
            <td>
               &nbsp;Test
            </td>
            <td>
               &nbsp;
            </td>
         </tr>
      </table>
      </p>
      <hr />
   </body>
</html>

The page contains two main elements (<table></table>) whose contents I want to use to populate my UITableView.

My goal is to have one section per table, with all of that table's contents inside the section. Each section header should show the date.

TFHpple *Parser = [TFHpple hppleWithHTMLData:HtmlData];

NSString *XpathQueryString = @"/html/body/a";
NSArray *Nodes = [Parser searchWithXPathQuery:XpathQueryString];

for (TFHppleElement *element in Nodes) {
    NSString *temp = [[element firstChild] content];
    if (temp.length == 10) {
        [Day addObject:temp];
    }
}

I save the dates in my NSMutableArray *Day, and this works fine: I get 2 sections with the right names. But when I try to retrieve the tables' contents, I can't get it to work. I want something like

tableElement* newElement = [[tableElement alloc] init];
newElement.day = @"07.07.2015";
newElement.team = @"A";
newElement.name = @"First";
newElement.nr = @"3";
newElement.mate = @"";
newElement.spot = @"";
newElement.map = @"";
newElement.status = @"Test";

so that I can store all newElement objects for date one in one array and all elements for date two in another array.

Edit: for example, newElement.day = @"07.07.2015"; of course needs to be something like newElement.day = [[hppleparse firstChild] content];

Jason Aller

1 Answer

This can be easily achieved with HTMLKit.

Here are a few examples of what you can do with it, using the HTML you provided:

HTMLDocument *document = [HTMLDocument documentWithString:html];
NSMutableArray *days = [NSMutableArray array];
NSArray *links = [document querySelectorAll:@"a"];
for (HTMLElement *link in links) {
  if (link.textContent.length == 10) {
    [days addObject:link.textContent];
  }
}

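// For example you can:
// Get all <tr> elements inside the table with className 'k'.
// (A spec-compliant HTML parser wraps rows in an implicit <tbody>,
// so the child selector "table.k > tr" would match nothing here.)
NSArray *tableKRows = [document querySelectorAll:@"table.k tr"];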

// Get all <td> elements that are descendants of the table with className 'k'
NSArray *tableKData = [document querySelectorAll:@"table.k td"];

// Collect content of all <td> elements in `array`
NSMutableArray *array = [NSMutableArray array];
for (HTMLElement *td in tableKData) {
  NSString *content = [td.textContent stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
  [array addObject:content];
}
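
To get from the flat cell list to the per-date records you described, one possible approach is to walk each table row by row and carry the team name forward, since the <th rowspan="5"> cell only appears in the first row of each team. The following is a rough sketch, not a tested implementation. ParseTeamTables and TrimmedText are hypothetical helper names, the dictionary keys simply mirror your tableElement properties, and I am assuming that querySelectorAll: can also be called on an individual element and that the n-th table.k belongs to the n-th entry in days.

```objc
#import <Foundation/Foundation.h>
#import <HTMLKit/HTMLKit.h>

// Hypothetical helper: trims surrounding whitespace, including the
// non-breaking spaces that &nbsp; turns into.
static NSString *TrimmedText(HTMLElement *element) {
    return [element.textContent stringByTrimmingCharactersInSet:
            [NSCharacterSet whitespaceAndNewlineCharacterSet]];
}

// Hypothetical helper: returns one array of row dictionaries per table.k,
// in document order. Assumes days[n] is the date for the n-th table.k.
static NSArray *ParseTeamTables(HTMLDocument *document, NSArray *days) {
    // Column order of the <td> cells; the team sits in a separate <th>.
    NSArray *keys = @[@"name", @"nr", @"mate", @"spot", @"map", @"status"];
    NSMutableArray *sections = [NSMutableArray array];
    NSArray *tables = [document querySelectorAll:@"table.k"];

    for (NSUInteger t = 0; t < tables.count; t++) {
        HTMLElement *table = tables[t];
        NSString *day = (t < days.count) ? days[t] : @"";
        NSString *team = @"";
        NSMutableArray *rows = [NSMutableArray array];

        // Assumption: querySelectorAll: is available on elements too.
        for (HTMLElement *tr in [table querySelectorAll:@"tr"]) {
            NSArray *cells = [tr querySelectorAll:@"td"];
            if (cells.count == 0) {
                continue; // header row: it contains only <th> cells
            }

            // A <th> in a data row starts a new team; rows without one
            // are covered by the previous team's rowspan.
            NSArray *heads = [tr querySelectorAll:@"th"];
            if (heads.count > 0) {
                team = TrimmedText(heads.firstObject);
            }

            NSMutableDictionary *row =
                [@{@"day": day, @"team": team} mutableCopy];
            NSUInteger count = MIN(cells.count, keys.count);
            for (NSUInteger i = 0; i < count; i++) {
                row[keys[i]] = TrimmedText(cells[i]);
            }
            [rows addObject:row];
        }
        [sections addObject:rows];
    }
    return sections;
}
```

You would call it as ParseTeamTables(document, days) and use the result as the backing store for your table view: sections.count for the number of sections, and the inner array's count for the rows in each section. If you prefer your tableElement class over dictionaries, build one inside the loop where the dictionary is filled.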

Let me know if you need any further help.

HTMLKit is a pure Objective-C HTML parser with CSS3 Selectors support. It is not a wrapper around libxml or any other library, but rather a complete WHATWG HTML specification-compliant implementation.

iska
  • What if my class name contains whitespace, like "table table-responsive table-striped table-condensed table-bordered data-table"? How can I achieve the above? – Himan Dhawan Jul 27 '17 at 13:32
  • @HimanDhawan you can use any of the CSS Level 3 selectors, especially the attribute selectors, e.g. table[class~='data-table'] or table[class*='responsive']. Take a look at the table here https://www.w3.org/TR/css3-selectors/#selectors – iska Jul 27 '17 at 18:39
  • @HimanDhawan let me know if you still need any further assistance – iska Jul 27 '17 at 18:42