
I am revising a question I posted yesterday, which was answered correctly.

I have to parse an HTML page structured like this:

<li id="list">
  <ul>
    <li>
      <a class="region">Liguria</a>
      <ul>
        <li>
          <a class="city">Genova</a>
        </li>
        <li>
          <a class="city">Savona</a>
        </li>
      </ul>
    </li>
    <li>
      <a class="region">Lazio</a>
      <ul>
        <li>
          <a class="city">Roma</a>
        </li>
      </ul>
    </li>
  </ul>
</li>

I need to extract an object with the regions and the cities, like this:

result = {
  'Liguria': [
    'Genova', 'Savona',
  ],
  'Lazio': [ 'Roma', ],
};

I am using cheerio from Node.js, but I added the jquery tag since cheerio uses jQuery-style selectors (AFAIK...).

I have come up with this partial solution, which does not work:

$('li[id="list"] ul li').each(function(i, elem) {
  console.log('region:', $(this).html());
  // work on each li containing the region to get the cities...
  // ???
});

As you can see, I'm quite confused... :-(
Any clue?

MarcoS

1 Answer


Given the convenient classes on the regions and cities, I think it can be simpler:

var result = {};
// Loop through regions...
$("#list a.region").each(function() {
  // For this region, create an entry on the result object
  // and get an array of its cities. Note that we have to
  // use .next() to get the UL following the a.region
  var $this = $(this);
  result[$this.text()] = $this.next().find("a.city").map(function() {
    return $(this).text();
  }).get();
});

Live Example:

var result = {};
// Loop through regions...
$("#list a.region").each(function() {
  // For this region, create an entry on the result object
  // and get an array of its cities
  var $this = $(this);
  result[$this.text()] = $this.next().find("a.city").map(function() {
    return $(this).text();
  }).get();
});
document.body.insertAdjacentHTML(
  "beforeend",
  "<pre>" + JSON.stringify(result, null, 2) + "</pre>"
);
<li id="list">
  <ul>
    <li>
      <a class="region">Liguria</a>
      <ul>
        <li>
          <a class="city">Genova</a>
        </li>
        <li>
          <a class="city">Savona</a>
        </li>
      </ul>
    </li>
    <li>
      <a class="region">Lazio</a>
      <ul>
        <li>
          <a class="city">Roma</a>
        </li>
      </ul>
    </li>
  </ul>
</li>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>

> I am using cheerio from node.js, but I added jquery to the tags since cheerio uses jquery-style selector (AFAIK...).

Mostly. I haven't used cheerio in a couple of years, but the last time I did, there were oddities, such as `this` in `each` callbacks being a cheerio object (the analog of a jQuery object) rather than a raw element. So there were several places in my cheerio code where I had `this.text()` rather than `$(this).text()`, for instance. You may well have to make edits like that to the code above.

T.J. Crowder
  • @Curt: Thanks. I just realized that when doing the live example: the selectors and, because of the selector, the need for `.next`. :-) – T.J. Crowder Sep 11 '15 at 10:31