I have to parse an HTML page organized this way:
<li id="ctl00_EFG" class="current">
<a id="ctl00_SGB" href="http://SGI/EFG">EFG</a>
<ul style="width:535px;">
<li class="top_border">
<a style='color: #d94129; font-weight: bold;' href="http://SGI/EFG/regione-abruzzo" title="EFGAbruzzo">Abruzzo</a>
<ul style="width:100%;">
<li>
<a href="http://SGI/EFG/chieti" title="EFG chieti" rel="nofollow">Chieti</a>
</li>
<li>
<a href="http://SGI/EFG/pescara" title="EFG pescara" rel="nofollow">Pescara</a>
</li>
</ul>
</li>
<li class="top_border"><a style='color: #d94129; font-weight: bold;' href="http://SGI/EFG/regione-valdaosta" title="EFGValDAosta">Val d'Aosta</a>
<ul style="width:100%;">
<li>
<a href="http://SGI/EFG/aosta" title="EFG aosta" rel="nofollow">Aosta</a>
</li>
</ul>
</li>
</ul>
</li>
I need to extract an object with the regions and the cities, like this:
{
  "Abruzzo": [
    "Chieti", "Pescara",
  ],
  "Val d'Aosta": [
    "Aosta",
  ],
};
I am using cheerio from Node.js, but I added the jquery tag since cheerio uses jQuery-style selectors (AFAIK).
I have come up with this partial solution, which is not working:
$('a[id="ctl00_SGB"]').next().find('ul li').each(function(i, elem) {
  var $categoryTop = $(this);
  var region = $categoryTop.find('a').first().attr('rel', ':not(nofollow)').text();
  console.log('region:', region);
  $(elem).find('ul li a').each(function(i, elem2) {
    console.log('elem2:', $(elem2).text());
  });
});
Any clue?
P.S.: This is a revised version of a question I posted yesterday, which was answered correctly. Unfortunately, I had simplified it too much, so I couldn't apply that answer to my actual use case.