How can I scrape Hacker News (https://news.ycombinator.com/) with x-ray in Node.js?
I would like to get something like this out of it:
[
{title1, comment1},
{title2, comment2},
...
{"‘Minimal’ cell raises stakes in race to harness synthetic life", 48}
...
{title30, comment30}
]
There is a news table, but I don't know how to scrape it. Each story on the site consists of three table rows, and these rows do not share a parent element that is unique to the story. So the structure looks like this:
<tbody>
<tr class="spacer"> //Markup 1
<tr class="athing"> //Headline 1 ('.deadmark+ a' contains title)
<tr class> //Meta Information 1 (.age+ a contains comments)
<tr class="spacer"> //Markup 2
<tr class="athing"> //Headline 2 ('.deadmark+ a' contains title)
<tr class> //Meta Information 2 (.age+ a contains comments)
...
<tr class="spacer"> //Markup 30
<tr class="athing"> //Headline 30 ('.deadmark+ a' contains title)
<tr class> //Meta Information 30 (.age+ a contains comments)
So far I have tried:
x("https://news.ycombinator.com/", "tr", [{
title: [".deadmark+ a"],
comments: ".age+ a"
}])
and
x("https://news.ycombinator.com/", {
title: [".deadmark+ a"],
comments: [".age+ a"]
})
The second approach returns 30 titles and 29 comment counts. I do not see any way to map them to each other, as there is no information about which of the 30 titles is missing a comment count.
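To make the pairing I am after concrete, here is a minimal sketch in plain JavaScript. It assumes the table rows have already been extracted, in document order, into plain objects. The row shape ({ cls, title, comments }), the sample data, and the function name pairStories are made up for illustration and are not part of x-ray's API. The idea is to attach each "athing" row to the row that directly follows it, so that a story without a comments link gets null instead of shifting every later comment count by one.

```javascript
// Hypothetical helper: pairs each headline row with the meta row that follows it.
// Assumed row shape (not from x-ray): { cls: string, title?: string, comments?: string }
function pairStories(rows) {
  const stories = [];
  for (let i = 0; i < rows.length; i++) {
    // Only headline rows start a story
    if (rows[i].cls !== "athing") continue;
    const meta = rows[i + 1];
    // The meta row is the class-less row directly after the headline row
    const hasMeta = meta !== undefined && meta.cls === "";
    stories.push({
      title: rows[i].title,
      // null marks a story whose meta row has no comments link
      comments: hasMeta && meta.comments !== undefined ? meta.comments : null,
    });
  }
  return stories;
}

// Made-up sample data mirroring the spacer / athing / meta pattern
const sampleRows = [
  { cls: "spacer" },
  { cls: "athing", title: "Story A" },
  { cls: "", comments: "48 comments" },
  { cls: "spacer" },
  { cls: "athing", title: "Story B" },
  { cls: "" }, // meta row without a comments link
  { cls: "spacer" },
  { cls: "athing", title: "Story C" },
  { cls: "", comments: "3 comments" },
];

console.log(pairStories(sampleRows));
```

With two flat lists (titles and comments) this alignment is lost, which is exactly the 30 versus 29 mismatch described above.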
Any help is appreciated.