I scraped the rankings table from atpworldtour.com and I'm trying to access the player names.
An example of a row in the table looks like this:
<tr>
<td class="rank-cell">1</td>
<td class="move-cell">
<div class="move-none"></div>
<div class="move-text">
</div>
</td>
<td class="country-cell">
<div class="country-inner">
<div class="country-item">
<img src="/~/media/images/flags/srb.png" alt="SRB" onerror="this.remove()">
</div>
</div>
</td>
<td class="player-cell">
<a href="/en/players/novak-djokovic/d643/overview" data-ga-label="Novak Djokovic">Novak Djokovic</a>
</td>
<td class="age-cell">28</td>
<td class="points-cell">
<a href="/en/players/novak-djokovic/d643/rankings-breakdown?team=singles" data-ga-label="rankings-breakdown">15,785</a>
</td>
<td class="tourn-cell">
<a href="/en/players/novak-djokovic/d643/player-activity?matchType=singles" data-ga-label="player-activity">17</a>
</td>
<td class="pts-cell">1,500</td>
<td class="next-cell">0</td>
</tr>
I tried a few different ways of pulling this information. So far, the most success I've had is with this:
require 'nokogiri'
require 'open-uri'

url = "http://www.atpworldtour.com/en/rankings/singles"
doc = Nokogiri::HTML(URI.open(url))  # URI.open is required on Ruby 3.0+
doc.css("tr").each do |row|
  puts row.css("td a")
end
The problem is that each row contains two more links after the player's name (the points and tournaments links), so I get them all lumped together. The player's name is in the fourth cell of each row, so I tried to pull the fourth cell first and then access its link:
doc.css("tr").each do |row|
  cell = row.css("td")[3]
  puts cell.css("a").text
end
but that raises the error undefined method 'css' for nil:NilClass.
Upon further investigation, cell seemed to be storing ALL the player-name cells instead of just the one for the current row, but when I then tried to iterate through cell, I got the same undefined method error.
I also tried to solve this problem using XPath:
doc.xpath("//tr").each do |row|
  puts row.xpath("/td[3]/a").text
end
but the output is just a big block of blank lines where the names should be.
- Are there any tips about what I'm doing wrong?
- If anyone can point me toward detailed documentation for using CSS/XPath selectors with Nokogiri, I'd be grateful. Everything I've found so far only covers the very basics, and I'm having trouble finding information on how to perform more complex operations.
I actually got it working using:
doc.xpath("//tr").each do |row|
  puts row.at_css("a").text
end
but any help finding proper documentation/tutorials for using XPath and CSS selectors with Nokogiri would still be great.