1

I'm trying to use Nokogiri to scrape ESPN's site for Jeremy Lin's last game stats. However, calling the text method on the result of css gives me a string without any spaces between the stats.

The string that scraper.get_last_game_stats.text is returning is:

"Sat 11/16vsDENW 122-111326-11.5450-2.0004-6.66747113116Wed 11/13@ PHIL 117-1234910-19.5269-15.6005-6.833512005834Sat 11/9vsLACL 94-107263-7.4290-0.0000-0.0001701156"

I am trying to put spaces between the stats. However, even when I loop through the main object and put spaces or dashes between iterations, I can't split the numbers for steals, blocks, points, turnovers and everything else:

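require 'open-uri'
require 'nokogiri'

class PlayerScraper
  attr_accessor :player_data, :name

  def initialize(url)
    # URI.open is needed on Ruby 3.0+, where open-uri no longer patches Kernel#open
    @player_data = Nokogiri::HTML(URI.open(url))
  end

  # Returns the matching <tr> rows as a Nokogiri::XML::NodeSet
  def get_last_game_stats
    @last_game_stats = @player_data.css('tr[class^="oddrow team-46"]')
  end
end

jlin_url = "http://espn.go.com/nba/player/_/id/4299/jeremy-lin"

scraper = PlayerScraper.new(jlin_url)
scraper.get_last_game_stats.text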

Can someone show me a better way of doing this?

the Tin Man
JaTo

3 Answers

2

You're walking the rows, but not the cells they contain. You need to do both to get the cells' values in a usable form:

require 'open-uri'
require 'nokogiri'

URL = 'http://espn.go.com/nba/player/_/id/4299/jeremy-lin'
doc = Nokogiri::HTML(URI.open(URL)) # URI.open is needed on Ruby 3.0+

data = doc.css('tr[class^="oddrow team-46"]').map{ |tr|
  tr.css('td').map(&:text)
}

data
# => [["Sat 11/16",
#      "vsDEN",
#      "W 122-111",
#      "32",
#      "6-11",
#      ".545",
#      "0-2",
#      ".000",
#      "4-6",
#      ".667",
#      "4",
#      "7",
#      "1",
#      "1",
#      "3",
#      "1",
#      "16"],
#     ["Wed 11/13",
#      "@ PHI",
#      "L 117-123",
#      "49",
#      "10-19",
#      ".526",
#      "9-15",
#      ".600",
#      "5-6",
#      ".833",
#      "5",
#      "12",
#      "0",
#      "0",
#      "5",
#      "8",
#      "34"],
#     ["Sat 11/9",
#      "vsLAC",
#      "L 94-107",
#      "26",
#      "3-7",
#      ".429",
#      "0-0",
#      ".000",
#      "0-0",
#      ".000",
#      "1",
#      "7",
#      "0",
#      "1",
#      "1",
#      "5",
#      "6"]]

To look at the data differently, this prints each row on its own line:

data.each do |row|
  puts row.join(', ')
end
# >> Sat 11/16, vsDEN, W 122-111, 32, 6-11, .545, 0-2, .000, 4-6, .667, 4, 7, 1, 1, 3, 1, 16
# >> Wed 11/13, @ PHI, L 117-123, 49, 10-19, .526, 9-15, .600, 5-6, .833, 5, 12, 0, 0, 5, 8, 34
# >> Sat 11/9, vsLAC, L 94-107, 26, 3-7, .429, 0-0, .000, 0-0, .000, 1, 7, 0, 1, 1, 5, 6

A table is a really simple structure: you can create one using two nested loops. To access each cell later you do the same thing: walk the rows in a loop and, inside that loop, walk the cells. That's all the code I wrote does.
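Once each row is an array, the same idea extends to labeling the cells. This is only a sketch: the names in HEADERS are my assumption about the column order of ESPN's table (they are not part of the scraped rows), and label_rows is a hypothetical helper, so check both against the page's header row. The sample row is copied from the output above.

```ruby
# Column labels are an ASSUMPTION about ESPN's column order; verify against the page.
HEADERS = %w[date opp score min fg fg_pct three_pt three_pt_pct ft ft_pct
             reb ast blk stl pf to pts].freeze

# Hypothetical helper: pairs every cell in every row with its column label.
def label_rows(rows)
  rows.map { |row| HEADERS.zip(row).to_h }
end

# One row copied from the output above, standing in for the scraped data.
sample = [
  ["Sat 11/16", "vsDEN", "W 122-111", "32", "6-11", ".545", "0-2", ".000",
   "4-6", ".667", "4", "7", "1", "1", "3", "1", "16"]
]

last_game = label_rows(sample).first
puts "#{last_game['date']}: #{last_game['pts']} points, #{last_game['ast']} assists"
```

The values stay strings here; convert them with to_i or to_f only where you need arithmetic.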

See also "How to avoid joining all text from Nodes when scraping".

Community
the Tin Man
1

I think you should read the tr elements, then iterate over each one's HTML content and handle each td separately. Otherwise, by using the text method and the Rails HTML tag cleanup, you get a mess out of the original data.

marzapower
1

The text method concatenates the text of all the selected nodes. Try something like:

scraper.get_last_game_stats.css('td').map(&:text)

if you want the td nodes to be evaluated separately. When I do that with the URL you point to, I get:

["Sat 11/16", "vsDEN", "W 122-111", "32", "6-11", ".545", "0-2", ".000", "4-6", ".667", "4", "7", "1", "1", "3", "1", "16", "Wed 11/13", "@ PHI", "L 117-123", "49", "10-19", ".526", "9-15", ".600", "5-6", ".833", "5", "12", "0", "0", "5", "8", "34", "Sat 11/9", "vsLAC", "L 94-107", "26", "3-7", ".429", "0-0", ".000", "0-0", ".000", "1", "7", "0", "1", "1", "5", "6"]

which I hope looks more like what you are looking for.
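Because that result is one flat list, the row boundaries are lost. If every row has the same number of cells (17 in the output above), one way to get them back is each_slice. The helper name group_cells and the shortened sample below are only for illustration.

```ruby
# Hypothetical helper: regroup a flat list of cell texts into fixed-width rows.
def group_cells(cells, cells_per_row)
  cells.each_slice(cells_per_row).to_a
end

# Shortened sample with 3 cells per row; the real rows above have 17 each.
flat = ["Sat 11/16", "vsDEN", "16", "Wed 11/13", "@ PHI", "34"]

group_cells(flat, 3).each { |row| puts row.join(', ') }
```

This relies on every row having the same width, so walking the td cells per row is the safer choice when the table may contain rows with a different number of cells.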

Alex.Bullard