
I'm trying to scrape data from a Korean baseball league website and store each player's stats for every year.

http://www.koreabaseball.com/Record/Player/HitterDetail/Daily.aspx?playerId=79215 (The page is in Korean, but I only need the numbers in the table below, so the language shouldn't matter.)

When I pick a year from the dropdown box at the upper right of the player's daily-stats section, the page automatically reloads with that year's data.
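The dropdown is an HTML `<select>` element, so the years it offers can be read straight from the page with rvest. The sketch below runs against a small, hypothetical fixture that stands in for the real page. It assumes only that the `<select>` has a `name` ending in `ddlYear`, like the field used in the code below. The real option values may differ.

```r
library(rvest)

# Hypothetical, trimmed-down fixture standing in for the real page's <select>
html <- '<select name="ctl00$ctl00$ctl00$cphContents$cphContents$cphContents$ddlYear">
  <option value="2017" selected="selected">2017</option>
  <option value="2016">2016</option>
  <option value="2015">2015</option>
</select>'

page <- read_html(html)

# Match the <select> whose name ends with "ddlYear", then take its <option>s
options <- html_nodes(page, "select[name$='ddlYear'] option")
years  <- html_attr(options, "value")  # values the form submits to the server
labels <- html_text(options)           # text shown in the dropdown
years
# "2017" "2016" "2015"
```

Knowing the valid values is only half the job. Selecting one in the browser also submits the page's form, which the next snippet's plain POST does not fully reproduce.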

Here is what I've tried:

```r
library(httr)
library(rvest)

url <- "http://www.koreabaseball.com/Record/Player/HitterDetail/Daily.aspx?playerId=76249"

# Submit only the year dropdown's form field, hoping the server returns that year's table
baseball <- POST(url,
                 body = list("ctl00$ctl00$ctl00$cphContents$cphContents$cphContents$ddlYear" = "2017"),
                 encode = "form")

page_2017 <- read_html(content(baseball, as = "text", encoding = "UTF-8"))

# Every cell of every table-body row, flattened into one character vector
table <- html_nodes(page_2017, "tbody > tr > td")
table_text <- html_text(table)

# Reshape into rows of 17 columns (one row per game)
record <- as.data.frame(matrix(table_text, ncol = 17, byrow = TRUE))
```
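The `.aspx` extension and the `ctl00$...` control names suggest an ASP.NET Web Forms page. On such pages, choosing a dropdown value normally triggers a postback that submits every hidden field as well (typically `__VIEWSTATE` and `__EVENTVALIDATION`), plus `__EVENTTARGET` naming the control that changed. The POST above sends only the dropdown field. The sketch below is one way to replay a fuller postback. It is untested against the live site. Both helper names are hypothetical, and the site may require other fields or may have changed since this question was asked.

```r
library(httr)
library(rvest)

# Hypothetical helper: build the form body a Web Forms auto-postback would send,
# by copying all named hidden <input>s and then setting the dropdown value.
build_postback_body <- function(page, dropdown_name, value) {
  hidden <- html_nodes(page, "input[type='hidden']")
  hidden <- hidden[!is.na(html_attr(hidden, "name"))]
  vals <- html_attr(hidden, "value")
  vals[is.na(vals)] <- ""                   # hidden inputs with no value attribute
  body <- as.list(vals)
  names(body) <- html_attr(hidden, "name")
  body[["__EVENTTARGET"]] <- dropdown_name  # control that "caused" the postback
  body[["__EVENTARGUMENT"]] <- ""
  body[[dropdown_name]] <- value            # the year actually wanted
  body
}

# Hypothetical two-step fetch: GET the page for fresh hidden state, then POST it
# back with the chosen year. httr reuses one handle per host, so cookies set by
# the GET are sent with the POST.
fetch_year_page <- function(url, dropdown_name, year) {
  first <- GET(url)
  page <- read_html(content(first, as = "text", encoding = "UTF-8"))
  body <- build_postback_body(page, dropdown_name, year)
  resp <- POST(url, body = body, encode = "form")
  read_html(content(resp, as = "text", encoding = "UTF-8"))
}
```

For example, `fetch_year_page(url, "ctl00$ctl00$ctl00$cphContents$cphContents$cphContents$ddlYear", "2016")` would return a parsed page that can go through the same `html_nodes()` / `matrix()` steps as in the question.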

The problem is that I always get the 2017 data shown below, even when I pass a different year in the POST body.

```
V1   V2    V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16   V17
1  03.31 한화 0.333  3  0  1  0  0  0   1   0   0   0   0   2   0 0.333
2  04.01 한화 0.000  4  0  0  0  0  0   0   0   0   0   0   3   0 0.143
3  04.02 한화 0.333  6  0  2  0  0  0   1   0   0   0   0   1   0 0.231
4  04.04   kt 0.400  5  0  2  1  0  0   1   0   0   0   0   1   0 0.278
```
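A quick way to tell whether the server honored the request is to check which year option the returned HTML marks as selected. This assumes the page renders the current year with a `selected` attribute, as server-side dropdowns usually do. The helper name `selected_year` is hypothetical.

```r
library(rvest)

# Hypothetical helper: which year does the returned page say it is showing?
# Returns NA if no option is marked as selected.
selected_year <- function(page) {
  opt <- html_node(page, "select[name$='ddlYear'] option[selected]")
  html_attr(opt, "value")
}
```

Calling `selected_year(page_2017)` after requesting, say, 2015 and still getting `"2017"` would confirm that the submitted value was ignored and the default year was rendered.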

Any help would be much appreciated. A general solution for dropdown boxes would be ideal, but one specific to this page would also be welcome.

Asked by Mons2us; edited by lawyeR.
  • Please list the package(s) you are using, so far I got `httr` and `xml2`. However, I shouldn't be the one to find out which packages you used. – Erik Schutte May 15 '17 at 09:03
  • @ErikSchutte I updated the question. I only use rvest and httr, but other methods are also welcome! I want to know how this kind of problem can be solved. – Mons2us May 15 '17 at 09:07
  • The reason this doesn't fully work, yet still gets you the 2017 data, is that 2017 is the page's default and is what the server returns when you first request the data. I googled and immediately found some good reference posts on SO. Please see these [Link1](http://stackoverflow.com/questions/25965785/scrape-values-from-html-select-option-tags-in-r) and [Link2](http://stackoverflow.com/questions/31615435/rvest-extract-option-value-and-text-from-select) links. – Erik Schutte May 15 '17 at 09:17
  • @ErikSchutte Thanks for the links; I read them. But they are about getting the values of the dropdown options, not about scraping the data shown after a particular option is selected. I still can't scrape data for any year other than 2017. – Mons2us May 15 '17 at 09:35
  • Ah I see, I kinda screwed that up with my google magic. Okay, so here IS an answer to your question. [This](http://stackoverflow.com/questions/35633533/web-scrape-select-fields-from-drop-downs-extract-resulting-data) accepted answer should do what you want. – Erik Schutte May 15 '17 at 09:57
  • @ErikSchutte Thanks, but it just doesn't work. I guess it might be a problem with the site. I appreciate your help anyway. – Mons2us May 15 '17 at 10:28
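Once a single year can be fetched reliably, storing every year's stats (the goal stated at the top of the question) is a loop plus a row bind. In the sketch below, `fetch_year_table` is a hypothetical function you supply. It should return one year's daily-stats data frame, for example by combining a working request with the `matrix(..., ncol = 17, byrow = TRUE)` reshaping from the question. The `pause` argument is an assumption added to keep the request rate modest.

```r
# Sketch: fetch each year with a caller-supplied function and stack the results,
# tagging every row with the year it came from.
scrape_all_years <- function(years, fetch_year_table, pause = 1) {
  tables <- lapply(years, function(y) {
    Sys.sleep(pause)                 # be polite to the server between requests
    df <- fetch_year_table(y)
    if (nrow(df) == 0) return(NULL)  # skip years with no games
    df$year <- y
    df
  })
  do.call(rbind, tables)             # NULL entries are dropped by rbind
}
```

Passing the fetcher in as an argument keeps the looping logic separate from the site-specific request. This means it can be checked with a stub before touching the network.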
