
I've been trying to scrape this website: http://www.e3050.com/Cases-Fans-PDU/C

I can scrape everything except moving to the next page. After debugging, I found that the site sends the __VIEWSTATE parameter with each page request. The viewstate value is embedded in each page response, so I figured I need to read it from each page and send it with the request for the following page. I get __VIEWSTATE using this XPath:

sel.xpath('//input[@id="__VIEWSTATE"]/@value').extract()

I also get an error, because the viewstate the browser sends is different from the one in the page response. Both values are Base64-encoded, but the one sent with each request contains more data than the one I get from the page response.

How can I deal with this? And how does the site get the viewstate value it sends?

Edit: The same issue occurs with the __EVENTVALIDATION parameter.
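The XPath above reads only __VIEWSTATE, and the edit shows that __EVENTVALIDATION has to travel with it. A minimal standard-library sketch that collects every hidden `<input>` on a page in one pass might look like this; the sample markup and its placeholder values are invented for illustration and are not the site's real HTML. (In Scrapy, `FormRequest.from_response` already pre-populates a request with the fields of the page's form, hidden inputs included.)

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)  # attribute names arrive lower-cased
        is_hidden = (attr_map.get("type") or "").lower() == "hidden"
        name = attr_map.get("name")
        if is_hidden and name:
            # A missing or bare value attribute is stored as "".
            self.fields[name] = attr_map.get("value") or ""


def hidden_fields(html):
    """Return hidden form fields such as __VIEWSTATE and __EVENTVALIDATION."""
    collector = HiddenInputCollector()
    collector.feed(html)
    collector.close()
    return collector.fields


# Invented markup with placeholder values, for illustration only.
SAMPLE_PAGE = """
<form method="post" action="./C" id="form1">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="PLACEHOLDER_VIEWSTATE" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="PLACEHOLDER_EVENTVALIDATION" />
  <input type="text" name="q" value="fans" />
</form>
"""

if __name__ == "__main__":
    print(hidden_fields(SAMPLE_PAGE))
```

Collecting all hidden inputs, rather than naming each one, means any other state fields the page carries are forwarded as well.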

Vanddel
  • The viewstate is per page to support a post-back. Do not send the viewstate to a different page. – Emond Sep 25 '14 at 10:39
  • Then how can I get the viewstate for the following page ? – Vanddel Sep 25 '14 at 11:07
  • Or how do they generate the viewstate for the request in the first place? – Vanddel Sep 25 '14 at 11:10
  • For an HTTP GET of a page you do not need a viewstate; once you have the page, it will contain the viewstate. To HTTP POST that page (e.g. after entering values), you simply post the correct form and its viewstate will flow back to the server. The viewstate is generated on the server side. Seriously: look up the documentation on it before guessing its use. – Emond Sep 25 '14 at 11:12
  • According to what you're saying, I don't have to send a viewstate for a POST request, right? But I get this response when I don't send the viewstate or send the wrong one:

    1|#||4|53|pageRedirect||%2ferror.html%3faspxerrorpath%3d%2fCases-Fans-PDU%2fC|

    – Vanddel Sep 25 '14 at 11:51
  • Please read my comments again: For a POST you must send the ViewState but the proper way to do that is by posting a *Html Form* and that should already contain the viewstate. Again: read the documentation. – Emond Sep 25 '14 at 12:06
  • I do use Scrapy's FormRequest to request the following page: http://doc.scrapy.org/en/latest/topics/request-response.html#formrequest-objects. The problem is that the viewstate embedded in the previous page (the one I get using the XPath //input[@id="__VIEWSTATE"]/@value) isn't the same as the one sent in the form POST request, which I checked using the Google Chrome developer tools. My question is: how do I get this viewstate? – Vanddel Sep 25 '14 at 12:14
  • @Vanddel having the same issue trying to "scrap" another site. The view_state on the *.html is different from the one being sent in the post request. – imbr Aug 24 '17 at 17:13
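The error reply quoted in the comments (`1|#||4|53|pageRedirect||…`) appears to use the pipe-delimited "delta" format that ASP.NET AJAX returns for partial postbacks (UpdatePanel): `length|type|id|content|` records, where `length` counts the characters of `content` (here 53 is exactly the length of the redirect path). If the site pages this way, one plausible reason for the mismatch is that an updated __VIEWSTATE arrives in a `hiddenField` record of such a response rather than in a full HTML page. A sketch of a parser, assuming that record layout; it has only been checked against the sample above and a hand-made record, not against the live site.

```python
def parse_delta(body):
    """Split an assumed ASP.NET AJAX delta response into (type, id, content).

    Each record is assumed to be ``length|type|id|content|``, with ``length``
    giving the number of characters in ``content``.
    """
    records = []
    pos = 0
    while pos < len(body):
        # Three '|'-terminated header fields: length, type, id.
        length_end = body.index("|", pos)
        length = int(body[pos:length_end])
        type_end = body.index("|", length_end + 1)
        rec_type = body[length_end + 1:type_end]
        id_end = body.index("|", type_end + 1)
        rec_id = body[type_end + 1:id_end]
        # Content is sliced by length because it may itself contain '|'.
        start = id_end + 1
        content = body[start:start + length]
        if body[start + length:start + length + 1] != "|":
            raise ValueError("malformed delta record at offset %d" % pos)
        records.append((rec_type, rec_id, content))
        pos = start + length + 1
    return records


def hidden_fields_from_delta(body):
    """Return hidden fields (e.g. an updated __VIEWSTATE) found in a delta."""
    return {rec_id: content
            for rec_type, rec_id, content in parse_delta(body)
            if rec_type == "hiddenField"}


# The error reply quoted in the comments above.
SAMPLE_REDIRECT = "1|#||4|53|pageRedirect||%2ferror.html%3faspxerrorpath%3d%2fCases-Fans-PDU%2fC|"

if __name__ == "__main__":
    for record in parse_delta(SAMPLE_REDIRECT):
        print(record)
```

Decoded, the redirect points at `/error.html?aspxerrorpath=/Cases-Fans-PDU/C`, i.e. the server rejected the postback rather than returning a page.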

1 Answer


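__VIEWSTATE is a hidden form field that ASP.NET Web Forms uses to persist page and control state across postbacks. The server serializes that state, Base64-encodes it, and usually protects it with a MAC, so a tampered or mismatched value is rejected. It is embedded in a hidden <input> tag; you just need to extract it from each response before making the next request and send it along with that request.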
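Following this answer, each POST is built from the hidden fields of the response just received, never from an earlier page. A minimal standard-library sketch: `__EVENTTARGET` and `__EVENTARGUMENT` are the fields ASP.NET's `__doPostBack` fills in, while the pager control ID `ctl00$Pager$Next` and the `post` and `extract_hidden` callables are hypothetical stand-ins. The real target ID, and any extra fields a partial postback sends, should be read from the request shown in the browser's developer tools.

```python
from urllib.parse import urlencode


def next_page_payload(hidden, event_target, event_argument=""):
    """URL-encode the form body for the next postback.

    `hidden` must hold the hidden fields (__VIEWSTATE, __EVENTVALIDATION, ...)
    taken from the *current* response.
    """
    payload = dict(hidden)  # copy so the caller's dict is left untouched
    payload["__EVENTTARGET"] = event_target
    payload["__EVENTARGUMENT"] = event_argument
    return urlencode(payload)


def crawl(first_html, post, extract_hidden, event_target, max_pages):
    """Follow max_pages - 1 postbacks, refreshing the hidden fields each time.

    `post(body)` sends the form body and returns the next page's HTML;
    `extract_hidden(html)` returns that page's hidden fields. Both are
    hypothetical callables supplied by the caller.
    """
    html = first_html
    pages = [html]
    for _ in range(max_pages - 1):
        body = next_page_payload(extract_hidden(html), event_target)
        html = post(body)
        pages.append(html)
    return pages


if __name__ == "__main__":
    # Stub transport: pretend each POST returns a new page.
    sent = []

    def fake_post(body):
        sent.append(body)
        return "<html>page %d</html>" % len(sent)

    crawl("<html>page 0</html>", fake_post,
          lambda html: {"__VIEWSTATE": "state-of-" + html},
          "ctl00$Pager$Next", max_pages=3)
    for body in sent:
        print(body)
```

Passing the transport and extractor in as callables keeps the "fresh fields for every request" rule in one loop, independent of whether the pages are fetched with Scrapy, `urllib`, or something else.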

Vanddel