I did a lot of research, but now I'm sadly completely stuck.

I need to download an HTML page: I fill out a form with different data and click the submit button, and I would like to save the responses.

Using Firebug I can see that my data is filled in and sent via POST. Unfortunately there is one more argument whose origin I cannot trace, and without this last argument I get an error page instead of the proper HTML response.

wget and curl (with cookies, user agent, headers, referrer) fail without that last parameter!

I'm not too familiar with jQuery and JavaScript, so I can't really tell where the data comes from, but if the browser knows it, I should be able to get it as well!

I found this to be similar, only mine seems harder as the field is generated: wget : get field info before sending post-data

In my case I traced it down to this:

<script type="text/javascript">
$(document).ready(function(){
    $.get('/getmyData.asp?str=erServiceXUVC',function(string){
        $('#oikuZR').append('<input type="hidden" name="lsXUVp" value="'+ string +'">');
    });
});
</script>

And the difference between using a real browser and wget (even with user agent and so on) is that I cannot access this value, which shows up if I use the normal browser:

<input type="hidden" value="34928321" name="lsXUVp">

This is exactly the value I need (comparing to firebug POST)! But....

Here my knowledge ends.

  • I cannot find "34928321" anywhere in the code
  • "#oikuZR" doesn't show up in the DOM list of Firebug, but maybe I'm doing something wrong (there are thousands of entries)
  • Debugging the above script, I can see string=34928321 as a local variable, but I don't see where the function is being called from
  • If I open www.homepage.com/getmyData.asp?str=erServiceXUVC (with or without params), I just get an error page.

I thought about using Splash as a proxy to run the JavaScript for me and then wget that page, but since I don't understand the mechanics yet, I'm doubtful.

So what can I do?

Maybe a JavaScript-capable browser that accepts commands from the command line: open, fill out, send, save HTML? Alternatives? Solutions for wget (my favorite!)?

Stefan

1 Answer

I think this call is added to prevent (or at least make harder) doing what you are trying to do.

The trick is in this line:

$.get('/getmyData.asp?str=erServiceXUVC',function(string){
   ..
});

This line performs an AJAX request. When the request succeeds, the callback function is called, and the response is passed into the argument string.

So, through this request, a code is generated, which is then posted back with the form. I can't be certain without inspecting the actual environment, but I think the AJAX request depends on the session (probably through a cookie). Without that session, it might generate an error message instead.

So to work around this, you would have to get the page and any cookies that come with it. Then, when requesting the code, use the same cookies, so the server will see that request as in the same session.
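As a rough illustration of those steps, here is a minimal Python sketch using only the standard library. It is a guess at the flow, not a tested recipe for this site: `FORM_PAGE`, the form's action path, and the field names passed in are hypothetical placeholders, and the `X-Requested-With` and `Referer` headers are assumptions about what the browser sends along with jQuery's `$.get`. Only the token path and the hidden field name come from the question.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

BASE_URL = "http://www.homepage.com"              # placeholder host from the question
FORM_PAGE = "/subpage/form.asp"                   # hypothetical path of the page with the form
TOKEN_PATH = "/getmyData.asp?str=erServiceXUVC"   # taken from the script in the question
TOKEN_FIELD = "lsXUVp"                            # hidden field name from the question


def make_session():
    """Return an opener that stores cookies and sends them back on later requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def token_request(form_url):
    """Build a request resembling the one $.get sends from the form page."""
    url = urllib.parse.urljoin(form_url, TOKEN_PATH)
    return urllib.request.Request(url, headers={
        "X-Requested-With": "XMLHttpRequest",  # assumption: server may check this
        "Referer": form_url,                   # assumption: server may check this
    })


def form_request(form_url, action_url, fields, token):
    """Build the form POST with the token added as the extra hidden field."""
    data = dict(fields)
    data[TOKEN_FIELD] = token
    body = urllib.parse.urlencode(data).encode("ascii")
    return urllib.request.Request(action_url, data=body, headers={"Referer": form_url})


def submit(fields, action_path):
    """Load the page, fetch the token in the same session, then post the form."""
    opener, _jar = make_session()
    form_url = urllib.parse.urljoin(BASE_URL, FORM_PAGE)
    opener.open(form_url).read()  # 1. collect the session cookies
    token = opener.open(token_request(form_url)).read().decode().strip()  # 2. get the code
    action_url = urllib.parse.urljoin(form_url, action_path)
    return opener.open(form_request(form_url, action_url, fields, token)).read()  # 3. post
```

Something like `submit({"somefield": "somevalue"}, "/result.asp")` would then return the response body, with both the field name and the action path replaced by whatever Firebug shows for the real form. The same opener is used for all three requests so the cookie set by the first response is sent with the other two.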

This is most likely a trick to prevent abuse of the form by spambots. Therefore, it's hard to guess what other tricks there might be, and what else is needed to get this to work.

GolezTrol
  • Brilliant! Could you add how to make the ajax request? I saved the cookies+session cookies from a first wget in a file and a second wget should use those with the magic last parameter to get the real data. – Stefan Oct 04 '14 at 19:07
  • From the server's perspective an AJAX request is nothing special. It's just a normal request, only usually the `X-Requested-With` header is added in the request. This might be used by the server to determine whether it's an AJAX request. So using wget, just do a normal request and if necessary, add this header. – GolezTrol Oct 04 '14 at 20:28
  • Worked! I found how Firefox sends the X-Requested-With header in the Firebug console, sent the cookies as well, and got the result as a plain-text number in an HTML file. At the beginning I forgot that /getmyData.asp is absolute (www.homep.com/getmyData.asp) and not relative to the subpage. But somehow they still hold the data back with wget (I'm already faking every header and POST argument FF sends; very strange/clever that they can still distinguish). Maybe I will have to use iMacros for simple human clicking behaviour, or alternatively ParseHub – Stefan Oct 06 '14 at 06:58
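
The path pitfall mentioned in the last comment can be shown with a small Python sketch. The subpage URL here is a hypothetical example; only the `/getmyData.asp?str=erServiceXUVC` path comes from the question.

```python
from urllib.parse import urljoin

page = "http://www.homep.com/subpage/form.asp"  # hypothetical subpage URL

# A leading slash resolves against the host root, not the subpage's directory.
print(urljoin(page, "/getmyData.asp?str=erServiceXUVC"))
# http://www.homep.com/getmyData.asp?str=erServiceXUVC

# Without the leading slash the path would be relative to the subpage.
print(urljoin(page, "getmyData.asp?str=erServiceXUVC"))
# http://www.homep.com/subpage/getmyData.asp?str=erServiceXUVC
```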