I'm trying to send a GET request to a website. The problem is that the website recognizes when the requester is a robot.

const _URL = 'https://www.URL.com/';
var
  sSessionID:String;
  Params: TStringList;
  IdSSL: TIdSSLIOHandlerSocketOpenSSL;
begin
  IdSSL := TIdSSLIOHandlerSocketOpenSSL.Create(IdHTTP1);
  try
    IdHTTP1.IOHandler := IdSSL;
    IdHTTP1.AllowCookies := True;
    IdHTTP1.HandleRedirects := True;
    IdHTTP1.Request.UserAgent := 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0';
    IdHTTP1.Request.Accept := 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
    IdHTTP1.Request.AcceptLanguage := 'en-GB,en;q=0.5';
    IdHTTP1.Request.Connection := 'keep-alive';
    IdHTTP1.Request.ContentType := 'application/x-www-form-urlencoded';
    sSessionID := IdHTTP1.Get(_URL);
    { ...
      extracting SessionID
      Params.Add('SessionID=' + sSessionID);
      IdHTTP1.Post(_URL, Params);
      ... }
  finally
    IdSSL.Free;
  end; 

The result of IdHTTP1.Get is <!DOCTYPE html><head><META NAME="ROBOTS"..... The rest is empty, so I can't obtain the session ID.

The HTTP request headers are the same as the ones my browser sends.

RepeatUntil

1 Answer

Since I can't have the real URL, this is my best guess:

uses
  Math;
...
    const
      _URL = 'https://www.url.com/';
    var
      sSessionID: string;
      Params: TStringList;
      IdSSL: TIdSSLIOHandlerSocketOpenSSL;
    begin
      IdSSL := TIdSSLIOHandlerSocketOpenSSL.Create(IdHTTP1);
      try
        IdHTTP1.IOHandler := IdSSL;
        IdHTTP1.AllowCookies := True;
        IdHTTP1.HandleRedirects := True;
        IdHTTP1.Request.CustomHeaders.AddValue('X-Forwarded-For', Format('%d.%d.%d.%d', [Random(255), Random(255), Random(255), Random(255)]));
        IdHTTP1.Request.UserAgent := Format('Mozilla/%d.0 (Windows NT %d.%d; rv:2.0.1) Gecko/20100101 Firefox/%d.%d.%d', [RandomRange(3, 5), RandomRange(3, 5), Random(2), RandomRange(3, 5), Random(5), Random(5)]);
        IdHTTP1.Request.Accept := 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
        IdHTTP1.Request.AcceptLanguage := 'en-GB,en;q=0.5';
        IdHTTP1.Request.Connection := 'keep-alive';
        IdHTTP1.Request.ContentType := 'application/x-www-form-urlencoded';
        sSessionID := IdHTTP1.Get(_URL);
    ...
      finally
        ...
      end;
Jens Borrisholt
  • Can you explain what change you made and why you think it will work? – Rob Kennedy Nov 09 '15 at 13:37
  • @jens-borrisholt It worked only once or twice, I'm not sure. After that the result came back showing me as a robot again. – RepeatUntil Nov 09 '15 at 13:44
  • @RobKennedy As you can probably see, I put a fake IP address (X-Forwarded-For) and a fake UserAgent into the HTTP header. That will fool some sites, e.g. IMDB.com and Youtube.com. But as I wrote in my answer, it is a guess, since I can not have the real URL. – Jens Borrisholt Nov 09 '15 at 14:05
  • Tested again with a different IP and it was allowed. It seems that each IP is allowed 5 robot requests before that page appears. When I tested your answer I had already changed my IP for another reason. – RepeatUntil Nov 09 '15 at 14:07
  • @AbdulrahmanAljehani As I said, without the real URL I can not do any more. – Jens Borrisholt Nov 09 '15 at 14:08
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/94612/discussion-between-abdulrahmanaljehani-and-jens-borrisholt). – RepeatUntil Nov 09 '15 at 14:11
  • Yes, I could detect the physical changes to the code (although you shouldn't take that for granted), but what I really meant was for you to explain the *significance* of your changes. *Why* does it fool some sites? Does it really need to be random? Does the user agent really need to change? If so, how do we know that Abdulrahman's sporadic success from your code isn't entirely due to occasional random user agents that the server prefers, and other random values the server dislikes? Without explanation, I fear no actual *learning* has occurred here. – Rob Kennedy Nov 09 '15 at 15:05
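
The open questions in the comments (does the user agent need to change, do the random values matter) can be checked empirically rather than guessed at. Below is a minimal sketch, assuming Indy 10 with the OpenSSL libraries available and using the placeholder URL from the question. It sends one request with fixed, non-random headers and prints the status line, the request headers Indy actually sent, and the response headers, so that repeated runs can be compared with each other and with what a browser sends. The program name and the robot-page check (searching the body for the META tag quoted in the question) are illustrative assumptions, not something the site is known to require.

```pascal
program RobotCheckProbe; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  SysUtils, IdHTTP, IdSSLOpenSSL;

const
  _URL = 'https://www.url.com/'; // placeholder from the question, not a real target

var
  Http: TIdHTTP;
  IdSSL: TIdSSLIOHandlerSocketOpenSSL;
  Body: string;
begin
  Http := TIdHTTP.Create(nil);
  try
    // Owned by Http, so it is released together with it.
    IdSSL := TIdSSLIOHandlerSocketOpenSSL.Create(Http);
    Http.IOHandler := IdSSL;
    Http.AllowCookies := True;
    Http.HandleRedirects := True;

    // Fixed headers taken from the question: nothing random, so runs are comparable.
    Http.Request.UserAgent :=
      'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0';
    Http.Request.Accept :=
      'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
    Http.Request.AcceptLanguage := 'en-GB,en;q=0.5';

    try
      Body := Http.Get(_URL);
      Writeln('Status line: ', Http.ResponseText);
      Writeln('--- request headers sent (last request if redirected) ---');
      Writeln(Http.Request.RawHeaders.Text);
      Writeln('--- response headers ---');
      Writeln(Http.Response.RawHeaders.Text);
      // Assumption: the robot page is recognizable by the META tag from the question.
      Writeln('Looks like robot page: ', Pos('NAME="ROBOTS"', UpperCase(Body)) > 0);
    except
      // Indy raises this for HTTP error status codes (4xx/5xx).
      on E: EIdHTTPProtocolException do
        Writeln('HTTP error ', E.ErrorCode, ': ', E.ErrorMessage);
    end;
  finally
    Http.Free;
  end;
end.
```

Running it several times from the same IP and then changing one header at a time would show whether the block follows the request count per IP, as the asker's later test suggests, or a particular header value.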