
I subscribe to a secure https web page containing a button that downloads some data as CSV. I am trying to automate the download without the 'save as' dialog appearing, but I always seem to get an empty file. I suspect it has something to do with the file type I'm using with IdHttp, as most of my code works correctly. Can anyone help with my use of IdHttp or see where else I am going wrong?

The download button on the site calls some javascript to perform the download as follows

<a class="dlCSV" href="javascript:void(0);"   onclick="dl_module.DownloadCsv();return false;">Download in CSV format…</a>

In Delphi I use a TWebBrowser to log on securely and navigate to the page.

Clicking the download button in the TWebBrowser by hand shows the 'save as' dialog and then correctly downloads the CSV data, defaulting to the filename 'data.csv'.

Automating clicking the button using execScript (below) also works, again showing the 'save as' dialog and correctly downloading the data with the same default filename.

procedure TForm1.BtnClickDownloadbuttonClick(Sender: TObject);
var
  TheDocument: IHTMLDocument2; // current HTML document
  HTMLWindow: IHTMLWindow2;    // parent window of current HTML document
begin
  TheDocument := WebBrowser1.Document as IHTMLDocument2; // get reference to current document
  if not Assigned(TheDocument) then
    Exit;

  HTMLWindow := TheDocument.parentWindow; // get parent window of current document
  if Assigned(HTMLWindow) then
    try
      HTMLWindow.execScript('dl_module.DownloadCsv()', 'JavaScript'); // execute JS function to do download
    except
      on E: Exception do
        ShowMessage('Exception class name = ' + E.ClassName + sLineBreak +
          'Exception message = ' + E.Message);
    end;
end;

Then I added TLama's code from the question "How do I keep an embedded browser from prompting where to save a downloaded file?" to use IDownloadManager to intercept the download and prevent the 'save as' dialog. This is where it seems to go wrong: I then get an empty file downloaded, and not with the name data.csv.

My code for TWebBrowser.Download, TWebBrowser.InvokeEvent, TWebBrowser.QueryService and TForm1.FormCreate is identical to that provided by TLama in the question linked above.

My procedure TForm1.Button1Click is the same, except that I changed the download function being called to the one on my page by changing the line

HTMLWindow.execScript('SRT_stocFund.Export()', 'JavaScript');

to

HTMLWindow.execScript('dl_module.DownloadCsv()', 'JavaScript');

and my procedure TForm1.BeforeFileDownload is identical, except that because I'm on a secure site I added the variable

var
    LHandler: TIdSSLIOHandlerSocketOpenSSL; //<< on a https site    

and, after creating the FileStream, I added the lines

LHandler := TIdSSLIOHandlerSocketOpenSSL.Create(nil); 
IdHTTP.IOHandler := LHandler; 

The issue seems to be in procedure TForm1.BeforeFileDownload, where I note that the value of FileSource is https://www.the_web_site_name/Ashx/GenericCSV.ashx.

There is a short delay while IdHTTP.Get(FileSource, FileStream); executes, and then a file is created on my hard disk. However, it is called 'GenericCSV.ashx' (not data.csv) and it is zero bytes long.

Any ideas why it's not downloading the file called data.csv? (Do I somehow have to execute GenericCSV.ashx as well? If so, how?)

For info here is my version of procedure TForm1.BeforeFileDownload

procedure TForm1.BeforeFileDownload(Sender: TObject;  const FileSource: WideString; var Allowed: Boolean);
var
  IdHTTP: TIdHTTP;
  FileTarget: string;
  FileStream: TMemoryStream;
  LHandler: TIdSSLIOHandlerSocketOpenSSL;  // added as its a https site
begin
  FileSourceEdit.Text := FileSource;
  Allowed := ShowDialogCheckBox.Checked;
  if not Allowed then
  try
    IdHTTP := TIdHTTP.Create(nil);
    try
      FileStream := TMemoryStream.Create;
      LHandler := TIdSSLIOHandlerSocketOpenSSL.Create(nil); //<<< added as its a https site
      IdHTTP.IOHandler := LHandler;    //<<< added as its a https site
      try
        IdHTTP.HandleRedirects := True;
        IdHTTP.Get(FileSource, FileStream);
        FileTarget := IdHTTP.URL.Document;
        if FileTarget = '' then
          FileTarget := 'File';
        FileTarget := ExtractFilePath(ParamStr(0)) + FileTarget;
        FileStream.SaveToFile(FileTarget);
      finally
        FileStream.Free;
      end;
    finally
      IdHTTP.Free;
    end;
    ShowMessage('Downloading finished! File has been saved as:' + sLineBreak +
      FileTarget);
  except
    on E: Exception do
      ShowMessage(E.Message);
  end;
end;
user3209752
  • Do you have to log in somehow to that page to be able to download that CSV? Note that Indy (IdHTTP) does not share cookies (or any authentication) with WebBrowser, so you have to account for that and either pass them to your IdHTTP or log in with IdHTTP. – smooty86 May 04 '15 at 10:27
  • Yes, I have already written code to log in to the site by automating filling in the login information inside the TWebBrowser and clicking the right buttons etc., and then more code navigates from the home page to the page in question. I don't know if any cookies are being used or, if they are, how I would use them with IdHttp. – user3209752 May 04 '15 at 10:38
  • That's what I am saying. IdHTTP does NOT use cookies from WebBrowser. You either have to retrieve the cookies from WebBrowser (which is problematic) and pass them into IdHTTP (you have to use a CookieManager component) OR you have to log in with IdHTTP (send the same values as WebBrowser using the Post method). – smooty86 May 04 '15 at 12:10
  • Thank you, I think I understand. I have to log IdHttp in to the site as well before I do the IdHTTP.Get(FileSource, FileStream). However, I'm not really familiar with using IdHttp. Do you think you could show a code snippet to show what you mean? I don't use POST to log in to the web site: I find the correct form elements for the user name and password input boxes, fill them in programmatically and then programmatically click the log in button (which seems to run a bit of code rather than just doing a submit). – user3209752 May 04 '15 at 14:19
  • E.g. the user name box has the code ) Don't know if that's important for IdHttp. I found this http://stackoverflow.com/questions/12722606/log-in-to-website-from-delphi but don't know what I need to change so that I can log IdHttp into my site. – user3209752 May 04 '15 at 14:34
  • It would probably be difficult for you. I posted an answer with an example of how to get cookies from the browser. Then you should be able to download your file. – smooty86 May 04 '15 at 15:00

2 Answers

After you log in, you can use this code to retrieve cookies from TWebBrowser

procedure GetHttpOnlyCookie(const AUrl: string; var ACookies: string);
const
  INTERNET_COOKIE_HTTPONLY = 8192; // $2000: also return HttpOnly cookies
var
  hModule: THandle;
  InternetGetCookieEx: function(lpszUrl, lpszCookieName, lpszCookieData
    : PAnsiChar; var lpdwSize: DWORD; dwFlags: DWORD; lpReserved: pointer)
    : BOOL; stdCall;
  CookieSize: DWORD;
  CookieData: PAnsiChar;
  AnsiUrl: AnsiString;
begin
  // The "A" entry point expects an ANSI URL. In Unicode versions of Delphi a
  // direct PAnsiChar(AUrl) cast would pass UTF-16 data and the call would fail,
  // so convert explicitly first.
  AnsiUrl := AnsiString(AUrl);
  hModule := LoadLibrary('wininet.dll');
  if (hModule <> 0) then
  begin
    @InternetGetCookieEx := GetProcAddress(hModule, 'InternetGetCookieExA');
    if (@InternetGetCookieEx <> nil) then
    begin
      CookieSize := 1024; // buffer size in bytes; larger cookie strings will not fit
      CookieData := AllocMem(CookieSize);
      try
        if InternetGetCookieEx(PAnsiChar(AnsiUrl), nil, CookieData, CookieSize, INTERNET_COOKIE_HTTPONLY, nil) then
        begin
          ACookies := string(CookieData);
        end;
      finally
        FreeMem(CookieData);
      end;
    end;
  end;
end;

Then you just parse your cookies and add them (you have to create CookieManager in IdHTTP first)

IdHTTP1.CookieManager.AddServerCookie();

Then you start your download and it should work if you passed all parameters correctly (unfortunately, it is not possible to find out what your site requires).
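The parsing step mentioned above can be sketched roughly as follows. This is only a minimal illustration using the RTL, assuming the cookie string has the "name1=value1; name2=value2; " shape described in this thread; SplitCookieString is a hypothetical helper name, not part of Indy or WinInet. The Indy call is left as a comment because the exact method (AddCookie or AddServerCookie) depends on the Indy version installed.

```pascal
program SplitCookiesDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: splits a cookie string such as
// 'name1=value1; name2=value2; ' into trimmed 'name=value' entries.
// Empty fragments (e.g. after a trailing semicolon) are skipped.
procedure SplitCookieString(const ACookies: string; AList: TStrings);
var
  Rest, Item: string;
  P: Integer;
begin
  AList.Clear;
  Rest := ACookies;
  while Rest <> '' do
  begin
    P := Pos(';', Rest);
    if P = 0 then
    begin
      // no more separators: the remainder is the last entry
      Item := Rest;
      Rest := '';
    end
    else
    begin
      Item := Copy(Rest, 1, P - 1);
      Delete(Rest, 1, P);
    end;
    Item := Trim(Item);
    if Item <> '' then
      AList.Add(Item);
  end;
end;

var
  Cookies: TStringList;
  I: Integer;
begin
  Cookies := TStringList.Create;
  try
    SplitCookieString('name1=value1; name2=value2; ', Cookies);
    for I := 0 to Cookies.Count - 1 do
      // With Indy, each entry would be handed to the cookie manager here,
      // e.g. CookieManager.AddCookie(Cookies[I], Host) in older versions.
      WriteLn(Cookies[I]);
  finally
    Cookies.Free;
  end;
end.
```

Splitting by hand rather than with TStringList.DelimitedText avoids the default behaviour of also treating spaces as delimiters, which could break cookie values that contain a space.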

smooty86
  • I found the original topic where I probably got this code a long time ago. Credits to that guy. You can find more about cookies in Indy there too. http://stackoverflow.com/questions/13235897/transfer-authentication-from-webbrowser-to-indy-cookiemanager – smooty86 May 04 '15 at 15:02
  • Thank you. You clearly understand all this web stuff a great deal better than I do! I deal with mathematical processing of data. The download is just a means to an end for me, so I don't really know what I'm doing with this part, especially with all the magic numbers and constants that seem to be needed. I read the other post but it looks like AddServerCookie() needs Indy 10. I didn't upgrade to 10 as it seems it's incompatible with many of the other components I have already. Also, at the risk of sounding dumb, I don't know how to 'just parse my cookies'. Any help there would be appreciated. – user3209752 May 04 '15 at 15:22
  • You can use CookieManager.AddCookie in any older version of Indy. – smooty86 May 04 '15 at 15:38
  • Thanks for the help smooty86, but I've spent all day trying to work out how to parse cookies and get them into IdHttp, as well as how to get IdHttp to log on properly. This is getting far more complex than it's worth. The data I am trying to download is already displayed on the screen, looking like a table but actually made up of loads of DIVs, positioned appropriately. I think I'll get there faster by simply parsing the TWebBrowser.Document and saving the data myself. It's going into an SQL database, so I could even use SQL to move it directly from the page into the db. I'm 90% of the way there. – user3209752 May 05 '15 at 17:18
  • Cookies in CookieData are in the format "name1=value1; name2=value2; ". You just split it on the delimiter ";" (semicolon) into a TStringList. Now you have "name=value" pairs. Then you just add all the cookies one by one: CookieManager.AddCookie('name=value',domain) – smooty86 May 05 '15 at 18:21

Thank you smooty86, but I think it's time I gave up trying to do it this way and simply parse the page I can see. I don't mind trying to understand code and adapting it to my needs, but it's so much harder trying to follow hints and suggestions when I'm working in the dark and especially when I don't know what parameters are needed everywhere. (I'm not daft: I've been programming for nearly 30 years and have spent over 4 years developing this particular data processing application, but I rarely touch web stuff.)

However, the progress so far is...

Running your GetHttpOnlyCookie code after a successful login (using automated filling in of the fields and clicking the login button) returned an empty string, so I used the code below instead. It at least seemed to return something that looked a little like your cookie string, i.e. several strings separated by semicolons, most being name=value. (IdCookieManager1 is connected to IdHttp.)

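CookieList := TStringList.Create;
try
  CookieList.Delimiter := ';';
  CookieList.StrictDelimiter := True; // split on ';' only, not on spaces as well
  document := WebBrowser1.Document as IHTMLDocument2;
  // note: document.cookie does not expose cookies marked HttpOnly
  CookieList.DelimitedText := document.cookie;
  for i := 0 to CookieList.Count - 1 do
    IdCookieManager1.AddCookie(Trim(CookieList[i]), LOGIN_URL);
finally
  CookieList.Free;
end;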

Then, in my original procedure BeforeFileDownload, I try to log IdHttp into the site as well, using code I adapted from the question "Log in to website from Delphi" and the cookies held in the cookie manager. Displaying the string returned showed lots of HTML that appeared to represent the original log in page and not the page you see after logging in.

procedure TFrmInportGrades.BeforeFileDownload(Sender: TObject;  const FileSource: WideString; var Allowed: Boolean);
var
  FileTarget: string;
  FileStream: TMemoryStream;
  request : Tstringlist;
  s : string;
begin
  FileSourceEdit.Text := FileSource;
  Allowed := ShowDialogCheckBox.Checked;
  if not Allowed then
    begin
    try
    FileStream := TMemoryStream.Create;
    IdHTTP.CookieManager := IdCookieManager1;
    s := LogInIdHttp;  //<<<< log in the IdHttp
    showmessage(s); //<<<< debug
    IdHTTP.Get(FileSource, FileStream);
    FileTarget := IdHTTP.URL.Document;
    if FileTarget = '' then
          FileTarget := 'File';
    FileTarget := ExtractFilePath(ParamStr(0)) + FileTarget;
    FileStream.SaveToFile(FileTarget);
    finally
        FileStream.Free;
    end;

    ShowMessage('Downloading finished! File has been saved as:' + sLineBreak +
      FileTarget);
    end;
end;

The login code I used is below, but I don't really know what I am doing here or what needs to be put into the Request.Add() parameters. I used 'Inspect element' in Firefox to get the names of the user and password boxes, and put the correct user name and password after the '=' sign in lines {3} and {4}. In lines {2}, {6} and {7} I put the URL of the log in site. I've no idea what lines {1}, {2} and {5} do, or even whether they are correct or necessary.

function TFrmInportGrades.LogInIdHttp: string;
var
  Request: TStringList;
  Response: TMemoryStream;
  LHandler: TIdSSLIOHandlerSocketOpenSSL;  // added as it's an https site
begin
  Result := '';
  try
    Response := TMemoryStream.Create;
    try
      Request := TStringList.Create;
      try
    {1} Request.Add('op=login');
    {2} Request.Add('redirect=https://www.thewebsite.com/Login.aspx');
    {3} Request.Add('ctl00$ctl00$Body$Body$loginManager$ctl00$loginEmailInput=username');
    {4} Request.Add('ctl00$ctl00$Body$Body$loginManager$ctl01$passwordInput=password');
        LHandler := TIdSSLIOHandlerSocketOpenSSL.Create(nil); //<<< added as it's an https site
        IdHTTP.IOHandler := LHandler;    //<<< added as it's an https site
        IdHTTP.AllowCookies := True;
        IdHTTP.HandleRedirects := True;
    {5} IdHTTP.Request.ContentType := 'application/x-www-form-urlencoded';
    {6} IdHTTP.Post('https://www.thewebsite.com/Login.aspx', Request, Response);
    {7} Result := IdHTTP.Get('https://www.thewebsite.com/Login.aspx');
      finally
        Request.Free;
      end;
    finally
      Response.Free;
    end;
  except
    on E: Exception do
      ShowMessage(E.Message);
  end;
end;

The net result of all this is that I don't get a file created at all now, not even a zero-byte one. This all seems very overcomplicated simply to avoid or automate the 'Save As' dialog, and it requires lots of code that I won't be able to maintain afterwards. Unless somebody has a simpler solution I'll just parse what I can see. (BTW I tried TEmbeddedWebBrowser, but there is so little documentation for it that I couldn't see how to make it download correctly. I might try again later.) Thank you for trying to help!
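One possible reason the POST returns the login page again, offered only as an assumption: field names of the form ctl00$ctl00$Body$... are typical of ASP.NET WebForms pages, and such pages usually expect hidden fields like __VIEWSTATE and __EVENTVALIDATION to be posted back together with the visible inputs. Nothing in this thread confirms that this site works that way. If it does, the hidden values would have to be read from the HTML of the login page first. The sketch below shows a naive way to pull one such value out of a page; ExtractHiddenField and the sample HTML are hypothetical, and the search assumes double-quoted attributes with name appearing before value.

```pascal
program HiddenFieldDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: returns the value="..." attribute of the first tag
// that contains name="AName", or '' if it is not found. This is a plain text
// search, not an HTML parser.
function ExtractHiddenField(const AHtml, AName: string): string;
var
  P, Q: Integer;
  Tail: string;
begin
  Result := '';
  P := Pos('name="' + AName + '"', AHtml);
  if P = 0 then
    Exit;
  Tail := Copy(AHtml, P, MaxInt);
  // restrict the search to the remainder of this tag
  Q := Pos('>', Tail);
  if Q > 0 then
    Tail := Copy(Tail, 1, Q);
  P := Pos('value="', Tail);
  if P = 0 then
    Exit;
  Delete(Tail, 1, P + Length('value="') - 1);
  Q := Pos('"', Tail);
  if Q > 0 then
    Result := Copy(Tail, 1, Q - 1);
end;

const
  // Made-up sample markup, for illustration only
  SampleHtml =
    '<form><input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" ' +
    'value="abc123" /></form>';
begin
  WriteLn(ExtractHiddenField(SampleHtml, '__VIEWSTATE'));
  // In LogInIdHttp the idea would be to Get the login page first, then add
  // e.g. Request.Add('__VIEWSTATE=' + ExtractHiddenField(Page, '__VIEWSTATE'))
  // before calling Post.
end.
```

If the site does rely on these fields, the name and value of the login button itself may also need to be included in the posted form data.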

user3209752