
How to download a file programmatically after clicking a download button, without needing to know the URL of the file being downloaded

After a file has downloaded, a prompt comes up asking whether you'd like to save it, and after pressing 'Yes' another prompt asks where you'd like to save it. So the file is downloaded first, maybe into a buffer somewhere, and the prompts appear after that initial download.

So, once the button is clicked, how do you capture the download stream and save it as a file somewhere, without the prompts appearing?

(Any method of clicking a button would be fine; the following should do.)

procedure TForm1.Button1Click(Sender: TObject);
var
  x: Integer;
  ovLinks: OleVariant;
begin
  WebBrowser1.Navigate('The web page');
  // wait for the page to finish loading
  while WebBrowser1.ReadyState <> READYSTATE_COMPLETE do
    Application.ProcessMessages;
  ovLinks := WebBrowser1.OleObject.Document.all.tags('A');
  if ovLinks.Length > 0 then
  begin
    for x := 0 to ovLinks.Length - 1 do
    begin
      if Pos('id of button', ovLinks.Item(x).id) > 0 then
      // or: if Pos('href of button', ovLinks.Item(x).href) > 0 then
      begin
        ovLinks.Item(x).click;
        Break;
      end;
    end;
  end;
end;

The reason for this question is that the URL of a file can't always be found. For example, at this web site I couldn't find the URL programmatically, but after pressing the Export button in IE, the file was downloaded into the 'Temporary Internet Files' folder. That folder has an 'Internet Address' column which shows the URL, but Chrome has no equivalent. Conversely, at this web site I can find the URL programmatically, but when I download the file by pressing 'here', it doesn't appear in IE's 'Temporary Internet Files' folder. For some other websites the URL can be found both in that folder and programmatically, but for others it can't be found either way.

Rob Kennedy
Peter James
  • If you're doing [web scraping](http://en.wikipedia.org/wiki/Web_scraping), you'd be better off looking at tools other than Delphi and COM objects. For example, consider using a headless browser such as [PhantomJS](http://phantomjs.org). – Stan Nov 14 '12 at 11:43
  • Web scraping? Thanks for the link. But I prefer to stick with Delphi. The only other things I know are Android (where I'm still a beginner) and a little JavaScript. – Peter James Nov 14 '12 at 13:12
  • Can you use [`Embedded Web Browser`](http://www.bsalsa.com/product.html) in your project ? – TLama Nov 14 '12 at 13:39
  • @TLama Hi TLama, Embedded Web Browser? Why is there so much to learn? I guess so; does it cost anything? I was thinking that because TWebBrowser wraps IE, just using TWebBrowser would be OK. Is Embedded Web Browser easy to use? – Peter James Nov 14 '12 at 13:45
  • I was asking whether you want to use it rather than whether you know it. It's a free, open-source wrapper for Internet Explorer, used the same way as `TWebBrowser`, just extended; in your case, extended by implementing the [`IDownloadManager`](http://msdn.microsoft.com/en-us/library/aa753613(v=vs.85).aspx) interface. I can post an example if you want. – TLama Nov 14 '12 at 13:59
  • Not sure exactly how to interpret this question, but my initial impression is that this smells like a possible duplicate of http://stackoverflow.com/questions/8949965/controlling-file-downloads . My answer would be there. – Glenn1234 Nov 14 '12 at 20:11
  • @Glenn1234, not exactly, the *Export* button on that [`web site`](http://financials.morningstar.com/income-statement/is.html?t=AAPL&ops=clear) calls a JavaScript function. – TLama Nov 14 '12 at 20:24
  • @TLama I see, the question is a bit different than it appears. I made a suggestion below. – Glenn1234 Nov 14 '12 at 21:11
  • @Glenn1234, yep, it's based on that question. I have found it easy to push buttons on any web page, but not always easy to find the download file's URL (some people probably prefer not to show it for security reasons). So my question is looking for a once-and-for-all solution to that problem. – Peter James Nov 14 '12 at 23:23

2 Answers

Implement the IDownloadManager interface, with its Download method, in your web browser control and you can control exactly what happens. The Download method is called whenever a file is about to be downloaded (only in the cases where the Save As dialog would pop up).

1. Embedded Web Browser

You can use the Embedded Web Browser control, which already implements this interface and fires an OnFileDownload event that differs from the event of the same name in TWebBrowser. See, for instance, this thread on how to use it.

2. Do it yourself

Alternatively, you can implement it for TWebBrowser yourself. In the following example I've used an interposer class just to show the principle, but it's very easy to wrap it as a component (that's why I've made OnBeforeFileDownload published).

2.1. OnBeforeFileDownload event

The only extension to TWebBrowser in this interposer class is the OnBeforeFileDownload event, which fires when a file is about to be downloaded: before the Save As dialog pops up and, unlike TWebBrowser's own OnFileDownload event, not when the document itself is downloaded. If you don't write a handler for it, the web browser control behaves as before and shows the Save As dialog. If you write a handler and set its Allowed var parameter to False, the file saving is cancelled; if you leave Allowed set to True (the default), the Save As dialog is shown. Note that if you cancel the download by setting Allowed to False, you'll need to download the file yourself (in this example I did it synchronously using Indy). For this purpose there's the FileSource const parameter, which contains the URL of the file being downloaded. Here is an overview of the event parameters:

  • Sender (TObject) - event sender
  • FileSource (WideString) - source file URL
  • Allowed (Boolean) - var parameter that decides whether the file download is allowed (default value is True)

2.2. IDownloadManager implementation

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  StdCtrls, OleServer, OleCtrls, Dialogs, ActiveX, MSHTML, UrlMon, SHDocVw,
  IdHTTP;

const
  IID_IDownloadManager: TGUID = '{988934A4-064B-11D3-BB80-00104B35E7F9}';
  SID_SDownloadManager: TGUID = '{988934A4-064B-11D3-BB80-00104B35E7F9}';

type
  IDownloadManager = interface(IUnknown)
    ['{988934A4-064B-11D3-BB80-00104B35E7F9}']
    function Download(pmk: IMoniker; pbc: IBindCtx; dwBindVerb: DWORD;
      grfBINDF: DWORD; pBindInfo: PBindInfo; pszHeaders: PWideChar;
      pszRedir: PWideChar; uiCP: UINT): HRESULT; stdcall;
  end;
  TBeforeFileDownloadEvent = procedure(Sender: TObject; const FileSource: WideString;
    var Allowed: Boolean) of object;
  TWebBrowser = class(SHDocVw.TWebBrowser, IServiceProvider, IDownloadManager)
  private
    FFileSource: WideString;
    FOnBeforeFileDownload: TBeforeFileDownloadEvent;
    function QueryService(const rsid, iid: TGUID; out Obj): HRESULT; stdcall;
    function Download(pmk: IMoniker; pbc: IBindCtx; dwBindVerb: DWORD;
      grfBINDF: DWORD; pBindInfo: PBindInfo; pszHeaders: PWideChar;
      pszRedir: PWideChar; uiCP: UINT): HRESULT; stdcall;
  protected
    procedure InvokeEvent(ADispID: TDispID; var AParams: TDispParams); override;
  published
    property OnBeforeFileDownload: TBeforeFileDownloadEvent read FOnBeforeFileDownload write FOnBeforeFileDownload;
  end;

type
  TForm1 = class(TForm)
    Button1: TButton;
    WebBrowser1: TWebBrowser;
    FileSourceLabel: TLabel;
    FileSourceEdit: TEdit;
    ShowDialogCheckBox: TCheckBox;
    procedure Button1Click(Sender: TObject);
    procedure FormCreate(Sender: TObject);
  private
    procedure BeforeFileDownload(Sender: TObject; const FileSource: WideString;
      var Allowed: Boolean);
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

{ TWebBrowser }

function TWebBrowser.Download(pmk: IMoniker; pbc: IBindCtx; dwBindVerb,
  grfBINDF: DWORD; pBindInfo: PBindInfo; pszHeaders, pszRedir: PWideChar;
  uiCP: UINT): HRESULT;
var
  Allowed: Boolean;
begin
  Result := E_NOTIMPL;
  if Assigned(FOnBeforeFileDownload) then
  begin
    Allowed := True;
    // pszRedir may be nil or empty when the URL wasn't redirected
    if (pszRedir <> nil) and (pszRedir^ <> #0) then
      FFileSource := pszRedir;
    FOnBeforeFileDownload(Self, FFileSource, Allowed);
    if not Allowed then
      Result := S_OK;
  end;
end;

procedure TWebBrowser.InvokeEvent(ADispID: TDispID; var AParams: TDispParams);
begin
  inherited;
  // DispID 250 is the BeforeNavigate2 event. DISPPARAMS stores arguments in
  // reverse order, so rgvarg[5] is its URL parameter; keep it in FFileSource
  // for cases when IDownloadManager::Download doesn't redirect the URL and
  // passes an empty string in pszRedir
  if ADispID = 250 then
    FFileSource := OleVariant(AParams.rgvarg^[5]);
end;

function TWebBrowser.QueryService(const rsid, iid: TGUID; out Obj): HRESULT;
begin
  Result := E_NOINTERFACE;
  Pointer(Obj) := nil;
  if Assigned(FOnBeforeFileDownload) and IsEqualCLSID(rsid, SID_SDownloadManager) and
    IsEqualIID(iid, IID_IDownloadManager) then
  begin
    if Succeeded(QueryInterface(IID_IDownloadManager, Obj)) and
      Assigned(Pointer(Obj))
    then
      Result := S_OK;
  end;
end;

{ TForm1 }

procedure TForm1.Button1Click(Sender: TObject);
var
  HTMLWindow: IHTMLWindow2;
  HTMLDocument: IHTMLDocument2;
begin
  WebBrowser1.Navigate('http://financials.morningstar.com/income-statement/is.html?t=AAPL&ops=clear');
  while WebBrowser1.ReadyState <> READYSTATE_COMPLETE do
    Application.ProcessMessages;

  HTMLDocument := WebBrowser1.Document as IHTMLDocument2;
  if not Assigned(HTMLDocument) then
    Exit;
  HTMLWindow := HTMLDocument.parentWindow;
  if Assigned(HTMLWindow) then
  try
    HTMLWindow.execScript('SRT_stocFund.Export()', 'JavaScript');
  except
    on E: Exception do
      ShowMessage(E.Message);
  end;
end;

procedure TForm1.FormCreate(Sender: TObject);
begin
  // FastMM leak reporting; Delphi 6 doesn't have it, so remove this line there
  ReportMemoryLeaksOnShutdown := True;
  WebBrowser1.OnBeforeFileDownload := BeforeFileDownload;
end;

procedure TForm1.BeforeFileDownload(Sender: TObject; const FileSource: WideString;
  var Allowed: Boolean);
var
  IdHTTP: TIdHTTP;
  FileTarget: string;
  FileStream: TMemoryStream;
begin
  FileSourceEdit.Text := FileSource;
  Allowed := ShowDialogCheckBox.Checked;
  if not Allowed then
  try
    IdHTTP := TIdHTTP.Create(nil);
    try
      FileStream := TMemoryStream.Create;
      try
        IdHTTP.HandleRedirects := True;
        IdHTTP.Get(FileSource, FileStream);
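        // TIdHTTP.URL is not available in Indy 9; the name comes from the URL
        // path, so it may not carry the real file extension
        FileTarget := IdHTTP.URL.Document;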
        if FileTarget = '' then
          FileTarget := 'File';
        FileTarget := ExtractFilePath(ParamStr(0)) + FileTarget;
        FileStream.SaveToFile(FileTarget);
      finally
        FileStream.Free;
      end;
    finally
      IdHTTP.Free;
    end;
    ShowMessage('Downloading finished! File has been saved as:' + sLineBreak +
      FileTarget);
  except
    on E: Exception do
      ShowMessage(E.Message);
  end;
end;

end.

2.3. IDownloadManager project

You can download the above code (written in Delphi 2009) as a complete project from here.
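The handler above names the saved file from IdHTTP.URL.Document, which (as the comments below point out) isn't available in Indy 9. As a minimal sketch, assuming all you need is a name taken from the URL itself, a plain string helper works on any Delphi version. FileNameFromURL is a hypothetical helper, not part of Indy or the VCL: it drops the query string and fragment, strips the scheme and host, keeps the last path segment, and falls back to a default name.

```pascal
program UrlFileName;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper (not part of Indy or the VCL): derives a local file
// name from a download URL. Returns ADefault when the URL path has no
// file name segment.
function FileNameFromURL(const AURL, ADefault: string): string;
var
  S: string;
  P: Integer;
begin
  S := AURL;
  // Drop the query string ("?...") and then any fragment ("#...")
  P := Pos('?', S);
  if P > 0 then
    S := Copy(S, 1, P - 1);
  P := Pos('#', S);
  if P > 0 then
    S := Copy(S, 1, P - 1);
  // Drop the scheme ("http://") so the host isn't mistaken for a file name
  P := Pos('://', S);
  if P > 0 then
    Delete(S, 1, P + 2);
  // Keep whatever follows the last slash; a bare host has no file name
  P := LastDelimiter('/', S);
  if P > 0 then
    Result := Copy(S, P + 1, MaxInt)
  else
    Result := '';
  if Result = '' then
    Result := ADefault;
end;

begin
  Writeln(FileNameFromURL(
    'http://financials.morningstar.com/ajax/ReportProcess4CSV.html?t=AAPL&region=usa',
    'File'));
  Writeln(FileNameFromURL('http://example.com/', 'File'));
end.
```

For the Morningstar export URL this yields ReportProcess4CSV.html, which shows why a name taken from the URL alone can carry the wrong extension for what is actually CSV data.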

TLama
    If I could give extra points for an excellent answer I'd give them for this one. – Wouter van Nifterick Nov 15 '12 at 00:16
  • Wow, it works. Genius. I'm still trying to get WebBrowser1BeforeNavigate2 to work for me, but after that I'll download and install 'Embedded Web Browser'. Not to be picky, but the file is saved as an .html file; once the .html is changed to .csv, it opens perfectly in Excel. Next I have to try to understand it all. I can only say genius to you, dude. – Peter James Nov 15 '12 at 03:11
  • Glad it helped! I know about that weakness, but I couldn't figure out how to get the file extension from the `IDownloadManager::Download` method for a given URL (even the Embedded Web Browser's download manager demo names files that way). The original Save As dialog knows it, so I think it's available at that point. – TLama Nov 15 '12 at 03:18
  • @TLama It's easy to fix after the download anyway, maybe I shouldn't have mentioned it. – Peter James Nov 15 '12 at 04:04
  • @TLama I have 3 problems: when I compile EmbeddedWB, '{$VARPROPSETTER ON}' and 'MSHTML_EWB' are errors. And in Delphi, 'ReportMemoryLeaksOnShutdown' and URL in 'FileTarget := IdHTTP.URL.Document;' are errors. I'm using Delphi 6, so maybe D6 is the problem. – Peter James Nov 15 '12 at 04:51
  • *Are errors* tells me nothing about what problems you have, but anyway, `IdHTTP.URL.Document` is missing in Indy 9. That `IdHTTP.URL.Document` is only there to get the document file name, which is wrong at the moment anyway (I'll try to find out if there's a way to get the file name at the `IDownloadManager::Download` level). `ReportMemoryLeaksOnShutdown` is FastMM stuff, which Delphi 6 doesn't have yet; simply remove that line. And finally, for the `EmbeddedWB` compilation failure, there's a comment in the source code: *You need to download and install the second D6 patch in order for this to compile.* – TLama Nov 15 '12 at 05:09
  • @TLama I kind of got it to work. It downloads the file, but with no extension: it saves the file as 'File' and nothing else. I'll download the second D6 patch and try again. But it's still amazing. – Peter James Nov 15 '12 at 13:49

I don't know if this will get you where you need to go, but it seems promising. With the TWebBrowser I have here (exported from "Microsoft Internet Controls version 1.1"), you can use the OnBeforeNavigate2 event to monitor all the URLs the web browser handles. From there, the problem is determining which URL you need, capturing it, and then handling it yourself. Here's a short example from the five minutes I spent playing with the control on the first web site you presented.

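// Assign this handler to WebBrowser1.OnBeforeNavigate2 (e.g. in the Object
// Inspector); declaring the method alone doesn't hook it up.
procedure TForm1.WebBrowser1BeforeNavigate2(Sender: TObject;
  pDisp: IDispatch; var URL, Flags, TargetFrameName, PostData,
  Headers: OleVariant; var Cancel: WordBool);
begin
  // show every URL the browser navigates to
  Edit1.Text := String(URL);
  // cancel the CSV export navigation so no download prompts appear
  if Pos('CSV', Edit1.Text) > 0 then
    Cancel := True;
end;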

As you can see, there are a lot of parameters, and you'd have to look up the documentation to see what they mean. In my short example, I put each navigated URL into Edit1.Text (a TMemo would probably be better if you really want to watch what's going on). In your example, nothing indicates that the URL is a directly downloaded file, but with the code above I can stop the browser from doing its thing (showing the download prompts, etc.) and still have the URL in the Edit1 box to act upon. If you dig further, I'm sure you can look at the headers in question to determine whether the web site intends to send you a file you should download, since the URL itself doesn't say "CSV file" (putting http://financials.morningstar.com/ajax/ReportProcess4CSV.html?t=AAPL&region=usa&culture=us_EN&reportType=is&period=12&dataType=A&order=asc&columnYear=5&rounding=3&view=raw&productCode=USA&r=809199&denominatorView=raw&number=3 into a web browser will download the CSV file in question).

Hopefully it's a good start for you.
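The idea of checking the headers to decide whether the server intends to send a file can be sketched as two small string helpers. This is a sketch under assumptions: LooksLikeDownload and FileNameFromDisposition are hypothetical names, the sample header values are made up for illustration, and the content types treated as downloads are a guess you would tune per site. These are response headers, so they only exist once a request has been made; with Indy, the inputs could come from IdHTTP.Response.ContentType and IdHTTP.Response.RawHeaders.Values['Content-Disposition'] after the Get call.

```pascal
program DownloadHeaders;

{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: True when the response headers suggest the server
// wants the body saved as a file rather than displayed. The content types
// checked here are an assumption; extend them for the sites you target.
function LooksLikeDownload(const AContentType, AContentDisposition: string): Boolean;
var
  CT: string;
begin
  CT := LowerCase(Trim(AContentType));
  Result := (Pos('attachment', LowerCase(AContentDisposition)) > 0) or
    (Pos('text/csv', CT) = 1) or
    (Pos('application/octet-stream', CT) = 1);
end;

// Hypothetical helper: returns the value of a plain filename= parameter in a
// Content-Disposition header, without quotes, or '' if there is none.
// Simplified: ignores the filename*= form and semicolons inside quotes.
function FileNameFromDisposition(const AContentDisposition: string): string;
const
  Key = 'filename=';
var
  P: Integer;
begin
  Result := '';
  // LowerCase only changes ASCII letters, so positions match the original
  P := Pos(Key, LowerCase(AContentDisposition));
  if P = 0 then
    Exit;
  Result := Copy(AContentDisposition, P + Length(Key), MaxInt);
  // Cut at the next parameter separator, if any
  P := Pos(';', Result);
  if P > 0 then
    Result := Copy(Result, 1, P - 1);
  Result := Trim(Result);
  // Strip surrounding double quotes
  if (Length(Result) >= 2) and (Result[1] = '"') and
    (Result[Length(Result)] = '"') then
    Result := Copy(Result, 2, Length(Result) - 2);
end;

const
  // Hypothetical header values, for illustration only
  SampleType = 'text/csv; charset=utf-8';
  SampleDisposition = 'attachment; filename="AAPL Income Statement.csv"';

begin
  Writeln(LooksLikeDownload(SampleType, SampleDisposition));
  Writeln(FileNameFromDisposition(SampleDisposition));
  Writeln(LooksLikeDownload('text/html', ''));
end.
```

When a server does send a Content-Disposition file name, using it instead of the URL path would also avoid the wrong .html extension reported in the comments on the first answer.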

Glenn1234
  • Hi Glenn, thanks for the file link; I had also found it in IE's temporary files folder. I'll start experimenting with WebBrowser1BeforeNavigate2. Thanks again. – Peter James Nov 14 '12 at 23:29
  • Hi again. For some reason WebBrowser1BeforeNavigate2 isn't being called. I put procedure WebBrowser1BeforeNavigate2(etc) in the 'type' section at the top of the unit and its body after implementation, but it's not working. Any ideas? Also, TLama posted a solution, but before I go through it, I'd like to get WebBrowser1BeforeNavigate2 to work. Thanks for your time. – Peter James Nov 15 '12 at 02:44