
I am using this simple function to download a file:

function DownloadFile([string]$url, [string]$file)
{
    $clnt = new-object System.Net.WebClient
    Write-Host "Downloading from $url to $file " 
    $clnt.DownloadFile($url, $file)
}

It works fine, but the script that calls it can run many times, and at present that can mean downloading the file(s) many times.

How can I modify the function to download only if the file doesn't exist locally or the server version is newer (e.g. the LastModifiedDate on the server is later than the LastModifiedDate locally)?

EDIT: This is what I've got so far. It seems to work, but I would like to avoid making 2 calls to the server.

function DownloadFile([string]$url, [string]$file)
{
    $downloadRequired = $true
    if ((test-path $file)) 
    {
        $localModified = (Get-Item $file).LastWriteTime 
        $webRequest = [System.Net.HttpWebRequest]::Create($url);
        $webRequest.Method = "HEAD";
        $webResponse = $webRequest.GetResponse()
        $remoteLastModified = ($webResponse.LastModified) -as [DateTime] 
        $webResponse.Close()

        if ($remoteLastModified -gt $localModified)
        {
            Write-Host "$file is out of date"
        }
        else
        {
            $downloadRequired = $false
        }

    }

    if ($downloadRequired)
    {
        $clnt = new-object System.Net.WebClient
        Write-Host "Downloading from $url to $file"
        $clnt.DownloadFile($url, $file)
    }
    else
    {
        Write-Host "$file is up to date."
    }
}
Mark
  • The first part would be easy. You could just delete `$file` at the beginning of the script and use `Test-Path $file` to see if it's already there. As for the second part, I don't know how to check file details without first downloading the file – Matt Oct 23 '14 at 18:15
  • Does the server you're downloading from expose the file version through an API somewhere? File version can be checked with `(get-item).LastWriteTime` [link](http://blogs.technet.com/b/heyscriptingguy/archive/2012/06/01/use-powershell-to-modify-file-access-time-stamps.aspx) – thisguy123 Oct 23 '14 at 18:15
  • @Matt downloading the file again works now even if the file exists. I am hoping to avoid re-downloading (so deleting won't help with this). – Mark Oct 23 '14 at 20:16
  • @thisguy123 I may not have been clear. I am not talking about file version, just the updated date. I know I can get the local UpdatedDate; I was hoping I could maybe send the local date and, if the server returned a 304, not download it. I was further hoping that this wouldn't involve having to check for a 304 (just have the client handle it somehow). – Mark Oct 23 '14 at 20:20
  • I know. The point I was trying to make is that you could test whether the file was already there before downloading it to save the effort since, in theory, you already had it. That would satisfy your first question. The second part would only work if there was a way to get metadata from the file _without_ downloading it, as @thisguy123 was trying to suggest. – Matt Oct 23 '14 at 20:20
  • @Matt ah, OK. Trouble is then it may be newer on the server, but you figured that out already :) Thanks for your input. – Mark Oct 23 '14 at 20:22
  • if you can get a checksum of the file before downloading it you could compare the checksum and if it differs download the file – Paul Oct 23 '14 at 20:30
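A minimal sketch of the checksum idea from the last comment, assuming the server publishes a SHA-256 hash for the file somewhere (nothing in the question says it does). `Test-ChecksumMatch` and its parameters are hypothetical names, and `Get-FileHash` needs PowerShell 4.0 or later.

```powershell
# Hypothetical helper: returns $true when the local file's SHA-256 hash
# equals the hash published by the server (an assumption, see above).
function Test-ChecksumMatch([string]$file, [string]$expectedHash)
{
    # No local copy means there is nothing to match against.
    if (-not (Test-Path $file)) { return $false }

    $localHash = (Get-FileHash -Path $file -Algorithm SHA256).Hash

    # -eq on strings is case-insensitive in PowerShell, so hex case does not matter.
    return $localHash -eq $expectedHash.Trim()
}
```

The caller would then download only when `Test-ChecksumMatch $file $remoteHash` returns `$false`, where `$remoteHash` comes from wherever the server exposes it.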

3 Answers


I've been beating on this all week, and came up with this:

# ----------------------------------------------------------------------------------------------
# download a file
# ----------------------------------------------------------------------------------------------
Function Download-File {
    Param (
        [Parameter(Mandatory=$True)] [System.Uri]$uri,
        [Parameter(Mandatory=$True )] [string]$FilePath
    )

    #Make sure the destination directory exists
    #System.IO.FileInfo works even if the file/dir doesn't exist, which is better than Get-Item, which requires the file to exist
    If (! ( Test-Path ([System.IO.FileInfo]$FilePath).DirectoryName ) ) { [void](New-Item ([System.IO.FileInfo]$FilePath).DirectoryName -force -type directory)}

    #see if this file exists
    if ( -not (Test-Path $FilePath) ) {
        #use simple download
        [void] (New-Object System.Net.WebClient).DownloadFile($uri.ToString(), $FilePath)
    } else {
        try {
            #use HttpWebRequest to download file
            $webRequest = [System.Net.HttpWebRequest]::Create($uri);
            $webRequest.IfModifiedSince = ([System.IO.FileInfo]$FilePath).LastWriteTime
            $webRequest.Method = "GET";
            [System.Net.HttpWebResponse]$webResponse = $webRequest.GetResponse()

            #Buffer the raw response bytes, then save them; reading them as text would corrupt binary files
            $buffer = New-Object System.IO.MemoryStream
            $webResponse.GetResponseStream().CopyTo($buffer)
            [System.IO.File]::WriteAllBytes($FilePath, $buffer.ToArray())
            $webResponse.Close()

        } catch [System.Net.WebException] {
            #Check for a 304
            if ($_.Exception.Response.StatusCode -eq [System.Net.HttpStatusCode]::NotModified) {
                Write-Host "  $FilePath not modified, not downloading..."
            } else {
                #Unexpected error
                $Status = $_.Exception.Response.StatusCode
                $msg = $_.Exception
                Write-Host "  Error downloading $FilePath, Status code: $Status - $msg"
            }
        }
    }
}
mklement0
Christopher G. Lewis
  • I used your code a while ago, and it's still working well. I have a similar situation now though where I am using async bitstransfer to download larger files, but would like to still check for "modified since" and can't see anything in the documentation about that for bitstransfer. In the code above it will stream the file if it has been "modified since", but in my case I just want to read the header and be done with it. Is that possible? – Jarrod McGuire Apr 10 '18 at 14:27
  • I'm not familiar with BITS but you might be able to use a .Method = "HEAD" and the ifmodifiedsince and start your Start-BitsTransfer if you don't throw a 304 – Christopher G. Lewis Apr 25 '18 at 20:01
  • Neither code path records the `Last-Modified` header sent by the server; it'd be prudent to store it in the downloaded file's `LastWriteTime` property or externally. Otherwise, suppose on Wednesday you download a static `.txt` file modified on Monday. Then, on Thursday, an overwhelmed server administrator finally gets around to moving a newer version of the file, modified Tuesday, to the backing filesystem. When your code runs on Friday it won't download Tuesday's version of the file because it assumes any updates must be newer than the time of the last download (Wednesday). – Lance U. Matthews Oct 18 '22 at 20:45
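A sketch of what the last comment suggests, assuming the download went through an `HttpWebResponse`, which exposes the server's date as its `LastModified` property. `Set-FileTimeFromServer` is a hypothetical helper name, not part of the answer's code.

```powershell
# Hypothetical helper: stamp the downloaded file with the server's
# Last-Modified time, so the next If-Modified-Since check is based on
# the server's timestamp rather than on the time of the download.
function Set-FileTimeFromServer([string]$FilePath, [DateTime]$ServerLastModified)
{
    $item = Get-Item -LiteralPath $FilePath
    $item.LastWriteTime = $ServerLastModified
}

# Possible use inside Download-File, once the file has been saved:
#   Set-FileTimeFromServer $FilePath $webResponse.LastModified
```

One caveat: `HttpWebResponse.LastModified` falls back to the current time when the server sends no `Last-Modified` header, in which case this behaves the same as not stamping the file at all.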

Last modified is in the HTTP response headers.

Try this:

$clnt = New-Object System.Net.WebClient
$clnt.OpenRead($Url).Close();
$UrlLastModified = $clnt.ResponseHeaders["Last-Modified"];

If that's newer than the date on your file, your file is old.

The remote server doesn't have to respond with an accurate date or with the file's actual last modified date, but many will.

GetWebResponse() might be a better way to do this (or more correct way). Using OpenRead() and then Close() immediately afterwards bothers my sensibilities, but I may be crazy. I do mostly work on databases.
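The header value comes back as text, so it has to be parsed before it can be compared with the file's date. A minimal sketch of that comparison follows; `Test-RemoteIsNewer` is a hypothetical helper name, and treating a missing header as "download anyway" is an assumption, not something the server guarantees.

```powershell
# Hypothetical helper: decide whether the server copy is newer than the local file.
# $lastModifiedHeader is the raw header text, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
function Test-RemoteIsNewer([string]$file, [string]$lastModifiedHeader)
{
    # No local copy yet: a download is needed.
    if (-not (Test-Path $file)) { return $true }

    # Assumption: with no date from the server, treat the local file as stale.
    if ([string]::IsNullOrEmpty($lastModifiedHeader)) { return $true }

    # Parse the HTTP date and normalise it to UTC before comparing.
    $styles = [System.Globalization.DateTimeStyles]'AssumeUniversal, AdjustToUniversal'
    $remoteUtc = [DateTime]::Parse($lastModifiedHeader,
        [System.Globalization.CultureInfo]::InvariantCulture, $styles)

    return $remoteUtc -gt (Get-Item $file).LastWriteTimeUtc
}
```

With the snippet above, the call would be `Test-RemoteIsNewer $file $UrlLastModified`. Comparing in UTC avoids surprises when the local time zone or daylight saving offset differs from the server's.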

Bacon Bits
# If the local directory exists and it gets a response from the url,
# it checks the last modified date of the remote file. If the file
# already exists it compares the date of the file to the file from
# the url. If either the file doesn't exists or has a newer date, it
# downloads the file and modifies the file's date to match.

function download( $url, $dir, $file ) {
 if( Test-Path $dir -Ea 0 ) {
  $web = try { [System.Net.WebRequest]::Create("$url/$file").GetResponse() } catch [Net.WebException] {}
  if( $web.LastModified ) {
   $download = 0
   if(-Not(Test-Path "$dir\$file" -Ea 0)) { $download = 1 }
   elseif((gi "$dir\$file").LastWriteTime -ne $web.LastModified) { $download = 1 }
   if( $download ) {
    Invoke-WebRequest "$url/$file" -OutFile "$dir\$file"
    (gi "$dir\$file").LastWriteTime = $web.LastModified
   }
   $web.Close()
  }
 }
}
download "https://website.com" "$env:systemdrive" "file.txt"
FHC
  • This is a step back not only from the code in [the accepted answer](https://stackoverflow.com/a/30129694/150605) from 5 years prior — which uses a conditional `GET` to accomplish this logic with a single request — but also the code in the question itself — which uses a `HEAD` request to determine if a `GET` request to perform the download is necessary. Instead, this sends one `GET` request to determine if it should send a second, so, essentially, initiating a file download to determine if it should initiate a file download. That, of course, is inefficient. – Lance U. Matthews Oct 18 '22 at 22:12
  • Thanks for commenting. I was new to PowerShell when I posted this, and the reason I posted it was that I couldn't get either of the two above methods to work and was just hoping to find any way to accomplish the post's initial title question without concerns regarding time or number of calls. Looking at it now I see what you are saying, and thanks for pointing it out. – FHC Oct 20 '22 at 16:29