2

So I am trying to write a script to download a picture file with python and I found this def using google but every picture I get it to download comes out "corrupt". Any ideas...

def download(url):
 """Copy the contents of a file from a given URL
 to a local file.
 """
 import urllib
 webFile = urllib.urlopen(url)
 localFile = open(url.split('/')[-1], 'w')
 localFile.write(webFile.read())
 webFile.close()
 localFile.close()

Edit: the code tag didn't retain the indentions very nicely but I can assure you that they are there, that is not my problem.

John
  • 13,197
  • 7
  • 51
  • 101
  • 2
    Why aren't you using `urllib2`? – S.Lott Nov 04 '10 at 00:50
  • No reason, that's just the way the function was when I found it. – John Nov 04 '10 at 00:58
  • Note that if the file is very big, it is very inefficient to read the whole thing into memory like this. As noted below, urllib.urlretrieve() is a much better option. – nealmcb May 04 '11 at 16:17
  • here try this, http://less4us.blogspot.com/2012/06/python-downloader-script-using-urllib.html and tell me if it works for you –  Jun 23 '12 at 16:39

5 Answers5

6

You can simply do

urllib.urlretrieve(url, filename)

and save yourself any troubles.

Jochen Ritzel
  • 104,512
  • 31
  • 200
  • 194
  • yep, thanks for pointing that out. This is finally what i ended up doing. And just used ntpath.basename to pull the filename off the end of the url – John Nov 04 '10 at 01:19
5

You need to open the local file in binary mode:

localFile = open(url.split('/')[-1], 'wb')

Otherwise the CR/LF characters in the binary stream will be mangled, corrupting the file.

Gintautas Miliauskas
  • 7,744
  • 4
  • 32
  • 34
3

You must include the 'b' flag, if you intend on writing a binary file. Line 7 becomes:

localFile = open(url.split('/')[-1], 'wb')

It is not necessary for the code to work, but in the future you might consider:

  • Importing outside of your functions.
  • Using os.path.basename, rather than string parsing to get the name component of a path.
  • Using the with statement to manage files, rather than having to manually close them. It makes your code cleaner, and it ensures that they are properly closed if your code throws an exception.

I would rewrite your code as:

import urllib
import os.path

def download(url):
 """Copy the contents of a file from a given URL
 to a local file in the current directory.
 """
 with urllib.urlopen(url) as webFile:
  with open(os.path.basename(url), 'wb') as localFile:
   localFile.write(webFile.read())
Zack Bloom
  • 8,309
  • 2
  • 20
  • 27
2

It's coming out corrupt because the function you're using is writing the bytes to the file, as if it was plain text. However, what you need to do is write the bytes to it in binary mode (wb). Here's an idea of what you should do:

import urllib

def Download(url, filename):
  Data = urllib.urlopen(url).read()
  File = open(filename, 'wb')
  File.Write(Data)
  #Neatly close off the file...
  File.flush()
  File.close()
  #Cleanup, for you neat-freaks.
  del Data, File
Peter O.
  • 32,158
  • 14
  • 82
  • 96
xlTobylx
  • 67
  • 2
  • 6
0
import subprocess
outfile = "foo.txt"
url = "http://some/web/site/foo.txt"
cmd = "curl.exe -f -o %(outfile)s %(url)s" % locals()
subprocess.check_call(cmd)

Shelling out may seem inelegant but when you start encountering issues with more sophisticated sites, but curl has a wealth of logic for handling getting you through the barriers presented by web servers (cookies, authentication, sessions, etc.)

wget is another alternative.

Binary Phile
  • 2,538
  • 16
  • 16