1

I have been trying to download .pdb files from the Protein Data Bank. I wrote the following block of code to fetch these files, but the downloaded files contain the web page instead of the structure data.

import urllib.request

# Sector C - Processing block:
RefinedPDBCodeList = []  # C1
with open('RefinedPDBCodeList') as inputfile:
    for line in inputfile:
        RefinedPDBCodeList.append(line.strip().split(','))

print(RefinedPDBCodeList[0])
# Output: ['101m.pdb']

for i in range(0, 1):  # S2 - range(0, len(RefinedPDBCodeList)):
    path = urllib.request.urlretrieve('http://www.rcsb.org/pdb/explore/explore.do?structureId=101m', '101m.pdb')
James

3 Answers

6

It seems you got the base URL wrong. Try this instead:

urllib.request.urlretrieve('http://files.rcsb.org/download/101M.pdb', '101m.pdb')
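
To cover the whole list rather than a single hard-coded entry, something along these lines might work. This is only a sketch: `pdb_url` and `download_entries` are hypothetical helper names, the HTTPS form of the download URL is assumed, and entries are assumed to look like '101m.pdb' or '101M'.

```python
import urllib.request

# Assumed base URL: the files.rcsb.org download endpoint, in its HTTPS form.
BASE_URL = "https://files.rcsb.org/download/"


def pdb_url(entry):
    """Build the download URL for an entry such as '101m.pdb' or '101M'."""
    code = entry.strip()
    if code.lower().endswith(".pdb"):
        code = code[:-4]  # drop the extension so it is not doubled
    return BASE_URL + code.upper() + ".pdb"


def download_entries(entries, fetch=urllib.request.urlretrieve):
    """Download every entry into the current directory.

    `fetch` is any callable taking (url, filename); it defaults to
    urllib.request.urlretrieve and can be swapped out for testing.
    Returns the list of local filenames that were written.
    """
    saved = []
    for entry in entries:
        url = pdb_url(entry)
        filename = url.rsplit("/", 1)[-1].lower()  # e.g. '101m.pdb'
        fetch(url, filename)
        saved.append(filename)
    return saved
```

With the list built in the question, where each row is itself a list such as ['101m.pdb'], the call would be `download_entries(row[0] for row in RefinedPDBCodeList)`.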
Simon Fromme
2

BioPython offers a retrieval method, PDBList.retrieve_pdb_file. However, it relies on the PDB FTP service. If the FTP port is blocked for some reason (a firewall, for example), you can use this function instead:

import os
import sys
import urllib.request


def download_pdb(pdbcode, datadir, downloadurl="https://files.rcsb.org/download/"):
    """
    Downloads a PDB file from the Internet and saves it in a data directory.
    :param pdbcode: The standard PDB ID e.g. '3ICB' or '3icb'
    :param datadir: The directory where the downloaded file will be saved
    :param downloadurl: The base PDB download URL, cf.
        `https://www.rcsb.org/pages/download/http#structures` for details
    :return: the full path to the downloaded PDB file or None if something went wrong
    """
    pdbfn = pdbcode + ".pdb"
    url = downloadurl + pdbfn
    outfnm = os.path.join(datadir, pdbfn)
    try:
        urllib.request.urlretrieve(url, outfnm)
        return outfnm
    except Exception as err:
        print(str(err), file=sys.stderr)
        return None
András Aszódi
0

The URL has since been updated (though the old URL redirects to the new one, for now):

urllib.request.urlretrieve('https://files.rcsb.org/download/101M.pdb', '101m.pdb')

See https://www.rcsb.org/pdb/static.do?p=download/http/index.html for a full list of URLs for the different downloads available from the RCSB PDB.
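
Since the symptom in the question was a saved file that actually contained the web page, a quick sanity check after downloading may help catch a wrong URL early. The sketch below is only a heuristic: `looks_like_html` is a hypothetical helper, and it simply checks whether the file starts with a typical HTML marker; it does not validate the PDB format itself.

```python
def looks_like_html(path, probe_bytes=512):
    """Return True if the start of the file looks like an HTML page.

    Heuristic only: reads the first `probe_bytes` bytes, ignores leading
    whitespace and letter case, and looks for a doctype or <html> tag.
    """
    with open(path, "rb") as handle:
        head = handle.read(probe_bytes).lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

For example, `looks_like_html('101m.pdb')` returning True after a download would suggest that the URL pointed at a web page rather than at the structure file.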

dbn