My program takes user input and searches for it on a particular webpage. I then want it to click a particular link on the results page and download the file found there.

Example:

  1. The webpage: http://www.rcsb.org/pdb/home/home.do
  2. The search word: "1AW0"
  3. After you search for the word, the website takes you to: http://www.rcsb.org/pdb/explore/explore.do?structureId=1AW0

I want the program to go to the right-hand side of that page and download the PDB file from the DOWNLOAD FILES option.

I have managed to write a program using the mechanize module that performs the search automatically, but I am unable to find a way to click on a link.

My code:

import urllib2
import re
import mechanize

br = mechanize.Browser()
br.open("http://www.rcsb.org/pdb/home/home.do")
## name of the form that holds the search text area 
br.select_form("headerQueryForm")

## "q" is the name of the textarea in the HTML
br["q"] = str("1AW0")
response = br.submit()
print response.read() 

Any help or suggestions would be appreciated.

By the way, I am an intermediate Python programmer, and I am trying to learn Jython to try to make this work.

Thanks in advance

Nodnin
  • If it is only about downloading the PDB file for a given protein, why not just use http.client (or httplib) to download the file directly? The URL is http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=HEREGOESTHEID and apparently all download links look exactly the same. – Hyperboreus Dec 09 '12 at 06:30

1 Answer

Here's how I would have done it:

'''
Created on Dec 9, 2012

@author: Daniel Ng
'''

import urllib

def fetch_structure(structureid, filetype='pdb'):
  download_url = 'http://www.rcsb.org/pdb/download/downloadFile.do?fileFormat=%s&compression=NO&structureId=%s'
  filetypes = ['pdb','cif','xml']
  if (filetype not in filetypes):
    print "Invalid filetype...", filetype
  else:
    try:
      urllib.urlretrieve(download_url % (filetype,structureid), '%s.%s' % (structureid,filetype))
    except Exception, e:
      print "Download failed...", e
    else:
      print "Saved to", '%s.%s' % (structureid,filetype)

if __name__ == "__main__":
  fetch_structure('1AW0')
  fetch_structure('1AW0', filetype='xml')
  fetch_structure('1AW0', filetype='png')

Which provides this output:

Saved to 1AW0.pdb
Saved to 1AW0.xml
Invalid filetype... png

It also saves the two files 1AW0.pdb and 1AW0.xml to the current working directory (the script directory in this example).

http://docs.python.org/2/library/urllib.html#urllib.urlretrieve

Ngenator
  • How can I save and retrieve this file without giving it a hardcoded location? I mean, what if I have to run this program on someone else's computer, retrieve the file, and compute on it? – Nodnin Dec 09 '12 at 22:46
  • Not sure I understand... Are you asking how to change the location they are downloaded to? – Ngenator Dec 10 '12 at 00:51
  • Yes. If I had to ask for input and download the file, I would assign it to a variable, x = "1AW0", and use str(x) to get the download. The user would do the same, but the file would be downloaded to the user's PC. I need access to the file, since a later part of the program computes on it. How can I achieve that? – Nodnin Dec 10 '12 at 04:44
  • Well, you can just add it as another variable and ask the user. The [urlretrieve](http://docs.python.org/2/library/urllib.html#urllib.urlretrieve) function takes the file name as the second argument. Simply prepend a path to the file name and it will save the file there instead of in the directory of the script. – Ngenator Dec 10 '12 at 05:32
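
To illustrate the last comment, here is a minimal sketch of prepending a user-chosen directory to the file name and returning the saved path, so that later code can open the file. It assumes Python 3, where `urllib.request.urlretrieve` is the counterpart of the `urllib.urlretrieve` call used in the answer. The names `target_path` and `fetch_structure_to` are hypothetical, and the download URL pattern is copied from the answer and may no longer be served by the site.

```python
import os
import urllib.request

# URL pattern copied from the answer above (assumption: it may have changed since).
DOWNLOAD_URL = ('http://www.rcsb.org/pdb/download/downloadFile.do'
                '?fileFormat=%s&compression=NO&structureId=%s')


def target_path(structure_id, filetype='pdb', directory='.'):
    """Return the path the file would be saved to inside `directory`."""
    return os.path.join(directory, '%s.%s' % (structure_id, filetype))


def fetch_structure_to(structure_id, filetype='pdb', directory='.'):
    """Download the file into `directory` and return the saved path."""
    # Create the target folder if it does not exist yet.
    os.makedirs(directory, exist_ok=True)
    path = target_path(structure_id, filetype, directory)
    # The second argument of urlretrieve is the local file name to write to.
    urllib.request.urlretrieve(DOWNLOAD_URL % (filetype, structure_id), path)
    return path
```

Because `fetch_structure_to` returns the path it wrote to, the rest of the program can open that path directly, whichever directory the user picked. The ID and directory could come from `input()` on the user's machine instead of being hardcoded.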