
I am trying to use Python and twill to automate a search on the PubMed database, but I cannot get even a single search to work. My base code looks like this:

from twill.commands import *
go("http://www.pubmed.com")
fv("1","term","macropahge")
showforms()

When I run that, I get this output:

Form name=EntrezForm (#1)
## ## __Name__________________ __Type___ __ID________ __Value__________________
1     None                     select    database     [] of [] 
2     None                     select    database     [] of ['gquery'] 
3     None                     select    database     [] of ['assembly'] 
4     None                     select    database     [] of ['bioproject'] 
5     None                     select    database     [] of ['biosample'] 
6     None                     select    database     [] of ['biosystems'] 
7     None                     select    database     [] of ['books'] 
8     None                     select    database     [] of ['clinvar'] 
9     None                     select    database     [] of ['clone'] 
10    None                     select    database     [] of ['cdd'] 
11    None                     select    database     [] of ['gap'] 
12    None                     select    database     [] of ['dbvar'] 
13    None                     select    database     [] of ['epigenomics'] 
14    None                     select    database     [] of ['nucest'] 
15    None                     select    database     [] of ['gene'] 
16    None                     select    database     [] of ['genome'] 
17    None                     select    database     [] of ['gds'] 
18    None                     select    database     [] of ['geoprofiles'] 
19    None                     select    database     [] of ['nucgss'] 
20    None                     select    database     [] of ['homologene'] 
21    None                     select    database     [] of ['medgen'] 
22    None                     select    database     [] of ['mesh'] 
23    None                     select    database     [] of ['ncbisearch'] 
24    None                     select    database     [] of ['nlmcatalog'] 
25    None                     select    database     [] of ['nuccore'] 
26    None                     select    database     [] of ['omim'] 
27    None                     select    database     [] of ['pmc'] 
28    None                     select    database     [] of ['popset'] 
29    None                     select    database     [] of ['probe'] 
30    None                     select    database     [] of ['protein'] 
31    None                     select    database     [] of ['proteinclusters'] 
32    None                     select    database     [] of ['pcassay'] 
33    None                     select    database     [] of ['pccompound'] 
34    None                     select    database     [] of ['pcsubstance'] 
35    None                     select    database     [] of ['pubmed'] 
36    None                     select    database     [] of ['pubmedhealth'] 
37    None                     select    database     [] of ['snp'] 
38    None                     select    database     [] of ['sra'] 
39    None                     select    database     [] of ['structure'] 
40    None                     select    database     [] of ['taxonomy'] 
41    None                     select    database     [] of ['toolkit'] 
42    None                     select    database     [] of ['toolkitall'] 
43    None                     select    database     [] of ['toolkitbook'] 
44    None                     select    database     [] of ['unigene'] 
45    term                     text      term         macropahge 
46 1  None                     submi ... search        
47    EntrezSystem2.PEntre ... hidden    (None)       home 
48    EntrezSystem2.PEntre ... hidden    (None)        
49    EntrezSystem2.PEntre ... hidden    (None)       pubmed 
50    EntrezSystem2.PEntre ... hidden    (None)       pubmed 
51    EntrezSystem2.PEntre ... hidden    (None)        
52    EntrezSystem2.PEntre ... hidden    (None)        
53    EntrezSystem2.PEntre ... hidden    (None)        
54    EntrezSystem2.PEntre ... hidden    (None)        
55    EntrezSystem2.PEntre ... hidden    (None)        
56    EntrezSystem2.PEntre ... hidden    (None)        
57    EntrezSystem2.PEntre ... hidden    (None)        
58    EntrezSystem2.PEntre ... hidden    (None)        
59    EntrezSystem2.PEntre ... hidden    (None)        
60    EntrezSystem2.PEntre ... hidden    (None)        
61    EntrezSystem2.PEntre ... hidden    (None)        
62    p$a                      hidden    p$a           
63    p$l                      hidden    p$l          EntrezSystem2 
64    p$st                     hidden    p$st         pubmed 
65    SessionId                hidden    SessionId    CE8B4A8E3C997DA1_0124SID 
66    Snapshot                 hidden    Snapshot     /projects/entrez/pubmed/PubMedGroup@1.54 

<generator object __call__ at 0x030B8170>

So I know my code is putting the search term in correctly, but when I submit, it does not work.

submit()
find("macrophage")

Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    find("macrophage")
  File "C:\Users\Ed\AppData\Roaming\Python\Python27\site-packages\twill\commands.py", line 239, in find
    raise TwillAssertionError("no match to '%s'" % (what,))
TwillAssertionError: no match to 'macrophage'

So either I am submitting incorrectly or I am using the wrong submit button. I know the term "macrophage" will show up on the results page when I search, so something is going wrong in the submit step. When I try garbage phrases like ";lkjasdlfkjasd", I expect to see "No items found", but I do not see that either. Any help is appreciated.

  • Twill does not understand JavaScript, and the PubMed page seems to be using some AJAX. Have you tried fetching `http://www.ncbi.nlm.nih.gov/pubmed/?term=macrophage` directly? – Paulo Scardine Jul 18 '14 at 22:15
  • Have you considered using biopython to access their API? `Entrez.read(Entrez.esearch(db="pubmed", term="macrophage"))` might get you what you're looking for. – Brad Beattie Jul 18 '14 at 22:31
  • Twill looks like a "half" solution to me. I have done a lot of web scraping, and I would either build the requests myself using the `requests` library or use a tool like `scrapy`. If that is too low level and will not do the job because JavaScript is involved, I would go for automating a real browser using Selenium/WebDriver. – Achim Jul 18 '14 at 22:51
  • Thanks for the comments. I ended up using Biopython, so thank you, Brad Beattie. That did the job really well. Paulo Scardine, thanks for the tip to use the direct search. That might have worked as well! – user3854610 Jul 25 '14 at 16:05
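
For anyone landing here later, the two suggestions from the comments can be sketched roughly as follows. This is only a minimal illustration, not the exact code the asker ended up with. The function names `pubmed_search_url` and `search_pubmed_ids` are made up for this example, the `email` argument is an assumption (NCBI asks API users to identify themselves), and the Biopython part needs the third-party `biopython` package plus network access.

```python
from urllib.parse import urlencode


def pubmed_search_url(term):
    """Build the direct search URL from the first comment (no form submit needed)."""
    # urlencode escapes spaces and punctuation in the search term.
    return "http://www.ncbi.nlm.nih.gov/pubmed/?" + urlencode({"term": term})


def search_pubmed_ids(term, email, retmax=20):
    """Query PubMed through Biopython's Entrez module (hypothetical helper).

    Returns the total hit count and the list of PubMed IDs for the first
    `retmax` results. Requires network access.
    """
    from Bio import Entrez  # third-party: pip install biopython

    Entrez.email = email  # assumption: a contact address you supply
    handle = Entrez.esearch(db="pubmed", term=term, retmax=retmax)
    try:
        record = Entrez.read(handle)
    finally:
        handle.close()
    return record["Count"], record["IdList"]


if __name__ == "__main__":
    # Only the URL helper is exercised here, so this runs offline.
    print(pubmed_search_url("macrophage"))
```

The URL helper can be passed straight to twill's `go()` in place of filling in and submitting the form, while `search_pubmed_ids("macrophage", "you@example.com")` goes through the Entrez API and avoids scraping the HTML page altogether.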
