I am working with this website: http://bioinfo.uni-plovdiv.bg/microinspector/

And from the mech-dump, I get

POST http://bioinfo.uni-plovdiv.bg/microinspector/cgi/result-new1.plx
  target_name=                   (text)
  target_sequence=               (textarea)
  Choose an organism : =Choose an organism: (option)   [*Choose an organism:|-NEMATODES------------------|C. elegans/Caenorhabditis elegans|C. briggsae/Caenorhabditis briggsae|Capitella sp. I|Cerebratulus lacteus|Saccoglossus kowalevskii|Schistosoma japonicum|Schistosoma mansoni|Schmidtea mediterranea|Strongylocentrotus purpuratus|Haliotis rufescens|Lottia gigantea|-PLANTS------------------|Arabidopsis thaliana|Zea mays|Oryza sativa|Sorghum bicolor|-VIRUSES------------------|Rhesus lymphocryptovirus|Epstein Barr virus|Human cytomegalovirus|Kaposi sarcoma-associated herpesvirus|Mouse gammaherpesvirus|BK polyomavirus|Herpes Simplex Virus 1|Herpes Simplex Virus 2|Human immunodeficiency virus 1|JC polyomavirus|Mareks disease virus|Mareks disease virus type 2|Merkel cell polyomavirus|Mouse cytomegalovirus|Mouse gammaherpesvirus 68|Rhesus monkey rhadinovirus|Simian virus 40/Human immunodeficiency virus 1|-VIRIDIPLANTAE------------------|Triticum aestivum|Selaginella moellendorffii|Populus trichocarpa|Pinus taeda|Physcomitrella patens|Arabidopsis thaliana|Glycine max|Medicago truncatula|Oryza sativa|Populus trichocarpa|Saccharum officinarum|Sorghum bicolor|Zea mays|Brassica napus|Brassica oleracea|Carica papaya|Lotus japonicus|Vigna unguiculata|Gossypium herbecium|Gossypium hirsutum|Gossypium rammindii|Solanum lycopersicum|Brassica rapa|Vitis vinifera|-ARTHROPODS------------------|Drosophila melanogaster|Drosophila pseudoobscura|Apis mellifera|Anopheles gambiae|Ixodes scapularis/Ixodes scapularise|Bombyx mori|Drosophila ananassae|Drosophila erecta|Drosophila grimshawi|Drosophila mojavensis|Drosophila persimilis|Drosophila sechellia|Drosophila simulans|Drosophila virilis|Drosophila willistoni|Drosophila yakuba|Locusta migratoria|Tribolium castaneum|-VERTEBRATES------------------|Bos taurus|Xenopus tropicalis|Monodelphis domestica|Lemur catta|Lagothrix lagotricha|Gorilla gorilla|Ateles geoffroyi|Ovis aries|Homo sapiens|Fugu rubripes|Macaca nemestrina|Macaca mulatta|Mus 
musculus|Canis familiaris|Rattus norvegicus|Rattus norvegicus|Pan paniscus|Pan troglodytes|Pongo pygmaeus|Saguinus labiatus|Saguinus labiatus|Sus scrofa|Gallus gallus|Danio rerio|Xenopus laevis|Tetraodon nigroviridis|Pygathrix bieti|Symphalangus syndactylus|Ornithorhynchus anatinus|Cricetulus griseus|-HORDEATES------------------|Branchiostoma floridae|Ciona intestinalis|Ciona savignyi|Oikopleura dioica|-PROTISTAE-------------------|Chlamydomonas reinhardtii|Dictyostelium discoideum|-OTHER-------------------|Amphimedon queenslandica|Hydra magnipapillata|Nematostella vectensis]
  user_small=                    (textarea)
  temperature=37                 (text)
  mfe=-20                        (text)
  Submit=          Search           (submit)
  Reset=<UNDEF>                  (reset)

POST http://www.aardvarkmailinglist.net/sub/account_manager.php?action=add
  cid=20804                      (hidden readonly)
  cqid=LSI                       (hidden readonly)
  lid=12305                      (hidden readonly)
  sub_email=                     (text)
  Submit=Subscribe               (submit)

This is what I have so far.

use strict;
use warnings;

use WWW::Mechanize;

# create object for browser
my $browser = WWW::Mechanize->new();
my ($sequence, $results);
open (DRG, "<microRNA_target_cspg_drg_output.fa") || die "cannot open microRNA_target_cspg_drg_output.fa";

while (<DRG>) {
        chomp;
        $sequence=$_;
        last; #for testing purposes
}
close (DRG);

$browser->get("http://bioinfo.uni-plovdiv.bg/microinspector/");
$browser->form_number(1);
$browser->field("target_sequence", $sequence);
$browser->field("Choose an organism : ", "Mus musculus");
my $response = $browser->click_button( number => 1);
print $response->content();

I am not sure what to do next. I suspect I am not setting the organism correctly: it is a drop-down menu, so I need to select an option, but I don't think this line does that properly:

$browser->field("Choose an organism : ", "Mus musculus");

In addition, once the button is clicked the form goes to a new page (the POST URL?). Any help is appreciated, thank you. This is what I get when I run the above code:

<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<head>
<title>A New MiRNA Program</title>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1251" />
</head>
<body>
Status: 500
Content-type: text/html

<h1>Software error:</h1>
<pre>Illegal division by zero at /usr/local/lib/perl5/site_perl/5.8.9/Bio/Graphics/Panel.pm line 237.
</pre>
<p>
For help, please send mail to the webmaster (<a href="mailto:vebaev@uni-plovdiv.bg">vebaev@uni-plovdiv.bg</a>), giving this error message 
and the time and date of the error.

</p>

For anyone who wants to try it, this is the sequence being submitted:

AAACACACTGGGGAATGGAGCAAGACAGTCTTTGAATATCAAACACGCAAGGCAATGAGACTACCCATCATAGATATCGCACCCTATGACATTGGGGGTCCTGATCAAGAATTTGGTGTGGACATTGGCCCTGTTTGCTTTTTATAAGCCAAACTCTCTGAAACCCCAGCAAAACAAAAACCACATCCATGTGTTCATCTTGTTTTAATCTTATCAACCAGTGCAAGTGACCAACTAAATTCCAGTTATTTATTTCCAAACTTTTGGAAAAAGCATAATTTGACAAAAAAAGAATACAATTTTTTGCTGTTTCAACCACCCAATACAGGTCAAATGCTTTTGTTTTATTTTTTTACCAATTCCAACTTCAAAATGTCTCAATGGTGCTATAATAAATAAACGTCAACACTTTTATGATAA

joaquin

2 Answers

$browser->field("Choose an organism : ", "Mus musculus");

For clarity, this can be written as:

$browser->select("Choose an organism : ", "Mus musculus");

And you should use the submit method.

$browser->submit();

This:

use strict;
use warnings;

use WWW::Mechanize;

# create object for browser
my $browser = WWW::Mechanize->new();
my ($sequence, $results);

$sequence = <<END;
AAACACACTGGGGAATGGAGCAAGACAGTCTTTGAATATCAAACACGCAAGGCAATGAGACTACCCATCATAGATATCGCACCCTATGACATTGGGGGTCCTGATCAAGAATTTGGTGTGGACATTGGCCCTGTTTGCTTTTTATAAGCCAAACTCTCTGAAACCCCAGCAAAACAAAAACCACATCCATGTGTTCATCTTGTTTTAATCTTATCAACCAGTGCAAGTGACCAACTAAATTCCAGTTATTTATTTCCAAACTTTTGGAAAAAGCATAATTTGACAAAAAAAGAATACAATTTTTTGCTGTTTCAACCACCCAATACAGGTCAAATGCTTTTGTTTTATTTTTTTACCAATTCCAACTTCAAAATGTCTCAATGGTGCTATAATAAATAAACGTCAACACTTTTATGATAA
END

$browser->get("http://bioinfo.uni-plovdiv.bg/microinspector/");
$browser->form_number(1);
$browser->field("target_sequence", $sequence);
$browser->select("Choose an organism : ", "Mus musculus");
#my $response = $browser->click_button( number => 1);
my $response = $browser->submit();
print $response->content();

Returns:

<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<head>
<title>A New MiRNA Program</title>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1251" />
</head>
<body>
<IMG SRC='Bio::Graphics::Panel=HASH(0x1ed39c8)->png'><H1 align=center>RESULTS</H1><TABLE align=center border CELLPADING=10 CELLSPASING=0><THEAD bgcolor=#C0C0C0 NOWRAP align=center><TR> <TD>POSITION</TD><TD>SEQUENCE OF TARGET</TD><TD>NAME 
OF MIRNA</TD><TD>SEQUENCE OF MIRNA</TD><TD>FREE ENERGY</TD><TD>LINK (SEC.STRUCTURE .ps)</TD>
<TBODY><tr align=center 
nowrap><td>54</td><td>ATGAGACTACCCATCATAGATATCGCACCCTA</td><td>mmu-miR-342-5p</td><td>aggggugcuaucugugauugag</td><td>-27.8</td><td><A 
HREF='http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/54seqmmu-miR-342-5p.ps'>image</A></td></tr>
<tr align=center 
nowrap><td>22</td><td>AGACAGTCTTTGAATATCAAACACGCAAGGCA</td><td>mmu-miR-669e</td><td>ugucuugugugugcauguucau</td><td>-27.1</td><td><A 
HREF='http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/22seqmmu-miR-669e.ps'>image</A></td></tr>
<tr align=center 
nowrap><td>97</td><td>GTCCTGATCAAGAATTTGGTGTGGACATTGGC</td><td>mmu-miR-199a-5p</td><td>cccaguguucagacuaccuguuc</td><td>-25.2</td><td><A 
HREF='http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/97seqmmu-miR-199a-5p.ps'>image</A></td></tr>
<tr align=center 
nowrap><td>93</td><td>GGGGGTCCTGATCAAGAATTTGGTGTGGACAT</td><td>mmu-miR-124-star</td><td>cguguucacagcggaccuugau</td><td>-25.2</td><td><A 
HREF='http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/93seqmmu-miR-124-star.ps'>image</A></td></tr>
<tr align=center 
nowrap><td>202</td><td>TTTTAATCTTATCAACCAGTGCAAGTGACCAA</td><td>mmu-miR-150-star</td><td>cugguacaggccugggggauag</td><td>-24.2</td><td><A 
HREF='http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/202seqmmu-miR-150-star.ps'>image</A></td></tr>
<tr align=center 
nowrap><td>128</td><td>CCCTGTTTGCTTTTTATAAGCCAAACTCTCTG</td><td>mmu-miR-1966</td><td>aagggagcuggcucaggagagaguc</td><td>-23.1</td><td><A 
HREF='http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/128seqmmu-miR-1966.ps'>image</A></td></tr>
<tr align=center 
nowrap><td>99</td><td>CCTGATCAAGAATTTGGTGTGGACATTGGCCC</td><td>mmu-miR-1898</td><td>aggucaagguucacaggggauc</td><td>-22.9</td><td><A 
HREF='http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/99seqmmu-miR-1898.ps'>image</A></td></tr>
<tr align=center 
nowrap><td>97</td><td>GTCCTGATCAAGAATTTGGTGTGGACATTGGC</td><td>mmu-miR-199b-star</td><td>cccaguguuuagacuaccuguuc</td><td>-22.8</td><td><A 
HREF='http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/97seqmmu-miR-199b-star.ps'>image</A></td></tr>
<tr align=center 
nowrap><td>30</td><td>TTTGAATATCAAACACGCAAGGCAATGAGACT</td><td>mmu-miR-200c-star</td><td>cgucuuacccagcaguguuugg</td><td>-22.2</td><td><A 
HREF='http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/30seqmmu-miR-200c-star.ps'>image</A></td></tr>
<tr align=center 
nowrap><td>6</td><td>ACTGGGGAATGGAGCAAGACAGTCTTTGAATA</td><td>mmu-miR-743b-5p</td><td>uguucagacugguguccauca</td><td>-21.4</td><td><A 
HREF='http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/6seqmmu-miR-743b-5p.ps'>image</A></td></tr>
<tr align=center 
nowrap><td>68</td><td>CATAGATATCGCACCCTATGACATTGGGGGTC</td><td>mmu-miR-188-5p</td><td>caucccuugcaugguggaggg</td><td>-21.3</td><td><A 
HREF='http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/68seqmmu-miR-188-5p.ps'>image</A></td></tr>
<tr align=center 
nowrap><td>3</td><td>CACACTGGGGAATGGAGCAAGACAGTCTTTGA</td><td>mmu-miR-1981</td><td>guaaaggcugggcuuagacguggc</td><td>-21.1</td><td><A 
HREF='http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/3seqmmu-miR-1981.ps'>image</A></td></tr>
<tr align=center 
nowrap><td>56</td><td>GAGACTACCCATCATAGATATCGCACCCTATG</td><td>mmu-miR-1894-3p</td><td>gcaagggagagggugaagggag</td><td>-20.8</td><td><A 
HREF='http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/56seqmmu-miR-1894-3p.ps'>image</A></td></tr>
<tr align=center 
nowrap><td>35</td><td>ATATCAAACACGCAAGGCAATGAGACTACCCA</td><td>mmu-miR-193-star</td><td>ugggucuuugcgggcaagauga</td><td>-20.6</td><td><A 
HREF='http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/35seqmmu-miR-193-star.ps'>image</A></td></tr>
<tr align=center 
nowrap><td>156</td><td>TCTGAAACCCCAGCAAAACAAAAACCACATCC</td><td>mmu-miR-1188</td><td>uggugugagguugggccagga</td><td>-20.52</td><td><A 
HREF='http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/156seqmmu-miR-1188.ps'>image</A></td></tr>
<tr align=center 
nowrap><td>356</td><td>CAATTCCAACTTCAAAATGTCTCAATGGTGCT</td><td>mmu-miR-680</td><td>gggcaucugcugacauggggg</td><td>-20.3</td><td><A 
HREF='http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/356seqmmu-miR-680.ps'>image</A></td></tr>
</TBODY></TABLE></html>
<BR>
<table align=center>
        <tr><td align=center><A align=center HREF='http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/seq20110311051243.csv'>Results in .CSV format
(Right click and 'Save as')</A></td></tr>
        <tr><td><br></td></tr>
        <tr><td><IMG SRC='/microinspector/cgi/postscript/20110311051243/image.png'></td></tr><table></table><BR></FORM>
</body>

EDIT:

A revised script to save the csv files to disk:

use strict;
use warnings;

use WWW::Mechanize;

# create object for browser
my $browser = WWW::Mechanize->new();

my $sequence = <<END;
AAACACACTGGGGAATGGAGCAAGACAGTCTTTGAATATCAAACACGCAAGGCAATGAGACTACCCATCATAGATATCGCACCCTATGACATTGGGGGTCCTGATCAAGAATTTGGTGTGGACATTGGCCCTGTTTGCTTTTTATAAGCCAAACTCTCTGAAACCCCAGCAAAACAAAAACCACATCCATGTGTTCATCTTGTTTTAATCTTATCAACCAGTGCAAGTGACCAACTAAATTCCAGTTATTTATTTCCAAACTTTTGGAAAAAGCATAATTTGACAAAAAAAGAATACAATTTTTTGCTGTTTCAACCACCCAATACAGGTCAAATGCTTTTGTTTTATTTTTTTACCAATTCCAACTTCAAAATGTCTCAATGGTGCTATAATAAATAAACGTCAACACTTTTATGATAA
END

$browser->get("http://bioinfo.uni-plovdiv.bg/microinspector/");
$browser->form_number(1);
$browser->field("target_sequence", $sequence);
$browser->select("Choose an organism : ", "Mus musculus");


$browser->submit();
my @links = $browser->links();

foreach my $link ( @links ){
  if( $link->url() =~ /csv$/i ){
    my $result = $browser->get( $link->url() );
    my $filename = ( $link->url() =~ /\/([^\/]+)$/ )[0];

    print "Saving $filename\n";

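    # Three-argument open with a lexical filehandle; stop if the file cannot be written
    open( my $out, '>', $filename ) or die "Cannot open $filename: $!";
    print {$out} $result->content();
    close( $out ) or die "Cannot close $filename: $!";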
  }
}
moshen
  • ->field should work according to the documentation at: http://search.cpan.org/~petdance/WWW-Mechanize-1.66/lib/WWW/Mechanize.pm#$mech->select($name,_$value) $mech->field: "Given the name of a field, set its value to the value specified." Seems the asker wrote the perfect program according to the (incorrect) instructions available. – Literat Mar 11 '11 at 07:06
  • @moshen Thanks so much. I actually realized that the first line of the file I was reading from was in the wrong format; I needed the second line in the file. Thanks again for the pointers. If I want to download the link in the output webpage, http://bioinfo.uni-plovdiv.bg/microinspector/cgi/postscript/20110311051243/seq20110311051243.csv, I was thinking about using the mech->find_link option. How do I get the link to save to a file? – chickenNuggets Mar 11 '11 at 20:47
  • Also, if I want to do this for multiple sequences what would be a good idea? I think I don't have a good understanding of what form_number means. I feel like there's only one form on this page, what would form_number(2) be referring to? If I want to extract sequences from the file and submit them, should I reload the page every time? Thanks so much again. – chickenNuggets Mar 11 '11 at 20:50
  • `form_number` just selects which form on the page Mechanize fills in. This really only matters if there are multiple forms on the target page. – moshen Mar 11 '11 at 22:18
  • If you want to do multiple sequences, take something like my edited response and make it a subroutine that accepts the sequence you want to query. – moshen Mar 11 '11 at 22:19
  • I am having some trouble writing code to download this link on the output page: 'Results in .CSV format (Right click and 'Save as')'. Could you give me some pointers? I tried find_link and it didn't work. I am not sure how to get the link and save it. Thanks. – chickenNuggets Mar 11 '11 at 22:55
  • Please refer to my revised answer. – moshen Mar 11 '11 at 23:22
  • @moshen I tried your revised answer, but I am running the program on a cluster at school and it seems I don't have permission to install Web::Scraper. I get "Can't locate Web/Scraper.pm in @INC" when I run the perl script – chickenNuggets Mar 12 '11 at 15:32
  • The only file I need is the one that says "Results in .CSV format (Right click and 'Save as')", am I able to just do that with Mechanize? Thanks again for all the help. Can you explain what this means? `my $filename = ( $url =~ /\/([^\/]+)$/ )[0];` I am learning a lot, thanks so much again – chickenNuggets Mar 12 '11 at 15:33
  • I was actually just scraping out the postscript files. If you notice: `if( $url =~ /ps$/ )` is filtering the links by file extension (postscript). `my $filename = ( $url =~ /\/([^\/]+)$/ )[0];` basically strips the filename off of the found url in the `href` attribute. In the regular expression, the parentheses around `[^\/]+` capture one or more characters that are NOT `/`. In list context the match returns its captures as a list, and `[0]` takes the first element. It's shorthand for something like: `my $filename; if( $url =~ /\/([^\/]+)$/ ){ $filename = $1; }` – moshen Mar 12 '11 at 17:37
  • You can install Perl modules in your home directory. I'll post a modified version later that doesn't use `Web::Scraper`. – moshen Mar 12 '11 at 17:48
  • @moshen, Thanks so much! I think it works! I have a question: if the sequence is really long and the host takes a while to do the calculation after the form is submitted, does Mechanize wait until the page loads? It seems to move on to the next iteration of the subroutine without waiting for the webpage to finish processing. – chickenNuggets Mar 13 '11 at 02:27
  • Hmm, I got an error. One of the sequences seems to take a long time to process due to its length. `There is no form numbered 1 at microinspector.pl line 39 Can't call method "value" on an undefined value at /usr/lib/perl5/site_perl/5.8.5/WWW/Mechanize.pm line 1030, line 36.` – chickenNuggets Mar 13 '11 at 03:27
  • I'm not sure what that error is referring to. You can try setting the UserAgent timeout to something really obscene. It defaults to 3 minutes. In this case you can change it with something like: `$browser->timeout(1800)` which would set the timeout for 30 minutes instead. Usually timeouts would have a different error though. – moshen Mar 13 '11 at 06:47
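
Two points from the comments above can be tried offline, without touching the server: skipping the non-sequence first line of the input file, and the filename-from-URL regex. The following is only a minimal sketch. It assumes the input file is FASTA (header lines starting with `>`), which the thread does not state outright; `read_fasta_sequences`, `filename_from_url`, the sample records and the `example.org` URL are hypothetical names and data made up for illustration.

```perl
use strict;
use warnings;

# Read every sequence from FASTA-formatted input, skipping the ">" header
# lines and joining sequences that are wrapped over several lines.
# Assumption: the input is FASTA; the thread only says that the first
# line of the file was not a sequence.
sub read_fasta_sequences {
    my ($fh) = @_;
    my @sequences;
    my $current = '';
    while ( my $line = <$fh> ) {
        $line =~ s/\s+\z//;        # strip newline and trailing whitespace
        next if $line eq '';       # ignore blank lines
        if ( $line =~ /^>/ ) {     # a header line starts a new record
            push @sequences, $current if $current ne '';
            $current = '';
        }
        else {
            $current .= $line;
        }
    }
    push @sequences, $current if $current ne '';
    return @sequences;
}

# Same idea as the regex in the answer: capture everything after the
# last "/". Returns undef when the URL contains no "/".
sub filename_from_url {
    my ($url) = @_;
    my ($filename) = $url =~ m{/([^/]+)$};
    return $filename;
}

# Hypothetical two-record input, read through an in-memory filehandle.
my $fasta = ">seq1 example header\nAAACACACTG\nGGGAATGGAG\n>seq2\nTTTGAATATC\n";
open( my $fh, '<', \$fasta ) or die "Cannot open in-memory file: $!";
my @sequences = read_fasta_sequences($fh);
close($fh);

print scalar(@sequences), " sequences read\n";
print "$_\n" for @sequences;
print filename_from_url('http://example.org/results/seq1.csv'), "\n";
```

Each element of `@sequences` could then be passed to a subroutine that wraps the form submission, as suggested in the comments. Using `m{...}` as the delimiter avoids the escaped slashes of `/\/([^\/]+)$/` while matching the same thing.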

A slightly modified version of moshen's code:

use strict;
use warnings;

use WWW::Mechanize;

# create object for browser
my $browser = WWW::Mechanize->new();

my $sequence = <<END;
AAACACACTGGGGAATGGAGCAAGACAGTCTTTGAATATCAAACACGCAAGGCAATGAGACTACCCATCATAGATATCGCACCCTATGACATTGGGGGTCCTGATCAAGAATTTGGTGTGGACATTGGCCCTGTTTGCTTTTTATAAGCCAAACTCTCTGAAACCCCAGCAAAACAAAAACCACATCCATGTGTTCATCTTGTTTTAATCTTATCAACCAGTGCAAGTGACCAACTAAATTCCAGTTATTTATTTCCAAACTTTTGGAAAAAGCATAATTTGACAAAAAAAGAATACAATTTTTTGCTGTTTCAACCACCCAATACAGGTCAAATGCTTTTGTTTTATTTTTTTACCAATTCCAACTTCAAAATGTCTCAATGGTGCTATAATAAATAAACGTCAACACTTTTATGATAA
END

$browser->get("http://bioinfo.uni-plovdiv.bg/microinspector/");
$browser->submit_form(
    form_name => "forma",
    fields => {
        'target_sequence' => $sequence,
        'Choose an organism : ' => "Mus musculus",
    },
);

my @links = $browser->find_all_links( url_regex => qr/csv$/ );

foreach my $link ( @links ){

    my $result = $browser->get( $link->url() );
    my $filename = ( $link->url() =~ /\/([^\/]+)$/ )[0];

    print "Saving $filename\n";
    $browser->save_content($filename);
} 
gangabass