WWW::Mechanize::Firefox looping through links

Question

I am using a foreach to loop through links. Do I need a $mech->back(); to continue the loop, or is that implicit?

Furthermore, do I need a separate $mech2 object for nested foreach loops?

The code I currently have gets stuck (it does not complete) and ends on the first page where td#tabcolor3 is not found.

foreach my $sector ($mech->selector('a.link2'))
{
    $mech->follow_link($sector);

    foreach my $place ($mech->selector('td#tabcolor3'))
    {
        if (($mech->selector('td#tabcolor3', all=>1)) >= 1)
        {
            $mech->follow_link($place);
            print $_->{innerHTML}, '\n'
                for $mech->selector('td.dataCell');
            $mech->back();
        }
        else
        {
            $mech->back();
        }
    }
}

score 1 · Answer 1 · answered Mar 11 '13 at 10:35

I recommend using a separate $mech object for this:

foreach my $sector ($mech->selector('a.link2'))
{
    my $mech = $mech->clone();
    $mech->follow_link($sector);

    foreach my $place ($mech->selector('td#tabcolor3'))
    {
        if (($mech->selector('td#tabcolor3', all=>1)) >= 1)
        {
            my $mech = $mech->clone();
            $mech->follow_link($place);
            print $_->{innerHTML}, "\n"
                for $mech->selector('td.dataCell');
            #$mech->back();
        }
        #else
        #{
        #    $mech->back();
        #}
    }
}

gangabass

Why do you recommend multiple Mechanize objects? – Borodin Mar 11 '13 at 11:20
Because I can easily change this code to use several threads, for example. I'm talking about classic WWW::Mechanize, of course, not the Firefox one. – gangabass Mar 11 '13 at 11:57
The `clone` method is listed in the module documentation under *Functions that will likely never be implemented*. Presumably you haven't tested your code? – Borodin Mar 11 '13 at 12:14

Borodin · Accepted Answer · 2013-03-11T13:51:07.720

You cannot access information from a page when it is no longer on display. However, the way foreach works is to build the list first before it is iterated through, so the code you have written should be fine.
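The point about foreach can be checked in plain Perl, with no browser involved. This is only a minimal sketch: fetch_links is a hypothetical stand-in for $mech->selector, and @page plays the role of the current page. It shows that the list expression is evaluated once, before the first iteration. It says nothing about whether the items in that list are still usable after the page changes.

```perl
use strict;
use warnings;

my $calls = 0;
my @page  = ('a.html', 'b.html', 'c.html');

# Hypothetical stand-in for $mech->selector: returns the links on the "current page"
sub fetch_links {
    $calls++;
    return @page;
}

my @seen;
for my $link (fetch_links()) {
    @page = ();          # simulate navigating away: the "page" has no links any more
    push @seen, $link;   # the loop still walks the list it built up front
}

print "calls=$calls seen=@seen\n";   # prints: calls=1 seen=a.html b.html c.html
```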

There is no need for the call to back, as the links are absolute. If you had used click, there would have to be a link in the page to click on, but with follow_link all you are doing is going to a new URL.

There is also no need to check the number of links to follow, as a for loop over an empty list will simply not be executed.
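A quick plain-Perl check of that claim, as a sketch: @places here is just an empty array, standing in for what the selector call is assumed to return when nothing matches.

```perl
use strict;
use warnings;

my @places     = ();   # assumed result of a selector call that matched nothing
my $iterations = 0;

for my $place (@places) {
    $iterations++;     # never reached when the list is empty
}

print "iterations=$iterations\n";   # prints: iterations=0
```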

To make things clearer I suggest that you assign the results of selector to an array before the loop.

Like this

my @sectors = $mech->selector('a.link2');
for my $sector (@sectors) {

    $mech->follow_link($sector);

    my @places = $mech->selector('td#tabcolor3');
    for my $place (@places) {

        $mech->follow_link($place);

        print $_->{innerHTML}, "\n" for $mech->selector('td.dataCell');
    }
}

Update

My apologies. It seems that follow_link is finicky and needs to follow a link on the current page.

I suggest that you extract the href attribute from each link and use get instead of follow_link.

# Copy the href strings out while the page that owns the links is still displayed
my @sector_urls = map { $_->{href} } $mech->selector('a.link2');
for my $sector_url (@sector_urls) {

    $mech->get($sector_url);

    my @place_urls = map { $_->{href} } $mech->selector('td#tabcolor3');
    for my $place_url (@place_urls) {

        $mech->get($place_url);

        print $_->{innerHTML}, "\n" for $mech->selector('td.dataCell');
    }
}

Please let me know whether this works on the site you are connecting to.

edited Mar 11 '13 at 13:51

answered Mar 11 '13 at 10:50

Borodin

Thanks, a much more elegant solution. I am getting a MozRepl::RemoteObject: TypeError - can't access dead object at this line: $mech->follow_link($share); (it is shown above). I think there is a problem with the nested for. Do I need a separate mech object, as the previous answer suggested? – tread Mar 11 '13 at 11:43
Sorry the line is: $mech->follow_link($place); – tread Mar 11 '13 at 12:07
From the [*latest modifications list*](http://cpansearch.perl.org/src/CORION/WWW-Mechanize-Firefox-0.70/Changes) it looks like the "dead object" problem started with Firefox 15. I have updated my solution to show an alternative approach. – Borodin Mar 11 '13 at 12:51
Thanks for the insight. Unfortunately this also gives a dead object error, in ~/MozRepl/RemoteObject.pm line 1530. It does only the first iteration: the error comes, then it prints the output. – tread Mar 11 '13 at 13:31
Maybe it is too late even to extract the value of `href`. I have edited my solution again slightly to fetch the `href` link while the page is displayed. Please see if it works now. – Borodin Mar 11 '13 at 13:52
Are you sure you want to follow the link of a `<td>` element? I think an `href` attribute on a `td` is illegal. – Borodin Mar 11 '13 at 13:54
Incredible!!! Thanks so much, it's working. I'm stoked. Yes, I changed the 'td' to 'a[name=tranlist]' as you suggested in my other question. So as a rule of thumb, Mechanize::Firefox only uses what is on the current page. – tread Mar 11 '13 at 14:39
Objects that are an instance of `MozRepl::RemoteObject::Instance` are only valid until the browser loads another page. (See what class things are by `print`ing them.) The things in the list that `selector` returns are like this, but you can interrogate them for information that will not change, like the `href` attribute here, which is just a string. – Borodin Mar 11 '13 at 15:27
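The rule of thumb from this comment thread can be sketched as a small helper. snapshot_links is a hypothetical name, not part of WWW::Mechanize::Firefox. It assumes $mech is an object whose selector method returns elements exposing href and innerHTML, as in the code above. The idea is to copy plain strings out of the elements while their page is still displayed, so that nothing is read from the remote objects after navigation.

```perl
use strict;
use warnings;

# Hypothetical helper: copy plain strings out of the elements returned by
# $mech->selector while the page that owns them is still on display.
sub snapshot_links {
    my ($mech, $css) = @_;
    return map { +{ href => $_->{href}, html => $_->{innerHTML} } }
           $mech->selector($css);
}

# Possible usage (assumes a live $mech object):
#   my @links = snapshot_links($mech, 'a.link2');
#   for my $link (@links) {
#       $mech->get($link->{href});   # a plain string, still valid after navigation
#   }
```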

d586 · Answer 3 · 2014-04-06T09:36:59.567

I am using WWW::Mechanize::Firefox to loop over a bunch of URLs with loads of JavaScript. The pages do not render immediately, so I need to test whether a particular page element is visible (similar to the suggestion in the Mechanize::Firefox documentation, except with two XPaths in the test) before deciding on the next action.

After about 2-3 seconds the page renders either an XPath match for 'no info' or the content I want. If there is no info we go to the next URL. I think there is some sort of race condition, with neither XPath existing at first, that intermittently causes the MozRepl::RemoteObject: TypeError: can't access dead object error (at the sleep 1 in the loop, oddly enough).

My solution, which seems to work (or at least improves reliability), is to enclose all the $mech->get and $mech->is_visible calls in an eval {}; like this:

eval {
    $mech->get($url);
    $retries = 15;    # test to see if element visible = page complete
    while ($retries--
           and !$mech->is_visible(xpath => $xpath_btn)
           and !$mech->is_visible(xpath => $xpath_no_info)) {
        sleep 1;
    }
    last if $mech->is_visible(xpath => $xpath_no_info);    # skip rest if no info page
};

Others might suggest improvements on this.
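One possible refinement, offered as a sketch rather than a tested fix for this site: move the retry logic into a small helper that takes the visibility test as a code reference. wait_for is a hypothetical name, and treating every exception from the check as "not ready yet" is an assumption that suits the intermittent dead object error described above.

```perl
use strict;
use warnings;

# Hypothetical helper: call $check up to $tries times, pausing $delay seconds
# between attempts. An exception thrown by $check counts as "not ready yet".
sub wait_for {
    my ($check, $tries, $delay) = @_;
    for my $attempt (1 .. $tries) {
        my $ok = eval { $check->() };
        return 1 if $ok;
        sleep $delay if $delay and $attempt < $tries;
    }
    return 0;
}

# Possible usage (assumes $mech, $xpath_btn and $xpath_no_info as above):
#   my $ready = wait_for(sub {
#       $mech->is_visible(xpath => $xpath_btn)
#           or $mech->is_visible(xpath => $xpath_no_info);
#   }, 15, 1);
```

Because the eval sits inside the helper, the caller can use next or last in its own loop without leaving an eval block. The trade-off is that swallowing every exception can also hide genuine failures, so logging $@ may be worthwhile.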
