I am using Watir Webdriver with Headless on a Linux system running Firefox, and I am having speed issues extracting links from web pages. The problem seems to occur when multiple frames are being used. For example, it can take 10 minutes to return all the links on www.cnet.com.
Why is it taking this long and is there anything I can do to speed it up?
For example, here are some typical timings I took. It takes approximately 8 seconds to get all links from the "default frame", but then about 20 seconds per frame:
No Frame: 8.304341236
Frame: 20.050233141
Frame: 20.070569295
....
In fact, in this case none of the frames actually contains any links. (See this related question I raised about skipping certain frames: Watir-Webdriver Frame Attributes Not Congurent with Other Sources.)
The code to extract the links from the page is as follows:
b.links.each do |uri|
  # Skip hrefs we don't want: nil, empty, mailto: and javascript: links.
  next if uri.href.nil? || uri.href.empty?
  next if uri.href.downcase.start_with?("mailto:", "javascript:")

  puts " [x] [#{Process.pid}] Discovered (noframe) URL: #{uri.href}" if debug

  # Add the discovered HREF to the array
  href.push(uri.href)
end
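(As an aside, since the same filtering condition is used in both loops below, it could be factored into a small helper. This is just a sketch; `wanted_href?` is a name I made up, not anything from Watir:)

```ruby
# Returns true for hrefs worth keeping: rejects nil or empty strings
# and mailto:/javascript: links, case-insensitively.
def wanted_href?(href)
  return false if href.nil? || href.empty?
  !href.downcase.start_with?("mailto:", "javascript:")
end
```

Then both loops reduce to `href.push(uri.href) if wanted_href?(uri.href)`.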
The code used to extract links from the frames is as follows:
b.frames.each do |frame|
  frame.links.each do |uri|
    # Skip hrefs we don't want: nil, empty, mailto: and javascript: links.
    next if uri.href.nil? || uri.href.empty?
    next if uri.href.downcase.start_with?("mailto:", "javascript:")

    puts " [x] [#{Process.pid}] Discovered Frame URL: #{uri.href}" if debug

    # Add the discovered HREF to the array
    href.push(uri.href)
  end
end
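One idea I have been considering to cut down the per-element wire calls is to pull every href out of a frame with a single JavaScript call, rather than one WebDriver round trip per link. This is only a sketch: it assumes each frame object responds to `execute_script` and runs the script in that frame's own context, which I have not verified against Watir's `Frame` class:

```ruby
# One JavaScript call returns every href in the document at once.
LINK_JS = 'return Array.prototype.map.call(document.links, ' \
          'function (a) { return a.href; });'

# `frames` is any collection whose members respond to #execute_script
# (intended for something like b.frames, if Frame supports it).
def frame_hrefs(frames)
  frames.flat_map { |f| f.execute_script(LINK_JS) }
end
```

If this works, usage would be something like `hrefs = frame_hrefs(b.frames)`, with the same mailto:/javascript: filtering applied afterwards.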
Any help would be appreciated.