Questions tagged [mechanize]

Mechanize is a library for automated web browsing. It was originally developed for Perl, and there are now also Python and Ruby implementations.

The Ruby Mechanize library automates interaction with websites. It automatically stores and sends cookies, follows redirects, and can follow links and populate and submit forms. It also keeps a history of the sites you have visited. It is adapted from the Perl module WWW::Mechanize. There is also a version for Python.
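To make "automatically stores and sends cookies, follows redirects" concrete, here is a minimal sketch that uses only the Python standard library. It is not Mechanize's API; it only illustrates the same behaviour. The local test server, its `/login` and `/home` paths, the `session=abc123` cookie, and the names `DemoHandler` and `browse_with_cookies` are all assumptions made up for this illustration.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local site: /login sets a cookie and redirects to /home."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(302)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            self.send_header("Location", "/home")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            # Echo back whatever Cookie header the client sent.
            body = (self.headers.get("Cookie") or "no cookie").encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def browse_with_cookies():
    """Fetch /login and return (final URL, response body, stored cookie names)."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        jar = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}),          # ignore any system proxy
            urllib.request.HTTPCookieProcessor(jar),  # store and resend cookies
        )
        url = "http://127.0.0.1:%d/login" % server.server_port
        with opener.open(url, timeout=5) as response:
            final_url = response.geturl()  # URL after the redirect was followed
            body = response.read().decode()
        return final_url, body, [cookie.name for cookie in jar]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(browse_with_cookies())
```

The opener follows the 302 on its own and replays the cookie it received on the redirected request, which is the kind of bookkeeping a Mechanize agent does for you, in addition to link following, form handling, and history.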

2512 questions
18
votes
2 answers

BeautifulSoup HTML table parsing

I am trying to parse information (HTML tables) from this site: http://www.511virginia.org/RoadConditions.aspx?j=All&r=1 Currently I am using BeautifulSoup, and the code I have looks like this: from mechanize import Browser from BeautifulSoup import…
16
votes
7 answers

WebBrowsing in C# - Libraries, Tools etc. - Anything like Mechanize in Perl?

Looking for something similar to Mechanize for .NET... If you don't know what Mechanize is, see http://search.cpan.org/dist/WWW-Mechanize/ I will maintain a list of suggestions here. Anything for browsing/posting/screen scraping (other than WebRequest…
Jason
16
votes
2 answers

Detect redirect with ruby mechanize

I am using the mechanize/nokogiri gems to parse some random pages. I am having problems with 301/302 redirects. Here is a snippet of the code: agent = Mechanize.new page = agent.get('http://example.com/page1') The test server on mydomain.com will…
user337620
16
votes
11 answers

Emulating a browser to download a file?

There's an FLV file on the web that can be downloaded directly in Chrome. The file is a television program, published by CCTV (China Central Television). CCTV is a non-profit, state-owned broadcaster, financed by the Chinese taxpayer, which allows…
showkey
16
votes
1 answer

mechanize how to get current url

I have this code: require 'mechanize' @agent = Mechanize.new page = @agent.get('http://something.com/?page=1') next_page = page.link_with(:href=>/^?page=2/).click As you can see, this code should go to the next page. The next_page should have url…
megas
15
votes
2 answers

Need more mechanize documentation (python)

I'm having a really hard time finding a good comprehensive source for Mechanize's documentation. Even the main documentation on mechanize's site isn't really that great: it only seems to list examples. Is there a more formal place for documentation…
varatis
15
votes
3 answers

Clicking a button with Ruby Mechanize

I have a particularly difficult form: I am trying to click its search button and can't seem to do it. Here is the code for the form from the page source:
Sean
15
votes
2 answers

Maintaining cookies between Mechanize requests

I'm trying to use the Ruby version of Mechanize to extract my employer's tickets from a ticket management system that we're moving away from that does not supply an API. Problem is, it seems Mechanize isn't keeping the cookies between the post call…
adamjford
15
votes
3 answers

Click on a javascript link within python?

I am navigating a site using Python's mechanize module and having trouble clicking on a JavaScript link for the next page. I did a bit of reading and people suggested I need python-spidermonkey and DOMforms. I managed to get them installed but I am not…
Lostsoul
15
votes
1 answer

Using Python and Mechanize to submit form data and authenticate

I want to submit login to the website Reddit.com, navigate to a particular area of the page, and submit a comment. I don't see what's wrong with this code, but it is not working in that no change is reflected on the Reddit site. import…
Parseltongue
15
votes
3 answers

How do I set a timeout value for Python's mechanize?

How do I set a timeout value for Python's mechanize?
Joe Schmoe
14
votes
2 answers

Web Crawler - Ignore Robots.txt file?

Some servers have a robots.txt file in order to stop web crawlers from crawling through their websites. Is there a way to make a web crawler ignore the robots.txt file? I am using Mechanize for Python.
Craig Locke
14
votes
1 answer

How can I add a cookie to an existing cookielib CookieJar instance in Python?

I have a CookieJar that's being used with Mechanize that I want to add a cookie to. How can I go about doing this? make_cookie() and set_cookie() weren't clear enough for me. br = mechanize.Browser() cj =…
Paul
14
votes
8 answers

Programmatic Python Browser with JavaScript

I want to screen-scrape a website that uses JavaScript. There is mechanize, the programmatic web browser for Python. However, it (understandably) doesn't interpret JavaScript. Is there any programmatic browser for Python which does? If not, is…
Claudiu
14
votes
1 answer

Ruby Mechanize https error

I'm trying to do the following: page = Mechanize.new.get "https://sis-app.sph.harvard.edu:9030/prod/bwckschd.p_disp_dyn_sched" But I only get this exception: OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 state=SSLv2/v3 read server hello A:…
wrongusername