Questions tagged [mechanize-ruby]

The Ruby library for automating interaction with websites.

The Mechanize library automates interaction with websites. It stores and sends cookies automatically, follows redirects and links, and can populate and submit forms. It also keeps a history of the sites you have visited.

193 questions
1
vote
0 answers

How to log in to a website using Mechanize

I'm trying to log in to this website and I keep getting this error: Mechanize::ResponseCodeError (404 => Net::HTTPNotFound for ... I followed the documentation and changed the user agent but still have this problem: require 'rubygems' require…
1
vote
2 answers

How to set a "base URL" for Webrat, Mechanize

I would like to specify a base URL so I don't have to always specify absolute URLs. How can I specify a base URL for Mechanize to use?
Andrew
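
One possible approach to the base URL question is to resolve relative paths yourself before passing them to the agent. The sketch below uses only Ruby's standard `URI` library; the `BaseUrl` class and the `example.com` address are hypothetical names chosen for illustration and are not part of Mechanize or Webrat.

```ruby
require 'uri'

# Hypothetical helper (not a Mechanize API): resolves relative paths
# against a fixed base URL so call sites can pass short paths.
class BaseUrl
  def initialize(base)
    @base = base.to_s
  end

  # Returns an absolute URL string for a relative or absolute reference.
  def resolve(path)
    URI.join(@base, path).to_s
  end
end

site = BaseUrl.new('http://example.com/app/')
login_url = site.resolve('login')   # relative to the base path
about_url = site.resolve('/about')  # relative to the host root
```

With a Mechanize agent, the resolved string could then be passed along as `agent.get(site.resolve('login'))`. Note that the trailing slash on the base matters: without it, `URI.join` replaces the last path segment instead of appending to it.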
1
vote
1 answer

How to parse an Invalid XML

I have a project I'm working on where I request an XML document from a server and parse it to import the data into my system. I'm using Ruby 2.4.3. My issue is that the XML comes in with element tags that have names starting with numbers. …
user1977840
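
A minimal sketch of one workaround, assuming the only defect is that element names begin with a digit: rewrite those tag names with a prefix before handing the string to an XML parser. The function name `sanitize_numeric_tags` and the `_` prefix are assumptions made for this example. The regular expression is deliberately naive and would also touch a `<` followed by a digit inside CDATA or comments.

```ruby
# Hypothetical pre-processing step: prefixes element names that start
# with a digit so that the result is a legal XML name.
# Handles both opening tags (<1st>) and closing tags (</1st>).
def sanitize_numeric_tags(xml, prefix: '_')
  xml.gsub(%r{<(/?)(?=\d)}) { "<#{$1}#{prefix}" }
end

raw     = '<data><2019>x</2019></data>'
cleaned = sanitize_numeric_tags(raw)
```

The cleaned string can then be parsed as usual, and the prefix stripped again when mapping element names back to the original field names.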
1
vote
2 answers

How to avoid getting blocked by websites when using Ruby Mechanize for web crawling

I can successfully scrape building data from a website (www.propertyshark.com) using a single address, but it looks like I get blocked once I use a loop to scrape multiple addresses. Is there a way around this? FYI, the information I'm trying to access…
Josh
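
One common mitigation, which may or may not be enough for a given site, is to pause for a random interval between requests instead of looping as fast as possible. The sketch below is plain Ruby; `each_with_delay`, its default bounds, and the injectable `sleeper` are assumptions made for this example and are not Mechanize features.

```ruby
# Hypothetical helper: yields each item and pauses for a random number of
# seconds (between min and max) before every item except the first.
# `sleeper` is injectable so the pause can be replaced in tests.
def each_with_delay(items, min: 2.0, max: 5.0, sleeper: Kernel.method(:sleep))
  items.each_with_index do |item, index|
    sleeper.call(rand(min..max)) unless index.zero?
    yield item
  end
end
```

A scraping loop could then be written as `each_with_delay(addresses) { |address| ... }`, with the page fetch placed inside the block. Check the site's terms of use and robots.txt before crawling it.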
1
vote
2 answers

Get all tags following a certain tag with Mechanize? (Ruby)

How can I get all elements following once, like:

foo

bla bla

  • bar1
  • bar2
  • bar3

baz

  • lot
Matrix
1
vote
1 answer

Mechanize returns `connect_nonblock': SSL_connect returned=1 errno=0 state=SSLv3

I am trying to scrape a Crunchbase page but I got this error: ryzal~/Desktop/Sites/scraper$ ruby scraper.rb /Users/Ryzal/.rbenv/versions/2.3.1/lib/ruby/2.3.0/net/http.rb:933:in `connect_nonblock': SSL_connect returned=1 errno=0 state=SSLv3 read…
Ryzal Yusoff
1
vote
1 answer

Using page.at with CSS selector in Mechanize

I am trying to scrape a webpage with Mechanize, with the following structure:
Category
user5535484
1
vote
1 answer

Mechanize in Module, NameError 'agent'

Looking for advice on how to fix this error and refactor this code to improve it. require 'mechanize' require 'pry' require 'pp' module Mymodule class WebBot agent = Mechanize.new { |agent| agent.user_agent_alias = 'Windows…
user2012677
1
vote
1 answer

Are Mechanize and its dependencies incompatible with multithreading in JRuby or am I doing something wrong on my end?

I'm trying to scrape a group of pages with Mechanize and JRuby. I'm using JRuby to have multithreading, since the program is a little slow on MRI. However, I've been running into some problems with what seems to be non-threadsafe data types in…
GDP2
1
vote
0 answers

Disable javascript validation using mechanize in rails

I'm scraping a website using the Mechanize gem. The website has a form that uses some JavaScript code for validation. How do I bypass that? On form submission, the website redirects to the same form page.
1
vote
1 answer

Only one image getting uploaded multiple times

I have been using the mechanize gem to scrape data from Craigslist. I have a piece of code that uploads multiple images to Craigslist. All the file paths are correct, but only a single image gets uploaded multiple times. What's the reason? unless…
codemilan
1
vote
1 answer

Hooks to always be run after request - also on error

I know that mechanize has post_connect_hooks that will be run after the page is retrieved. However, if an exception happens, e.g. if you request an unknown URL like "http://dsjkhbgdfb.comsfg", then it runs pre_connect_hooks but not post_connect_hooks.…
Niels Kristian
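
One way to get "always run" behaviour without relying on the library's hooks is to wrap the request in a block with an `ensure` clause. The sketch below is plain Ruby; `with_after_hook` is a hypothetical name, and the raised `IOError` merely stands in for whatever a failing request would raise.

```ruby
# Hypothetical wrapper (not a Mechanize API): calls `after` once the block
# has finished, whether it returned normally or raised. The callback
# receives the exception, or nil on success.
def with_after_hook(after)
  error = nil
  yield
rescue StandardError => e
  error = e
  raise # re-raise so the caller still sees the original exception
ensure
  after.call(error)
end

log = []
record = ->(err) { log << (err ? "failed: #{err.message}" : 'ok') }

begin
  with_after_hook(record) do
    raise IOError, 'connection refused' # stands in for a failing request
  end
rescue IOError
  # the original exception still propagates to the caller
end
```

In a scraper, the block body would contain the actual request, for example a call to `agent.get(url)`.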
1
vote
1 answer

Using the Ruby Mechanize "links_with" to grab text but getting extra content

When I grab a group of links using the Mechanize links_with method I only want the text showing the link but I'm getting a series of extra characters: links = @some_page.links_with(text: /V\s.*(BENCH|EARCX)|(BENCH|EARCX).*V/) links.each do…
bkunzi01
1
vote
0 answers

Sending a form with Mechanize (Ruby) returns an empty page?

I want to scrape the list of offers of a given product from amazon.com with the quantity in stock for each offer. To find this last information (quantity) I need to add that offer to the cart, then edit the cart with the quantity 999, and then get the…
Nafaa Boutefer
1
vote
2 answers

How do I ignore the nil values in the loop with parsed values from Mechanize?

My text file contains a list of URLs. Using Mechanize I'm using that list to parse out the title and meta description. However, some of those URL pages don't have a meta description which stops my script with a nil error: undefined method `[]' for…
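
A minimal sketch of guarding against the missing element, assuming the page object responds to `at(selector)` and returns `nil` when nothing matches, as a Mechanize page does. The `meta_description` helper and the `StubPage` stand-in are hypothetical and exist only to keep the example runnable without network access.

```ruby
# Returns the meta description, or nil when the page has no such element.
def meta_description(page)
  node = page.at('meta[name="description"]')
  node && node['content']
end

# Stand-in for a fetched page, used only to make this sketch runnable.
StubPage = Struct.new(:meta) do
  def at(_selector)
    meta
  end
end

pages = [StubPage.new({ 'content' => 'A summary' }), StubPage.new(nil)]

# Substitute a placeholder instead of raising on pages without a description.
descriptions = pages.map { |page| meta_description(page) || '(no description)' }
```

To drop pages without a description instead of substituting a placeholder, `pages.map { |page| meta_description(page) }.compact` would work as well.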