I am trying to extract email addresses from a website using mechanize. With the code below, I can easily check whether the "@" symbol appears on the page.

However, I would like to return the characters surrounding the "@" symbol to judge whether it is part of an email address. Does anyone know how I can get the surrounding characters once the "@" is found?

I know mechanize can return links, but the email address might not be a link. Thanks!

require 'mechanize'

mechanize = Mechanize.new { |agent|
  agent.open_timeout   = 4
  agent.read_timeout   = 4
  agent.max_history = 0
  agent.follow_meta_refresh = true
  agent.keep_alive = false
}

website = ARGV[0]
keyword = "@"
page = mechanize.get(website)

if page.body.include?(keyword)
  puts "found \"#{keyword}\" on #{website}"
else
  puts "not found"
end
Brandon

1 Answer

Building on what pguardario said: since you're matching a pattern in a body of text, this isn't really a mechanize question, as you can already fetch the page content you need.

Instead, it's a regex problem. Something like:

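# Naive e-mail regex; stricter patterns are easy to find, though this might be enough.
# String#scan returns every match as an array of strings, whereas Regexp#match
# returns only the first match (a MatchData, or nil), which has no #each.
emails = page.body.to_s.scan(/\w+@[\w.]+/)

emails.each do |email|
  puts email
end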

Regex: http://rubular.com/r/PHNhUfyGaC
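
If you want the raw characters around each "@" rather than a regex match, a minimal sketch could look like the one below. The helper name surrounding_text and its radius parameter are hypothetical (they are not part of mechanize); it relies only on String#index and string slicing from core Ruby, and the sample string stands in for page.body.

```ruby
# Hypothetical helper: returns a snippet of up to `radius` characters on each
# side of every occurrence of `keyword` in `text`.
def surrounding_text(text, keyword, radius = 20)
  return [] if keyword.empty? # guard against an endless loop below

  snippets = []
  offset = 0
  while (pos = text.index(keyword, offset))
    start = [pos - radius, 0].max
    # Slice from `start` through `radius` characters past the keyword;
    # slicing past the end of the string is safe and simply truncates.
    snippets << text[start, (pos - start) + keyword.length + radius]
    offset = pos + keyword.length
  end
  snippets
end

# Sample text standing in for page.body from the question's script.
sample = "Contact us at info@example.com or call 555-0100."
surrounding_text(sample, "@", 8).each { |snippet| puts snippet }
```

In the original script, you would pass page.body as the first argument. Each snippet can then be checked against an e-mail regex, or just inspected by eye.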

Jeff LaJoie