
I am trying to use mechanize to scrape a website that requires me to log in. Here is the start of my code:

#!/usr/bin/python

#scrape the admissions part of SAFE

import mechanize
import cookielib
from BeautifulSoup import BeautifulSoup
import html2text

# Browser
br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

br.addheaders = [('User-agent', 'Chrome')]

# The site we will navigate into, handling its session
br.open('https://url')

# View available forms
for f in br.forms():
    print f

This gives me

<POST https://userstuff application/x-www-form-urlencoded
  <HiddenControl(lt=LT-227363-Ja4QpRvdxrbQF0nb7XcR2jQDydH43s) (readonly)>
  <HiddenControl(execution=e1s1) (readonly)>
  <HiddenControl(_eventId=submit) (readonly)>
  <TextControl(username=)>
  <PasswordControl(password=)>
  <SubmitButtonControl(submit=) (readonly)>
  <CheckboxControl(warn=[on])>>
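The listing shows three hidden controls (`lt`, `execution`, `_eventId`) next to `username` and `password`. That combination looks like a CAS-style login page, where the `lt` token is presumably issued per page load, so the credentials have to be submitted together with the hidden values from the same visit. mechanize's `submit()` sends the hidden controls automatically; the stdlib sketch below only illustrates roughly what the URL-encoded body contains (ignoring the submit button and checkbox), using placeholder credentials and the token copied from the listing above:

```python
from urllib.parse import urlencode

# Hidden values copied from the form listing above. On a real site they must
# come from the same page response you submit, since the lt token changes.
hidden = {
    "lt": "LT-227363-Ja4QpRvdxrbQF0nb7XcR2jQDydH43s",
    "execution": "e1s1",
    "_eventId": "submit",
}

# Placeholder credentials (hypothetical values).
credentials = {"username": "myuser", "password": "s3cret"}

# application/x-www-form-urlencoded body: hidden fields plus credentials,
# in insertion order.
body = urlencode({**hidden, **credentials})
print(body)
```

If the field names in this body don't match the `name` attributes in the page's HTML, the server will not see the credentials at all.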

How can I now enter the username and password?

I tried

# Select the first (index zero) form 
br.select_form(nr=0)

# User credentials
br.form['username'] = 'username'
br.form['password'] = 'password'

# Login
br.submit()

But that doesn't seem to work.

Simd

2 Answers


In the end, this worked for me:

#!/usr/bin/python

#scraper

import mechanize
import cookielib
from BeautifulSoup import BeautifulSoup
import html2text

# Browser
br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

br.addheaders = [('User-agent', 'Chrome')]

# The site we will navigate into, handling its session
br.open('url1')

# Pick the login form by its id attribute (fm1 on this site)
for f in br.forms():
    if f.attrs.get('id') == 'fm1':
        br.form = f
        break

# User credentials
br.form['username'] = 'username'
br.form['password'] = 'password'

# Login
br.submit()

#Now we need to confirm again

br.open('https://url2')

# Select the first (index zero) form 
br.select_form(nr=0)

# Submit the confirmation form
br.submit()

print(br.open('https://url2').read())
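The loop that picks the form whose `id` is `fm1` can also be written as a predicate for mechanize's `Browser.select_form(predicate=...)`, which selects the first form for which the function returns true. Using `attrs.get` avoids the `KeyError` that `f.attrs['id']` raises on a form with no `id`. The sketch below only defines and exercises the predicate; `is_login_form` and `FakeForm` are hypothetical names, with `FakeForm` standing in for mechanize's form objects, which expose their HTML attributes as the `attrs` dict used above:

```python
def is_login_form(form, form_id="fm1"):
    """Return True for the form whose HTML id attribute equals form_id."""
    # .get() returns None instead of raising KeyError when a form has no id.
    return form.attrs.get("id") == form_id


class FakeForm:
    """Hypothetical stand-in for a mechanize form: only the attrs dict."""

    def __init__(self, **attrs):
        self.attrs = attrs


forms = [FakeForm(name="search"), FakeForm(id="fm1", method="post")]
selected = next(f for f in forms if is_login_form(f))
print(selected.attrs)  # {'id': 'fm1', 'method': 'post'}

# With mechanize installed, the equivalent selection would be:
#   br.select_form(predicate=is_login_form)
```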
Simd

I'd look at the HTML form itself rather than at what mechanize prints. Below is an example of a form I've had to fill out in the past:

<input type="text" name="user_key" value="">
<input type="password" name="user_password">

Below is the code I use to log into that website using the form above:

import mechanize
import cookielib

# Browser
br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follow refreshes with a delay of up to 1 second (e.g. refresh 0), but don't hang on longer ones
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# User-Agent
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

# The site we will navigate into, handling its session
br.open('https://www.website.com/login')

#select the first form
br.select_form(nr=0)

#user credentials
br['user_key'] = 'myusername@gmail.com'
br['user_password'] = 'mypassword'

# Login
br.submit()

link = 'http://www.website.com/url_i_want_to_scrape'

br.open(link)
response = br.response().read()
print response
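The `max_time=1` argument is what the refresh comment refers to: as far as I know, mechanize's `HTTPRefreshProcessor` follows a `Refresh` header only when its delay is at most `max_time` seconds (or when `max_time` is `None`) and ignores it otherwise. The sketch below is a simplified, hypothetical re-implementation of that decision for illustration, not mechanize's actual code; `parse_refresh` and `should_follow` are made-up names:

```python
def parse_refresh(value):
    """Split a Refresh header such as "0; url=/next" into (delay, url or None)."""
    delay, sep, rest = value.partition(";")
    if not sep:
        # Header with only a delay, e.g. "5": refresh the same URL.
        return float(delay), None
    key, eq, url = rest.partition("=")
    if not eq or key.strip().lower() != "url":
        raise ValueError("bad Refresh header: %r" % value)
    return float(delay), url.strip().strip("'\"")


def should_follow(delay, max_time):
    """Follow when there is no limit or the delay is within max_time seconds."""
    return max_time is None or delay <= max_time


for header in ["0; url=/home", "1; url=/home", "30; url=/home"]:
    delay, url = parse_refresh(header)
    print(header, "->", should_follow(delay, max_time=1))
# 0; url=/home -> True
# 1; url=/home -> True
# 30; url=/home -> False
```

Capping the delay this way is what keeps a page with a long meta refresh from stalling the script.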

Your issue could be that you're either choosing the wrong form or giving the incorrect field names.
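One way to check both is to list every form on the login page with its field names, taken straight from the HTML rather than from mechanize's summary. A minimal stdlib sketch using `html.parser.HTMLParser`; `FormLister` is a hypothetical helper, and the sample page is made up, reusing the two inputs from the form above plus an extra search form placed first:

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect (form attributes, [(field name, field type), ...]) per <form>."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self.in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append((attrs, []))
            self.in_form = True
        elif tag in ("input", "select", "textarea") and self.in_form:
            # Use the type attribute, or the tag name when it is absent.
            self.forms[-1][1].append((attrs.get("name"), attrs.get("type", tag)))

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


page = """
<form id="search" action="/search"><input type="text" name="q"></form>
<form id="login" action="/login" method="post">
  <input type="text" name="user_key" value="">
  <input type="password" name="user_password">
</form>
"""

lister = FormLister()
lister.feed(page)
for index, (form_attrs, fields) in enumerate(lister.forms):
    # The index should correspond to the nr= value for select_form.
    print(index, form_attrs.get("id"), fields)
```

On this sample page the login form is at index 1, so `select_form(nr=0)` would pick the search form instead.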

shartshooter
  • I'm showing an example of a time when I faced the same issue and a way to resolve it. Of course the websites aren't the same but the technique can be used in his/her case as well – shartshooter Mar 10 '16 at 20:04
  • I get AttributeError: type object 'CookieJar' has no attribute 'LWPCookieJar' error when I am using this. Although I imported cookielib (now called cookiejar). Any ideas? – limitcracker Jul 27 '19 at 08:34
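On the last comment: in Python 3 the `cookielib` module was renamed `http.cookiejar`, and `LWPCookieJar` is a class in that module, not an attribute of the `CookieJar` class. The quoted error is what you get when the name `cookielib` ends up bound to the `CookieJar` class instead of the module, for example via `from http.cookiejar import CookieJar as cookielib` (a guess at the cause, since the comment doesn't show the import). A minimal Python 3 sketch reproducing the error and the import that keeps `cookielib.LWPCookieJar()` working:

```python
import http.cookiejar

# Reproduce the reported error: CookieJar is a class and has no
# LWPCookieJar attribute.
try:
    http.cookiejar.CookieJar.LWPCookieJar
except AttributeError as exc:
    print(exc)  # type object 'CookieJar' has no attribute 'LWPCookieJar'

# Fix: bind the *module* to the old Python 2 name, so the
# cookielib.LWPCookieJar() calls in the answers above still resolve.
import http.cookiejar as cookielib

cj = cookielib.LWPCookieJar()
print(type(cj).__name__)  # LWPCookieJar
```

Recent mechanize releases also export their own `mechanize.LWPCookieJar`, which can be passed to `br.set_cookiejar()` without importing the stdlib module at all.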