
I'm using Python 3 to write a script to log in to Amazon to grab my Kindle highlights. It is based on this article: https://blog.jverkamp.com/2015/07/02/scraping-kindle-highlights/

I am unable to successfully log in and instead get a message saying to enable cookies to continue:

<RequestsCookieJar[<Cookie ubid-main=189-4768762-8531647 for .amazon.com/>]>
Failed to login: 

Please Enable Cookies to Continue

To continue shopping at Amazon.com, please enable cookies in your Web browser.
Learn more about cookies and how to enable them.

I have included requests sessions to handle cookies, but it doesn't seem to be working.

Here is the code I am using to try to do this:

import bs4, requests

session = requests.Session()
session.headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36'
}

# Log in to Amazon; fetch the real login page first so the form's hidden (CSRF) fields can be sent back
print('Logging in...')
response = session.get('https://kindle.amazon.com/login')

soup = bs4.BeautifulSoup(response.text, "html.parser")

signin_data = {}
signin_form = soup.find('form', {'name': 'signIn'})
for field in signin_form.find_all('input'):
    # Inputs without a name or a value raise KeyError; skip only those
    try:
        signin_data[field['name']] = field['value']
    except KeyError:
        pass

signin_data[u'ap_email'] = 'myemail'
signin_data[u'ap_password'] = 'mypassword'


response = session.post('https://www.amazon.com/ap/signin', data = signin_data)

soup = bs4.BeautifulSoup(response.text, "html.parser")

warning = soup.find('div', {'id': 'message_warning'})
if warning:
    print('Failed to login: {0}'.format(warning.text))

Is there something I'm missing with my use of sessions?

nyedidikeke
tjm

2 Answers


As of 2020, this code no longer works. Amazon has added JavaScript to its sign-in pages, and if that JavaScript is not executed, this sequence fails. The retrieved pages state that cookies are not enabled even though they are enabled and working. Sending the username and password together returns a verification page that includes a captcha. Sending the username first and then the password in a second exchange returns “something went wrong” and a prompt for the username and password again. Amazon recognizes that the JavaScript was not executed.
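
To tell these failure modes apart in a script, the response body can be checked for marker strings. The following is only a rough sketch: the function name `classify_login_response` and its labels are made up for illustration, the markers are the phrases quoted in this thread, and the bare "captcha" substring check is an assumption about what the verification page contains.

```python
def classify_login_response(html: str) -> str:
    """Return a rough label for a sign-in response page.

    The markers are phrases quoted in this thread, not a documented
    Amazon contract, so treat the result as a hint only.
    """
    text = html.lower()
    if "enable cookies" in text:
        # "Please Enable Cookies to Continue" page
        return "cookies_warning"
    if "captcha" in text:
        # Assumption: the verification page mentions "captcha" somewhere
        return "captcha"
    if "something went wrong" in text:
        return "retry_prompt"
    if "<title>amazon kindle: home</title>" in text:
        return "logged_in"
    return "unknown"


if __name__ == "__main__":
    sample = "<h1>Please Enable Cookies to Continue</h1>"
    print(classify_login_response(sample))
```

In the question's script this would be called as `classify_login_response(response.text)` after the POST, in place of looking only for the `message_warning` div.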

Jay Mosk

Your sign-in form data is not correct; the field names should be email and password, not ap_email and ap_password:

signin_data[u'email'] = 'your_email'
signin_data[u'password'] = 'your_password'

You can also avoid the try/except by using a CSS selector and has_attr:

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36'
}

with requests.Session() as s:
    s.headers = headers
    r = s.get('https://kindle.amazon.com/login')
    soup = BeautifulSoup(r.content, "html.parser")
    # Collect every named input that has a value (this includes the hidden fields)
    signin_data = {field["name"]: field["value"]
                   for field in soup.select("form[name=signIn]")[0].select("input[name]")
                   if field.has_attr("value")}

    signin_data[u'email'] = 'your_email'
    signin_data[u'password'] = 'your_password'

    response = s.post('https://www.amazon.com/ap/signin', data=signin_data)
    soup = BeautifulSoup(response.text, "html.parser")
    warning = soup.find('div', {'id': 'message_warning'})
    if warning:
        print('Failed to login: {0}'.format(warning.text))
    print(response.content)

In the first line of the output, you can see <title>Amazon Kindle: Home</title> at the end:

b'<?xml version="1.0" encoding="utf-8"?>\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">\n  <head>\n    <title>Amazon Kindle: Home</title>\n  

If it still does not work, you should update your version of requests and maybe try another user-agent. Once I changed ap_email and ap_password to email and password, I logged in fine.
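
If you want to check the field-collection step without hitting Amazon, here is a small offline sketch. It swaps BeautifulSoup for the standard library's `html.parser` so it runs with no extra installs, and it follows the same rule as the comprehension above: keep only inputs inside the signIn form that have both a name and a value. The sample form and the `appActionToken` field name are made up for illustration and are not taken from Amazon's actual page.

```python
from html.parser import HTMLParser


class SignInFormParser(HTMLParser):
    """Collect name/value pairs of inputs inside <form name="signIn">."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == "signIn":
            self.in_form = True
        elif tag == "input" and self.in_form:
            # Keep only inputs that carry both a name and a value
            if "name" in attrs and attrs.get("value") is not None:
                self.fields[attrs["name"]] = attrs["value"]

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def extract_signin_fields(html):
    parser = SignInFormParser()
    parser.feed(html)
    return parser.fields


# Hypothetical sample page; the field names are illustrative only
SAMPLE = """
<form name="signIn" method="post" action="https://www.amazon.com/ap/signin">
  <input type="hidden" name="appActionToken" value="abc123" />
  <input type="email" name="email" />
  <input type="password" name="password" />
  <input type="submit" value="Sign in" />
</form>
<form name="other"><input name="q" value="ignored" /></form>
"""

if __name__ == "__main__":
    signin_data = extract_signin_fields(SAMPLE)
    signin_data["email"] = "your_email"
    signin_data["password"] = "your_password"
    print(signin_data)
```

Printing the collected names this way also makes it easy to see which names the form really uses for the credentials, which is how a mismatch like ap_email versus email shows up.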

Padraic Cunningham