First, sorry for my English; it's not my mother tongue. Anyway, a few grammar errors won't kill you :) Hopefully.

I can't retrieve some information from a web page because it sits behind an authentication system.

The website is www.matchendirect.fr. It's a French site and there is no way to switch it to English (sorry for the inconvenience). It displays football match information.

My goal is to get the forecast data. In the middle of the page there is a table called "Pronostics des internautes" (users' predictions), but its content is displayed only if you're logged in.

Here is my code:

import urllib2, cookielib

# Cookie jar shared by every request made through the opener
cookieJar = cookielib.CookieJar()
auth_url = "http://www.matchendirect.fr/cgi/ajax/authentification.php?f_contexte=auth_form_action&f_email=pkwpa&f_mot_de_passe=pkw_pa"
url = "http://www.matchendirect.fr/live-score/colombie-bresil.html"
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
# First request: log in (credentials are passed in the query string)
request = urllib2.Request(auth_url)
response = opener.open(request)
# Second request: fetch the match page with the same opener
response = opener.open(url)
webpage = response.read()

To check whether the login worked, we can test for this marker:

if webpage.find("prono_stat_data")!=-1:
    print("I'm logged in")

I think my cookie management is wrong...

Here are my credentials; feel free to play with them. It's obviously a fake account created only for this question.

username : pkwpa password : pkw_pa

I hope someone can help me.

nino11
  • Could you maybe upload a screenshot of what exactly you want to scrape? Is it the column called internautes? Or the whole table? Or the content that is displayed when you hover over the cells in the table? – Sebastian Sep 06 '14 at 16:25
  • Thanks for answering, Sebastian. What I'm looking for is the content that is displayed when you hover over the cells in the table. I can't upload a picture, sorry. – nino11 Sep 06 '14 at 16:53
  • I tried something like this, but it failed:
    import urllib2
    opener = urllib2.build_opener()
    opener.addheaders.append(('Cookie', 'PHPSESSID=tqj16pd7oiv20bcetg6cktq3a1'))
    opener.addheaders.append(('Cookie', 'c_compte_pseudo=pkwpa'))
    opener.addheaders.append(('Cookie', 'c_compte_id=159819'))
    opener.addheaders.append(('Cookie', 'c_compte_cle=dfe9de4de057f8113c4008d183f29826'))
    f = opener.open("http://www.matchendirect.fr/live-score/espagne-republique-de-macedoine.html")
    f = f.read()
    f.find("prono_stat_data")  # returns -1
    – nino11 Sep 08 '14 at 18:22
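
One possible reason the attempt in the last comment fails: as far as I know, urllib2 (and `urllib.request` in Python 3) copies an `addheaders` entry onto the request only if no header with that name is already set, so four separate `('Cookie', ...)` entries would lead to only the first one being sent. Below is a minimal sketch, written for Python 3, that joins the cookies into a single header. `build_cookie_header` and `make_opener` are hypothetical helper names, and whether the site still accepts these (probably expired) session values is not verified.

```python
import urllib.request


def build_cookie_header(cookies):
    """Join name/value pairs into one Cookie header value."""
    return "; ".join("{}={}".format(name, value) for name, value in cookies.items())


def make_opener(cookies):
    """Return an opener that sends all cookies in a single Cookie header."""
    opener = urllib.request.build_opener()
    opener.addheaders = [
        ("User-agent", "Mozilla/5.0"),
        ("Cookie", build_cookie_header(cookies)),
    ]
    return opener


# Values copied from the comment; the session is probably stale by now.
cookies = {
    "PHPSESSID": "tqj16pd7oiv20bcetg6cktq3a1",
    "c_compte_pseudo": "pkwpa",
    "c_compte_id": "159819",
    "c_compte_cle": "dfe9de4de057f8113c4008d183f29826",
}
opener = make_opener(cookies)

# Network call left commented out; uncomment to try it against the site.
# page = opener.open("http://www.matchendirect.fr/live-score/espagne-republique-de-macedoine.html").read()
```

If the page still lacks `prono_stat_data` with a single combined header, the copied session values are most likely no longer valid and a fresh login is needed.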

2 Answers

Here is what you're looking for: http://docs.python-requests.org/en/latest/user/install/#install

Use it like this:

from requests import session

with session() as c:
    c.get('http://www.matchendirect.fr/cgi/ajax/authentification.php?f_contexte=auth_form_action&f_email=pkwpa&f_mot_de_passe=pkw_pa')
    request = c.get('http://www.matchendirect.fr/live-score/colombie-bresil.html')
    print request.headers
    print request.text

Cheers

  • Hello, I tried your solution and it doesn't seem to work. This test failed: `if request.find("prono_stat_data")!=-1: print("I'm logged in")` – nino11 Sep 24 '14 at 09:30
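
A side note on the test in that comment: in the `requests` snippet, `request` is a response object, which as far as I know has no `find` method, so the check has to run on `request.text` (a string). A small sketch of such a check on plain strings follows. `is_logged_in` is a hypothetical helper, and the `prono_stat_data` marker is taken from the question.

```python
MARKER = "prono_stat_data"  # marker string from the question


def is_logged_in(html, marker=MARKER):
    """Return True if the page source contains the logged-in marker."""
    return marker in html


# With the session example this would be called as: is_logged_in(request.text)
print(is_logged_in("<div id='prono_stat_data'>...</div>"))  # True
print(is_logged_in("<div>please log in</div>"))             # False
```

This only fixes how the check is written; it does not by itself explain why the login fails.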

Try adding a header to the opener. I once had a similar issue that was resolved by setting the User-agent header:

import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.open('http://www.example.com/')

Adding that to your code:

import urllib2, cookielib
cookieJar = cookielib.CookieJar()
auth_url="http://www.matchendirect.fr/cgi/ajax/authentification.php?f_contexte=auth_form_action&f_email=pkwpa&f_mot_de_passe=pkw_pa"
url="http://www.matchendirect.fr/live-score/colombie-bresil.html"
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.addheaders.append(('Cookie', 'cookiename=cookievalue'))
request = urllib2.Request(auth_url)
response = opener.open(request)
response = opener.open(url)
webpage=response.read()
nu11p01n73R
  • Hello, I tried your solution and it doesn't seem to work either. This test failed: if webpage.find("prono_stat_data")!=-1: print("I'm logged in"). It seems adding headers isn't enough! – nino11 Sep 30 '14 at 06:02
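
To narrow down whether the problem is the login itself or the cookie handling, one could check what the cookie jar holds right after the authentication request. Here is a sketch for Python 3, where `http.cookiejar` replaces `cookielib`. `cookie_summary` and `login_and_report` are hypothetical helpers, and the expectation that a successful login sets cookies such as `c_compte_id` is an assumption based on the cookie names quoted in the comments on the question.

```python
import urllib.request
from http.cookiejar import CookieJar


def cookie_summary(jar):
    """Return a sorted list of (domain, name) pairs stored in the jar."""
    return sorted((cookie.domain, cookie.name) for cookie in jar)


def login_and_report(auth_url):
    """Open the auth URL and report which cookies the server set."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-agent", "Mozilla/5.0")]
    body = opener.open(auth_url).read()
    return cookie_summary(jar), body


# Example call (needs network access, so it is left commented out):
# cookies, body = login_and_report(auth_url)
# print(cookies)     # an empty list would suggest the login request itself failed
# print(body[:200])  # the response body may contain an error message
```

If the jar stays empty, the cookies are not the issue and the authentication request itself should be examined, for example by comparing it with what the browser sends when the login form is submitted.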