
I am having trouble creating and keeping new sessions when scraping my page. I am initiating a session within my script using the Requests library and then passing values to a web form. However, it is returning a "Your session has timed out" page.

Here is my source:

import requests

session = requests.Session()

params = {'Rctl00$ContentPlaceHolder1$txtName': 'Andrew'}
r = session.post("https://www.searchiqs.com/NYALB/SearchResultsMP.aspx", data=params)
print(r.text)

The URL I want to search from is https://www.searchiqs.com/NYALB/SearchAdvancedMP.aspx

I am searching for a Party 1 name called "Andrew". I have identified the form element holding this search box as 'Rctl00$ContentPlaceHolder1$txtName'. The action URL is SearchResultsMP.aspx.

When I do it from a browser, it gives the first page of results. When I do it in the terminal, it gives me the session-expired page. Any ideas?

Tendekai Muchenje

1 Answer


First, I would refer you to the advanced documentation on using sessions in the Requests module:

http://docs.python-requests.org/en/master/user/advanced/
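
To illustrate why that matters here: a Session object carries cookies (for example, the ASP.NET_SessionId cookie mentioned further down) from one request to the next automatically, so every request made through it looks like the same visitor. A minimal sketch, assuming the site sets such a cookie on the first page you hit:

import requests

session = requests.Session()

# The first request picks up whatever cookies the site sets...
session.get("https://www.searchiqs.com/NYALB/")
print(session.cookies.get_dict())

# ...and later requests made through the same Session send them back automatically.
session.get("https://www.searchiqs.com/NYALB/SearchAdvancedMP.aspx")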

I also notice that navigating to the URL in your invocation of session.post redirects to:

https://www.searchiqs.com/NYALB/InvalidLogin.aspx?InvLogInCode=OldSession%2007/24/2016%2004:19:37%20AM

I "hacked" the URL to navigate to:

https://www.searchiqs.com/NYALB/

...and notice that if I click the Show Login Fields link on that page, a form appears prompting for a User ID and Password. Your attempts to run your searches programmatically are likely failing because you have not done any sort of authentication. It likely works in your browser because you have already been granted access, either through some previous authentication you completed and may have forgotten about, or through some server-side access rule that does not ask for a login under certain conditions.

Running your commands in a local interpreter, I can see that the site owner did not bother to return a status code indicating failed authentication. If you check, r.status_code is 200, but r.text is the Invalid Login page. I know nada about ASP, but I would expect the HTTP status code to reflect what actually happened.
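
So it is worth checking the body (or the final URL after the redirect) rather than trusting the status code. A rough sketch, assuming the "InvalidLogin" redirect and the "session has timed out" wording you quoted are reliable markers, and using the ctl00... field name from the code further down:

import requests

session = requests.Session()
params = {'ctl00$ContentPlaceHolder1$txtName': 'Andrew'}
r = session.post("https://www.searchiqs.com/NYALB/SearchResultsMP.aspx", data=params)

# The server answers 200 even when it rejects the request, so the final URL
# and the page text are the only clues that authentication failed.
print(r.status_code)  # 200 either way
if "InvalidLogin" in r.url or "session has timed out" in r.text.lower():
    print("Not logged in - the site bounced us to its error page")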

Here is some code that does not really work, but may illustrate how you may want to interact with the site and sessions.

import requests

# Create dicts with our login and search data
login_params = {'btnGuestLogin': 'Log+In+as+GUEST'}
search_params = {'ctl00$ContentPlaceHolder1$txtName': 'Andrew'}
full_params = {'btnGuestLogin': 'Log+In+as+GUEST', 'ctl00$ContentPlaceHolder1$txtName': 'Andrew'}


# Create a session
albany_session = requests.Session()

# Log in as a guest by POSTing the login form data to the login page
# (not the search page), then confirm the login by looking for the
# 'ASP.NET_SessionId' cookie on the session.
albany_session.post('https://www.searchiqs.com/NYALB/LogIn.aspx', data=login_params)
print(albany_session.cookies)

# Prepare your search request
search_req = requests.Request('POST', 'https://www.searchiqs.com/NYALB/SearchAdvancedMP.aspx', data=search_params)
prepped_search_req = albany_session.prepare_request(search_req)

# Probably should work but does not seem to, for "reasons" unknown to me.
search_response = albany_session.send(prepped_search_req)
print(search_response.text)
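
My guess, and it is only an assumption I have not verified against this site, is that, like most ASP.NET WebForms pages, the search form only accepts a POST that echoes back the hidden __VIEWSTATE / __EVENTVALIDATION fields it rendered. A sketch of pulling those out of the page first with BeautifulSoup, reusing the field names and URLs above:

import requests
from bs4 import BeautifulSoup

albany_session = requests.Session()
albany_session.post('https://www.searchiqs.com/NYALB/LogIn.aspx',
                    data={'btnGuestLogin': 'Log+In+as+GUEST'})

# Fetch the search page and copy its hidden ASP.NET state fields into our POST.
search_page = albany_session.get('https://www.searchiqs.com/NYALB/SearchAdvancedMP.aspx')
soup = BeautifulSoup(search_page.text, 'html.parser')

form_data = {'ctl00$ContentPlaceHolder1$txtName': 'Andrew'}
for field in ('__VIEWSTATE', '__VIEWSTATEGENERATOR', '__EVENTVALIDATION'):
    tag = soup.find('input', {'name': field})
    if tag:
        form_data[field] = tag.get('value', '')

results = albany_session.post('https://www.searchiqs.com/NYALB/SearchAdvancedMP.aspx',
                              data=form_data)
print(results.text)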

An alternative you may want to consider is Selenium browser automation with Python bindings.

http://selenium-python.readthedocs.io/
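
For completeness, a bare-bones sketch of what that could look like; the element name is the one used in the code above, and the driver choice and the rest are assumptions:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://www.searchiqs.com/NYALB/SearchAdvancedMP.aspx')

# The browser handles cookies, ViewState and redirects for you.
# (You may still need to click the guest login link first.)
name_box = driver.find_element_by_name('ctl00$ContentPlaceHolder1$txtName')
name_box.send_keys('Andrew')
name_box.submit()

print(driver.page_source)
driver.quit()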

terryjbates
  • thanks so much for this help. I am working through your script and I'm wondering, where are you defining `full_params` that you are using for your `search_req`? – Tendekai Muchenje Jul 24 '16 at 21:26
  • That was a mistake I made in copying and pasting. I was experimenting with sending only auth info and search info, as well as sending everything in `post` requests. It did not work so well. Suffice it to say, you could insert either `full_params` or `search_params` into that line for `search_req` and neither will work. I will edit the proposed solution accordingly and only mention `search_params`. My mistake. – terryjbates Jul 24 '16 at 21:44