24

I click a link in Firefox; the page sends a request using JavaScript, and the server returns a response containing a website address, which then opens in a new window. The HTML behind the link is (I've omitted the opening and closing <span> tags):

> class="taLnk hvrIE6"
> onclick="ta.trackEventOnPage('AttractionContactInfo', 'Website',
> 2316062, 1); ta.util.cookie.setPIDCookie(15190);
> ta.call('ta.util.link.targetBlank', event, this,
> {'aHref':'LqMWJQiMnYQQoqnQQxGEcQQoqnQQWJQzZYUWJQpEcYGII26XombQQoqnQQQQoqnqgoqnQQQQoqnQQQQoqnQQQQoqnqgoqnQQQQoqnQQuuuQQoqnQQQQoqnxioqnQQQQoqnQQJMsVCIpEVMSsVEtHJcSQQoqnQQQQoqnxioqnQQQQoqnQQniaQQoqnQQQQoqnqgoqnQQQQoqnQQWJQzhYmkXHJUokUHnmKTnJXB',
> 'isAsdf':true})">Website

I want to capture the server response and extract the 'new website' using Python and Selenium. I've been using BeautifulSoup for scraping and am fairly new to Selenium.

So far, I can find this element and click it with Selenium, which opens the 'new website' in a new window, but I don't know how to capture the response from the server.

nalzok
Faisal
  • I think the title of the question is misleading - beautifulsoup has nothing to do with your question. "obtaining AJAX response using Selenium" or something is relevant. – SiddharthaRT Oct 21 '14 at 08:19
  • A senior member suggested that I rename my question this way... my actual title was indeed related to Python and Selenium – Faisal Oct 21 '14 at 09:28

3 Answers

21

I once intercepted some Ajax calls by injecting JavaScript into the page with Selenium. The downside is that Selenium can sometimes be, let's say, "fragile": I occasionally got Selenium exceptions for no apparent reason while doing this injection.

Anyway, the idea is to intercept the XHR calls and write their responses into a new DOM element that I create, which I can then read from Selenium. In the interception condition you can even check the URL that made the request (self._url), so you only capture the one you actually want.

By the way, I got the idea from intercept all ajax calls?

Maybe this helps.

browser.execute_script("""
(function(XHR) {
  "use strict";

  var element = document.createElement('div');
  element.id = "interceptedResponse";
  element.appendChild(document.createTextNode(""));
  document.body.appendChild(element);

  var open = XHR.prototype.open;
  var send = XHR.prototype.send;

  XHR.prototype.open = function(method, url, async, user, pass) {
    this._url = url; // want to track the url requested
    open.call(this, method, url, async, user, pass);
  };

  XHR.prototype.send = function(data) {
    var self = this;
    var oldOnReadyStateChange;
    var url = this._url;

    function onReadyStateChange() {
      if(self.status === 200 && self.readyState == 4 /* complete */) {
        document.getElementById("interceptedResponse").innerHTML +=
          '{"data":' + self.responseText + '}*****';
      }
      if(oldOnReadyStateChange) {
        oldOnReadyStateChange();
      }
    }

    if(this.addEventListener) {
      this.addEventListener("readystatechange", onReadyStateChange,
        false);
    } else {
      oldOnReadyStateChange = this.onreadystatechange;
      this.onreadystatechange = onReadyStateChange;
    }
    send.call(this, data);
  };
})(XMLHttpRequest);
""")
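To read the captured payloads back on the Python side, you can pull the div's contents and split on the `*****` delimiter the injected script appends. A minimal sketch — `parse_intercepted` is my own helper name, and in a real run `raw` would come from `browser.execute_script` rather than the literal shown here:

```python
import json

def parse_intercepted(raw):
    # Split the '*****'-delimited blob written by the injected script
    # and unwrap each '{"data": ...}' envelope back into Python objects.
    chunks = [c for c in raw.split('*****') if c]
    return [json.loads(c)['data'] for c in chunks]

# In a live session this would instead be:
#   raw = browser.execute_script(
#       "return document.getElementById('interceptedResponse').innerHTML;")
raw = '{"data": {"url": "http://example.com"}}*****'
print(parse_intercepted(raw))  # → [{'url': 'http://example.com'}]
```

Note that the concatenation in the injected script assumes each `responseText` is valid JSON; a non-JSON response would break `json.loads`, so you may want to wrap the decode in a try/except.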
ballade4op52
supita
  • Thanks for sharing your experience. I have no working knowledge of JavaScript. I just needed to collect some data that the website sent as the response to an Ajax call. The solution seems to be to identify and simulate the call using Python's own modules, such as requests or urllib. This lets me collect the data without any JavaScript. – Faisal Nov 15 '14 at 14:52
  • If you know the URL of the website in advance you don't need to deal with JavaScript, but in my case it wasn't easy to know some of the URL parameters in advance, so I needed to deal with JS. If what you posted before solves your problem, please mark it as the answer to your question. – supita Nov 17 '14 at 14:28
  • Almost two years later, is this still the way to go? I need to retrieve the URLs of the Ajax calls my site makes, as I do not know some of the parameters in advance. Also, what about timing? How can I be sure that this script is executed before any Ajax request happens? Thx. – Hinrich Aug 16 '16 at 20:58
11

I came across this page while trying to capture XHR content from Ajax requests, and I eventually found the selenium-wire package:

from seleniumwire import webdriver  # Import from seleniumwire
# Create a new instance of the Firefox driver
driver = webdriver.Firefox()

# Go to the Google home page
driver.get('https://www.google.com')

# Access requests via the `requests` attribute
for request in driver.requests:
    if request.response:
        print(
            request.url,
            request.response.status_code,
            request.response.headers['Content-Type']
        )

This package lets you get the response content of any request the browser makes, for example:

https://www.google.com/ 200 text/html; charset=UTF-8
https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_120x44dp.png 200 image/png
https://consent.google.com/status?continue=https://www.google.com&pc=s&timestamp=1531511954&gl=GB 204 text/html; charset=utf-8
https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png 200 image/png
https://ssl.gstatic.com/gb/images/i2_2ec824b0.png 200 image/png
https://www.google.com/gen_204?s=webaft&t=aft&atyp=csi&ei=kgRJW7DBONKTlwTK77wQ&rt=wsrt.366,aft.58,prt.58 204 text/html; charset=UTF-8
..
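Building on the loop above, the JSON bodies can be decoded as well. This is a sketch under the assumption that each captured response exposes a dict-like `headers` (with `.get`) and a plain UTF-8 `bytes` body, as used above; `json_responses` is my own helper name, not part of selenium-wire, and compressed bodies would need decompressing first:

```python
import json

def json_responses(requests):
    # Yield (url, decoded-JSON) pairs for captured requests whose
    # response declares an application/json Content-Type.
    for req in requests:
        resp = req.response
        if resp and 'application/json' in resp.headers.get('Content-Type', ''):
            yield req.url, json.loads(resp.body.decode('utf-8'))

# In a live session:
#   for url, data in json_responses(driver.requests):
#       print(url, data)
```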
Loïc
2

I was unable to capture the Ajax response with Selenium, but here is what works, albeit without Selenium:

1. Find the XHR request by monitoring the network tab of your browser's developer tools.

2. Once you've identified the request, regenerate it using Python's requests or urllib2 module. I personally recommend requests because of its additional features; the most important to me was requests.Session.

You can find plenty of help and relevant posts on these two steps.
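The two steps can be sketched with requests. The endpoint, parameters, and headers here are placeholders, not the site's actual Ajax URL — copy the real ones from the network tab:

```python
import requests

session = requests.Session()  # keeps cookies across calls, as a browser would

# Rebuild the XHR seen in the network tab (placeholder URL and params).
req = requests.Request(
    'GET',
    'https://example.com/ajax/endpoint',
    params={'id': '2316062'},
    headers={'X-Requested-With': 'XMLHttpRequest'},
)
prepared = session.prepare_request(req)
print(prepared.url)  # check it matches the browser's request before sending
# resp = session.send(prepared)
# data = resp.json()
```

Preparing the request first lets you compare the final URL against what the browser sent before firing it off.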

Hope it will help someone someday.

Faisal
  • Did exactly this for a site I'm scraping. Took a while to figure out what the actual call was from Chrome's network tools but I found it. Then I tested the response in my browser and finally with requests. Worked like a charm. In my case the output appears to be a mix of JSON and other data - all easily parsed. Thanks again. – Gabe Spradlin Jan 29 '16 at 05:13