0

I have a list of URL's and would like to scrape the location objects for each of their webpages. The data I am referring to is produced by typing "window.location" into your browser's console. For example, performing this action on www.github.com with Chrome would give you the something like following output:

Location {assign: function, replace: function, reload: function, ancestorOrigins: DOMStringList, origin: "https://github.com"…}

When expanded, you can see more information:

Location {
    ancestorOrigins: DOMStringList 
    assign: function () { [native code] } 
    hash: "" 
    host: "github.com" 
    hostname: "github.com" 
    href: "https://github.com/" 
    origin: "https://github.com" 
    pathname: "/" 
    port: "" 
    protocol: "https:" 
    reload: function () { [native code] } 
    replace: function () { [native code] } 
    search: "" 
    toString: function toString() { [native code] } 
    valueOf: function valueOf() { [native code] } 
    __proto__: Location  
}

I have used Python and the Mechanize library to scrape in the past, but have never desired this functionality until now and am not sure how to proceed. Any suggestions would be welcomed.

Artjom B.
  • 61,146
  • 24
  • 125
  • 222
190290000 Ruble Man
  • 2,173
  • 1
  • 20
  • 25

1 Answers1

1

As far as I understand, you want to perform a JavaScript call on desired web page. My suggestion would be to use some headless browsers. I did similar things with Framework called PyQt4. You can also use other headless web browsers like PhantomJS. Or you may also be interesting with tool called Selenium.

Vor
  • 33,215
  • 43
  • 135
  • 193