How to scrape location objects?

Question

I have a list of URL's and would like to scrape the location objects for each of their webpages. The data I am referring to is produced by typing "window.location" into your browser's console. For example, performing this action on www.github.com with Chrome would give you the something like following output:

Location {assign: function, replace: function, reload: function, ancestorOrigins: DOMStringList, origin: "https://github.com"…}

When expanded, you can see more information:

Location {
    ancestorOrigins: DOMStringList 
    assign: function () { [native code] } 
    hash: "" 
    host: "github.com" 
    hostname: "github.com" 
    href: "https://github.com/" 
    origin: "https://github.com" 
    pathname: "/" 
    port: "" 
    protocol: "https:" 
    reload: function () { [native code] } 
    replace: function () { [native code] } 
    search: "" 
    toString: function toString() { [native code] } 
    valueOf: function valueOf() { [native code] } 
    __proto__: Location  
}

I have used Python and the Mechanize library to scrape in the past, but have never desired this functionality until now and am not sure how to proceed. Any suggestions would be welcomed.

score 1 · Accepted Answer · answered Jun 25 '13 at 00:33

1

As far as I understand, you want to perform a JavaScript call on desired web page. My suggestion would be to use some headless browsers. I did similar things with Framework called PyQt4. You can also use other headless web browsers like PhantomJS. Or you may also be interesting with tool called Selenium.

answered Jun 25 '13 at 00:33

Vor

33,215
43
135
193

Turns out this is very doable using the PhantomJS library. I've been playing around for a few minutes and got the functionality that I need. – 190290000 Ruble Man Jun 25 '13 at 16:22

How to scrape location objects?

1 Answers1