1

I'm trying to build a Google query string, make a request to that page, scrape the HTML, and parse it in a Chrome extension, which is JavaScript. So I have the following code:

var url = "https://www.google.com/search?#q=" + artist + "+" + title;
searchGoogleSampleInformation(url);

function searchGoogleSampleInformation(url)
{
    var xhr = new XMLHttpRequest();
    xhr.open("GET", url, false);
    xhr.onreadystatechange = function ()
    {
        if (xhr.readyState == 4)
        {
            return parseGoogleInformation(xhr.responseText, url);
        }
    }

    xhr.send();
}

function parseGoogleInformation(search_results, url)
{
    var link = $(".srg li.g:eq(0) .r a", search_results).attr('href');
}

The parse method just grabs the URL of the first search result (which is not what I'll end up doing, but it's enough to test that the HTTP request is working). But link is undefined after that line. I then used alert(url) and verified that my query string was being built correctly; I copied it from the alert window and pasted it into another tab, and it pulled up the results as expected. Then I opened a new window with search_results, and it appeared to be Google's regular homepage with no search at all. I thought the problem might be caused by the asynchrony of the xhr.open call, but flipping that flag didn't help either. Am I missing something obvious?

aquemini

2 Answers

2

It's because "https://www.google.com/search?#q=" + artist + "+" + title initially has no search results in the content. Everything after the # is a URL fragment, which is never sent to the server, so Google renders the page with no results and then dynamically loads them via JavaScript. Since you are just fetching the HTML of the page and processing it, the JavaScript in the HTML never gets executed.
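
A minimal sketch of a request that does carry the search terms to the server, by putting them in the query string instead of after the #. It assumes Google returns server-rendered results for a plain ?q= request from an extension, which may not hold (markup changes, and automated queries can be blocked). buildSearchUrl and fetchHtml are hypothetical helper names, not part of the question's code:

```javascript
// Hypothetical helper: put the search terms in the query string (?q=...)
// and percent-encode them so spaces and symbols survive the trip.
function buildSearchUrl(artist, title) {
    return "https://www.google.com/search?q=" + encodeURIComponent(artist + " " + title);
}

// Hypothetical helper: asynchronous GET that hands the response body to a
// callback. A value returned from onreadystatechange is discarded, so the
// result has to be passed on explicitly.
function fetchHtml(url, onDone) {
    var xhr = new XMLHttpRequest();
    xhr.open("GET", url, true); // true = asynchronous
    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
            onDone(xhr.responseText);
        }
    };
    xhr.send();
}
```

It would be called along the lines of fetchHtml(buildSearchUrl(artist, title), function (html) { parseGoogleInformation(html, url); }).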

abraham
  • yeah that'll be the issue. is there a way to handle that in a Chrome extension? i've asked a similar question before on here and gotten nothing, and read through this post what seems like 100 times http://stackoverflow.com/questions/6508393/web-scraping-in-a-google-chrome-extension-javascript-chrome-apis – aquemini Aug 07 '14 at 15:38
  • One option would be to open the URL in a tab and inject the script to parse the rendered results. – abraham Aug 07 '14 at 16:42
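
The tab-and-inject idea from the comment above could look roughly like this. It is only a sketch, assuming a Manifest V2 extension (chrome.tabs.executeScript; Manifest V3 moved this to chrome.scripting.executeScript), host permission for google.com, and that the question's selector still matches Google's markup. scrapeFirstResult is a hypothetical name, and a tab status of "complete" does not guarantee that the dynamically loaded results have arrived:

```javascript
// Code to run inside the rendered results page. The selector is taken from
// the question and is an assumption about Google's markup.
var EXTRACT_FIRST_LINK =
    "(function () {" +
    "  var a = document.querySelector('.srg li.g .r a');" +
    "  return a ? a.href : null;" +
    "})()";

// Hypothetical helper: open the URL in a background tab, wait until the tab
// reports that it finished loading, run the snippet there, close the tab,
// and pass the first link (or null) to onResult.
function scrapeFirstResult(url, onResult) {
    chrome.tabs.create({ url: url, active: false }, function (tab) {
        function onUpdated(tabId, changeInfo) {
            if (tabId !== tab.id || changeInfo.status !== "complete") {
                return; // some other tab, or this one is still loading
            }
            chrome.tabs.onUpdated.removeListener(onUpdated);
            chrome.tabs.executeScript(tab.id, { code: EXTRACT_FIRST_LINK }, function (results) {
                chrome.tabs.remove(tab.id);
                // results holds one entry per frame the code ran in
                onResult(results && results.length ? results[0] : null);
            });
        }
        chrome.tabs.onUpdated.addListener(onUpdated);
    });
}
```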
0

You are making a cross-domain Ajax call, which is not allowed by default. A cross-domain call only works if the server supports it and responds with the appropriate headers.

However, since you mentioned you are building a Chrome extension, it is possible by requesting permission for the host in the manifest file: https://developer.chrome.com/extensions/xhr#requesting-permission
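
For reference, a minimal Manifest V2 fragment that requests access to Google might look like the following. The name and version are placeholders, and in Manifest V3 the host pattern would go under "host_permissions" instead:

```json
{
  "name": "Sample lookup (placeholder)",
  "version": "0.1",
  "manifest_version": 2,
  "permissions": [
    "https://www.google.com/*"
  ]
}
```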

  • no, i have google in the manifest.json, and i'm not getting the cross domain error in the console. – aquemini Aug 07 '14 at 04:44
  • did you inspect the search_results variable? also, what is the format of the response? Is it JSON or HTML? It seems like $ is a function that is available on the google.com homepage, but I believe the response you get via XHR will not be in the DOM, will it? – Rajkumar Madhuram Aug 07 '14 at 05:25
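
On the point about the response not being in the DOM: one way to query an HTML string without attaching it to the current page is DOMParser. A small sketch, assuming it runs somewhere DOMParser exists (an extension page or content script, not a Manifest V3 service worker); firstHref is a hypothetical helper name:

```javascript
// Hypothetical helper: parse an HTML string into a detached document and
// return the href attribute of the first element matching the selector,
// or null when nothing matches.
function firstHref(html, selector) {
    var doc = new DOMParser().parseFromString(html, "text/html");
    var anchor = doc.querySelector(selector);
    return anchor ? anchor.getAttribute("href") : null;
}
```

For the question's case this would be firstHref(xhr.responseText, ".srg li.g .r a"), which still returns null if the fetched HTML contains no results.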