SOLVED? Almost...

This is related to how Chrome (47.0.2526.73) handles xml files. I don't know the details, but this code works perfectly fine in Firefox (43.0.4).

I'm still curious as to why this is, or how to get it to work in Chrome.

What I'm trying to do:

Create a javascript bookmarklet to check sitemap xml links for 404s/500s/etc.

Code snippet in question:

    var siteMap = "http://www.example.com/sitemap.xml";

    // Send a HEAD request and pass the HTTP status code to callback.
    var httpPoke = function (url, callback) {
        var x = new XMLHttpRequest();
        x.open('HEAD', url);
        x.onreadystatechange = function () {
            if (this.readyState === this.DONE) {
                callback(this.status);
            }
        };
        x.send();
    };

    // httpPoke returns nothing; the status arrives in the callback.
    httpPoke(siteMap, function (n) {
        console.log(n);
    });

If I am on any other page in the domain, response is:

    200

If I navigate to the actual sitemap, http://www.example.com/sitemap.xml, the same code responds with:

    No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'null' is therefore not allowed access.

Since my goal is to provide a bookmarklet that can be invoked on the sitemap itself, this puts a kink in my plan.
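
For context, here is a rough sketch of what the rest of the bookmarklet might look like once the request itself works. It is only an illustration: `extractLocs` and `checkAll` are hypothetical names, the regex assumes plain un-prefixed `<loc>` elements and only decodes `&amp;`, and `poke` is assumed to have the same `(url, callback)` shape as `httpPoke` above.

```javascript
// Pull every <loc>...</loc> URL out of the sitemap XML text.
// Assumption: <loc> tags carry no namespace prefix or attributes.
function extractLocs(xmlText) {
    var locs = [];
    var re = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
    var m;
    while ((m = re.exec(xmlText)) !== null) {
        // Sitemaps escape "&" as "&amp;"; undo that one entity.
        locs.push(m[1].replace(/&amp;/g, '&'));
    }
    return locs;
}

// Poke each URL and collect the ones that look broken.
// A status of 0 means the request never completed (for example,
// it was blocked), so it is reported along with 4xx/5xx.
function checkAll(urls, poke, done) {
    var broken = [];
    var pending = urls.length;
    if (pending === 0) {
        done(broken);
        return;
    }
    urls.forEach(function (url) {
        poke(url, function (status) {
            if (status === 0 || status >= 400) {
                broken.push({ url: url, status: status });
            }
            pending -= 1;
            if (pending === 0) {
                done(broken);
            }
        });
    });
}
```

In the browser this would be called as something like `checkAll(extractLocs(xmlText), httpPoke, function (broken) { console.log(broken); })`, where `xmlText` is the sitemap source obtained however is convenient.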

How to test this:

1) Find any xml from some website. Google "filetype:xml sitemap" and look for a response that is just an xml file (you'll find some will redirect you).

2) Put the code above in a bookmarklet, or directly in the developer's console of your browser.

3) Make sure variable siteMap is set to the current URL. This is to be compliant with CORS. You could even do siteMap=location.href;

What you'll find is that it works fine in Firefox, but not in Chrome.

Note:

  • Executing code FROM an HTML page, targeting an HTML page does work.
  • Executing code FROM an HTML page, targeting an XML page does work.
  • Executing code FROM an XML page, targeting an HTML page does not work.
  • Executing code FROM an XML page, targeting an XML page does not work.

Research I've done:

Everything I can find on this error is (understandably) related to:

  1. Cross domain requests
  2. Having either the source or target on localhost, file:///, or otherwise on your local machine.

My scenario is neither of these.
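
To make "neither of these" concrete, here is a minimal sketch that compares two URLs by origin (scheme, host, port) using the standard `URL` class. The helper name `sameOrigin` is made up for illustration. Note that it only compares what the URLs say; it cannot tell you what origin the browser actually attaches to the request.

```javascript
// Return true when both URLs share scheme, host and port.
// Opaque origins (e.g. file: URLs) serialize to "null" and are
// never treated as same-origin, even with each other.
function sameOrigin(a, b) {
    var originA = new URL(a).origin;
    var originB = new URL(b).origin;
    if (originA === 'null' || originB === 'null') {
        return false;
    }
    return originA === originB;
}
```

By this measure `sameOrigin("http://www.example.com/", "http://www.example.com/sitemap.xml")` is true, which is why the CORS error is surprising here.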

JimParris
  • do you control the sitemap.xml file/server? – Daniel A. White Jan 27 '16 at 13:59
  • I feel it has something to do with it being an xml page – Jaromanda X Jan 27 '16 at 14:06
  • Similar questions: [XmlHttpRequest in a bookmarklet returns empty responseText on GET?](http://stackoverflow.com/questions/2715593/xmlhttprequest-in-a-bookmarklet-returns-empty-responsetext-on-get?lq=1) and [Ajax call from Bookmarklet](http://stackoverflow.com/questions/664689/ajax-call-from-bookmarklet) – Yogi Jan 27 '16 at 14:43
  • Jaromanda, you are right! I just tried with someone else's xml sitemap (just googled "filetype:xml sitemap" and picked one) and the exact same thing happened. This is odd indeed. According to location.origin, the origin is not null. I'm executing either as a bookmarklet or in console and the error still remains for xml files. – JimParris Jan 27 '16 at 14:43
  • So, what happens if you try exactly the same code but with an HTML file? That is, `http://www.example.com/sitemap.html` instead of `http://www.example.com/sitemap.xml`? – sideshowbarker Jan 28 '16 at 01:23
  • The exact same code works fine if I go to any other (HTML) page in the same domain. The code also does **not** work if the target of the httpPoke is HTML but the current page is an xml document. I'll clarify on the question as well. – JimParris Jan 28 '16 at 10:26

2 Answers

OK.

Before, I said:

So when you view an xml file with Firefox or Chrome (or IE, presumably), what you are viewing is actually a document created by the browser's inbuilt xml parser.

In the case of Chrome, that is served from (nodomain), and is identified like so:

    /* Copyright 2014 The Chromium Authors. All rights reserved.
     * Use of this source code is governed by a BSD-style license that can be
     * found in the LICENSE file.
     */

So even though the URL says "http://www.example.com/sitemap.xml", and in the console window.location.href is "http://www.example.com/sitemap.xml" and location.origin is "http://www.example.com", in actuality the origin is (nodomain), as if it were an extension page. Because it is.

So origin is actually always null for xml pages.

This is not necessarily the case.

I found this:
Chrome adding Origin header to same-origin request

Testing on Firefox confirms that FF does not set Origin on same-origin GET or HEAD requests, but Chrome does. This is not normally a problem, but on XML pages document.domain is set to null. Therefore the origin it sets is null.

Possibly a bug in Chrome? Or intentional?

I'm still not satisfied with my own answer...

Test this:

  • Go to any xml page in Chrome.
  • In the console, make any AJAX request.
  • Check the request headers in the network tab:

    Accept:*/*
    Accept-Encoding:gzip, deflate, sdch
    Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
    Cache-Control:no-cache
    Connection:keep-alive
    Host:www.example.com
    Origin:null
    Pragma:no-cache
    User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.73 Safari/537.36
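
If you want to compare several pages, the same check can be done on a copied header dump. This is just a small sketch, assuming the headers are pasted one per line in `Name:value` form; `parseHeaders` and `sentNullOrigin` are hypothetical helper names.

```javascript
// Turn a pasted "Name:value" header dump into a lookup object.
// Header names are lower-cased; lines without a colon are skipped.
function parseHeaders(dump) {
    var headers = {};
    dump.split(/\r?\n/).forEach(function (line) {
        var idx = line.indexOf(':');
        if (idx <= 0) {
            return;
        }
        var name = line.slice(0, idx).trim().toLowerCase();
        headers[name] = line.slice(idx + 1).trim();
    });
    return headers;
}

// True when the request was sent with the literal header "Origin: null".
function sentNullOrigin(dump) {
    return parseHeaders(dump).origin === 'null';
}
```

A request with no `Origin` header at all returns false here, so the helper separates "no Origin sent" from "Origin sent as null".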

Go upvote jaromanda-x for his intuition that led me to the answers.

JimParris
  • Can you provide any supporting references? Also, the example code makes a HEAD request. No document is returned to the browser and so it would seem there is no XML to parse. Correct? – Yogi Jan 27 '16 at 15:47
  • As far as the view you get in your browser window when opening an XML file, that’s just a convenience browser-UI rendering thing and in no way relates to what your JS code running in a browser does with XML programmatically when you give it a URL for an XML doc using XHR or `fetch()` or whatever. So as far as I can see, none of the info you’ve given in your answer here actually answers your question. The origin is most definitely not always *null* for XML documents in Chrome. So I don’t know what the cause of your problem actually is, but I’m pretty sure it’s not what you seem to think it is. – sideshowbarker Jan 28 '16 at 01:33
  • I kind of hope you are right, it is not a satisfying answer. You said, "in no way relates to what your JS code running in a browser does with XML programmatically when you give it a URL for an XML doc using XHR or fetch() or whatever." Hypothetically yes. This appears to not actually be the case. If you browse to an xml document with Chrome, as far as Http requests go, your origin is always null. Maybe this is because what you are viewing is actually the xml parser. Or maybe it's a bug. On "sources" tab, it is still ambiguous. – JimParris Jan 28 '16 at 10:21
  • @Roberto, we are not parsing yet in this example. Only trying to reach it. – JimParris Jan 28 '16 at 10:22

Is your http://www.example.com/sitemap.xml on a different domain than the one your script runs on? If so, the browser will block the request for security reasons. You might want to look into how to use CORS.

Regular web pages can use the XMLHttpRequest object to send and receive data from remote servers, but they're limited by the same origin policy. Extensions aren't so limited. An extension can talk to remote servers outside of its origin, as long as it first requests cross-origin permissions.

You can read more about the same origin policy here

Jorel Amthor
  • No, everything is on the same domain, hosted on the same (remote) server. To clarify, running this script from www.example.com/, or www.example.com/some-other-page/ works. But from www.example.com/sitemap.xml does not. Same domain, same web server. – JimParris Jan 27 '16 at 14:27