
We have a React app that loads some data asynchronously from another domain. The requests are made with isomorphic-fetch in cors mode; the requests and responses all look fine and work correctly when I test in my own browser.

We have monitoring of the responses and log failures back to our application for analysis.

Most of the time all is well (everything seems to be indexed correctly and shows up fine in Google), but we still see a lot of failures, only for Googlebot, where it fails to fetch the data correctly. Debugging the response object, I see that the status is 200 but the statusText is empty. The response has no body (and so no .json or .text methods) and no headers (which shouldn't be the case), and the mode is correctly set to cors (not opaque, which might have explained some of the other oddities).
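For reference, here is a minimal sketch of the kind of summary that could be logged for each response. The helper name `describeResponse` and the field names are hypothetical and are not taken from our actual monitoring code; it only reads properties that a fetch `Response` exposes.

```javascript
// Hypothetical helper: summarise a fetch Response for client-side error logging.
// It reads only standard Response properties and never consumes the body.
function describeResponse(response) {
  const headerNames = [];
  // Headers.forEach calls back with (value, name). In cors mode only the
  // headers the browser is allowed to expose will be listed here.
  if (response.headers && typeof response.headers.forEach === 'function') {
    response.headers.forEach((value, name) => headerNames.push(name));
  }
  return {
    status: response.status,
    statusText: response.statusText,
    type: response.type, // e.g. "cors", "opaque", "basic"
    ok: response.ok,
    headerNames: headerNames.sort(),
    canReadJson: typeof response.json === 'function',
  };
}
```

A summary with `status: 200`, `type: "cors"`, an empty `headerNames` array and `canReadJson: false` would match the failing case described above.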

From my understanding of CORS this all looks above board in terms of the headers being sent and received, so why is Googlebot having so many intermittent problems? Googlebot is saying that it has an HTTP 200 response (successful, the Promise is not rejected), but it's missing all the things that come with an HTTP 200 response: it has no body and no headers exposed. Why is Googlebot failing to return a response with headers and a body (as described below)?

A normal preflight request looks like this (from Chrome devtools; extra slash in */\* added to stop SO thinking that it's a comment opener)

Accept:*/\*
Accept-Encoding:gzip, deflate, sdch, br
Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
Access-Control-Request-Headers:content-type, x-apikey
Access-Control-Request-Method:POST
Cache-Control:no-cache
Connection:keep-alive
DNT:1
Host:my.host.net
Origin:http://my.origin.net
Pragma:no-cache
Referer:http://my.origin.net/
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.100 Safari/537.36

And the preflight response looks like this

Access-Control-Allow-Headers:content-type,x-apikey
Access-Control-Allow-Origin:*
Cache-Control:no-cache
Connection:keep-alive
Content-Length:0
Date:Mon, 05 Dec 2016 00:55:05 GMT
Expires:-1
Pragma:no-cache
Server:Microsoft-IIS/8.5
X-AspNet-Version:4.0.30319
X-Powered-By:ASP.NET

Which is then followed up by the actual request which looks like this (sent as a POST with a JSON body)

accept:application/json
Accept-Encoding:gzip, deflate, br
Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
Cache-Control:no-cache
Connection:keep-alive
Content-Length:62
content-type:application/json
DNT:1
Host:someapi.net
Origin:http://my.origin.net
Pragma:no-cache
Referer:http://my.origin.net/
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.100 Safari/537.36
x-apikey:someapikey

Which returns a response like this (with a JSON body)

Access-Control-Allow-Origin:*
Cache-Control:no-cache
Connection:keep-alive
Content-Length:33576
Content-Type:application/json; charset=utf-8
Date:Mon, 05 Dec 2016 00:55:05 GMT
Expires:-1
Pragma:no-cache
Server:Microsoft-IIS/8.5
X-AspNet-Version:4.0.30319
X-Powered-By:ASP.NET
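A rough sketch of a request that would produce the headers above. The endpoint path, the helper name `buildRequestInit` and the example payload are assumptions for illustration; only the host, the header names and the POST method come from the captures. The `application/json` content type and the custom `x-apikey` header are what make the browser send the preflight first.

```javascript
// Hypothetical endpoint: only the host appears in the captured headers.
const API_URL = 'http://someapi.net/';

// Build the init object for a cors-mode JSON POST with an API key header.
function buildRequestInit(apiKey, payload) {
  return {
    method: 'POST',
    mode: 'cors',
    headers: {
      accept: 'application/json',
      'content-type': 'application/json',
      'x-apikey': apiKey,
    },
    body: JSON.stringify(payload),
  };
}

// Usage, with fetch provided by isomorphic-fetch or the browser:
// fetch(API_URL, buildRequestInit('someapikey', { query: 'example' }))
//   .then((response) => response.json());
```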
VLAZ
El Yobo
  • No, I had a copy paste issue when I was trying to get rid of SO thinking `*/*` was meant to open a comment in the code block. Removed again. – El Yobo Dec 05 '16 at 01:14
  • Not certain what issue is? – guest271314 Dec 05 '16 at 01:28
  • Googlebot is claiming that it has an HTTP 200 response, but it has no body and no headers. Testing elsewhere is unable to reproduce and the problem only happens with Googlebot. I'll edit to try to make this more clear. – El Yobo Dec 05 '16 at 01:35
  • `fetch()` returns a rejected `Promise` when a network error occurs. – guest271314 Dec 05 '16 at 01:43
  • Yes; this is not the problem here, it's not rejected (and it wouldn't have an HTTP 200 status if a network error occurred). I have edited the question to make that more clear, thanks. – El Yobo Dec 05 '16 at 01:49
  • _"Why is Googlebot failing?"_ What is expected result? – guest271314 Dec 05 '16 at 02:05
  • That it returns a response like I describe. I'm not sure how I can make that more explicit given the detail I've provided already. – El Yobo Dec 05 '16 at 03:00
  • @ElYobo this log that you mention, is purely frontend, right? Do you have access to the service that you are requesting? Is the body being sent? Maybe it is voided by the server for some reason. – Pablo Matias Gomez Dec 12 '16 at 17:56
  • @PabloMatiasGomez the errors are purely front end (we do load some data server side too). The service is developed in house, but by another area of the business. I've been unable to reproduce any problem hitting it manually, and it seems improbable that this problem would only affect Googlebot without affecting other users if that was the case, and also I have trouble envisaging an HTTP response that would cause such an error (no body and no *headers* either). – El Yobo Dec 13 '16 at 20:24
  • @ElYobo why don't you use [Google Webmaster Tools](https://www.google.com/webmasters/tools/home) to debug it? – Christos Lytras Dec 14 '16 at 01:03
  • @ChristosLytras the problem is intermittent; testing via the webmaster tools shows perfect rendering for me, we are only aware of the problem because we log client side errors back to the application and consolidate them for analysis in SumoLogic (along with other useful info, e.g. the user agent). From that analysis we can see that it only affects Googlebot and that it only happens occasionally (far less often than we get hit by Google). – El Yobo Dec 14 '16 at 02:56
  • Can you post the logs which show the problem behaviour? – stujo Dec 14 '16 at 04:05

1 Answer


Check the IP address of the failing Googlebot calls.

It may be a nefarious actor pretending to be Google.

Check the IP addresses as described here:

https://support.google.com/webmasters/answer/80553?hl=en
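The check described on that page is a reverse DNS lookup on the IP followed by a forward lookup on the returned hostname. Below is a minimal Node.js sketch of that idea, not a definitive implementation. The function names and the injectable `resolver` parameter are hypothetical, and the domain list assumes the `googlebot.com` and `google.com` suffixes named in Google's documentation. The final address comparison is a plain string match, so IPv6 addresses may need normalising first.

```javascript
const dns = require('dns').promises;

// Assumed suffixes for genuine Googlebot reverse-DNS names.
const GOOGLE_SUFFIXES = ['.googlebot.com', '.google.com'];

// True when the hostname sits under one of the assumed Google domains.
function isGoogleHostname(hostname) {
  const name = hostname.toLowerCase().replace(/\.$/, '');
  return GOOGLE_SUFFIXES.some((suffix) => name.endsWith(suffix));
}

// Reverse-resolve the IP, then forward-resolve each Google-looking hostname
// and confirm it maps back to the same IP. The resolver is injectable so the
// logic can be exercised without network access.
async function isVerifiedGooglebot(ip, resolver = dns) {
  let hostnames;
  try {
    hostnames = await resolver.reverse(ip);
  } catch (err) {
    return false; // no usable PTR record for this address
  }
  for (const hostname of hostnames.filter(isGoogleHostname)) {
    try {
      const records = await resolver.lookup(hostname, { all: true });
      if (records.some((record) => record.address === ip)) {
        return true;
      }
    } catch (err) {
      // Forward lookup failed for this hostname; try the next one.
    }
  }
  return false;
}
```

Running this over the IP addresses logged alongside the failing requests would show whether they come from real Googlebot hosts or from something spoofing the user agent.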

stujo