My website is having trouble with Facebook, LinkedIn, and other social crawlers. I suspect the redirect it returns is to blame. Browsers can access the site completely fine, but crawlers of all kinds only seem able to reach it after multiple attempts.
This is the (WordPress-autogenerated) .htaccess I'm running:
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
Every page on my website (including the index) seems to be returning a 302 Found status code rather than the 200 I expect. I don't know if a 302 is what I should actually be expecting, but the Facebook debugger complains that "URL requested a HTTP redirect, but it could not be followed" - and only on the first attempt.
Requesting the server root with curl -I returns, on the first attempt:
HTTP/1.1 302 Found
Connection: close
Pragma: no-cache
cache-control: no-cache
Location: /
and nothing else. (Could the Location being relative rather than absolute be a problem? RFC 2616 required an absolute URI there, but as I understand it its successor, RFC 7231, no longer does.)
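For comparison, the two forms look like this (example.com is a stand-in for my real domain):

```
Location: /                     <- relative, what my server sends
Location: http://example.com/   <- absolute, what RFC 2616 required
```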
Subsequent attempts return:
HTTP/1.1 200 OK
Date: Thu, 18 Sep 2014 14:59:54 GMT
Server: XXXXX
X-Powered-By: XXXXX
Set-Cookie: PHPSESSID=XXXXX; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
X-Pingback: http://XXXXX/xmlrpc.php
Content-Type: text/html; charset=UTF-8
and the HTML body follows as normal.
Is this to be expected? Why does the crawler not automatically follow the redirect on the first attempt? And, perhaps more strangely, why is the server returning a redirect at all?
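To poke at this without hammering the live server, I've been replaying the first-attempt headers from above and checking what a cookie-less crawler would see. This is just a diagnostic sketch (the heredoc reuses the header dump; the self-redirect remark is my speculation, not something I've confirmed):

```shell
#!/bin/sh
# Replay the first-attempt response headers from above so the check is
# reproducible without hitting the live server.
cat > /tmp/first-headers.txt <<'EOF'
HTTP/1.1 302 Found
Connection: close
Pragma: no-cache
cache-control: no-cache
Location: /
EOF

# Extract the status code from the status line and the (case-insensitive)
# Location header value.
status=$(awk 'NR==1 {print $2}' /tmp/first-headers.txt)
location=$(awk -F': *' 'tolower($1)=="location" {print $2}' /tmp/first-headers.txt)

echo "status=$status"
echo "location=$location"

# A 302 from "/" back to "/" is a self-redirect: a crawler that follows it
# without carrying the PHPSESSID cookie along would just loop, which might
# explain why the Facebook debugger gives up on the first attempt.
case $location in
  /*) echo "location is relative" ;;
  *)  echo "location is absolute" ;;
esac
```

Running this prints status=302 and location=/, i.e. a relative redirect back to the same path that was requested.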
For completeness: my DNS has an A record pointing to the IP of the dedicated server. I've read that some DNS setups can cause issues like this, but I can't see why mine would.