I recently updated my website to enforce HTTPS for all requests. Everything appears to be working well after this change, except that Baidu's web crawler (Baiduspider) is receiving a 404 for all requests.

The website is running as an Azure website in standard mode with an SNI SSL binding. No other search engine crawler is experiencing this issue. Any ideas on how to resolve this are appreciated. Thanks.

Update: Below are two entries from my web server logs:

2015-03-19 19:05:21 WWW-OMITTED-COM GET /OMITTED X-ARR-LOG-ID=d6e1b9e7-a035-4cdf-8792-89df3117fc69 80 - 107.184.16.197 Mozilla/5.0+(compatible;+Baiduspider/2.0;++http://www.baidu.com/search/spider.html) - - www.OMITTED.com 301 0 0 589 625 8
2015-03-19 19:05:23 WWW-OMITTED-COM GET /OMITTED X-ARR-LOG-ID=e01cdee8-5587-4c94-a733-ad3f258e6914 443 - 107.184.16.197 Mozilla/5.0+(compatible;+Baiduspider/2.0;++http://www.baidu.com/search/spider.html) - - www.OMITTED.com 200 0 0 39853 747 2626
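To make entries like these easier to read, here is a minimal Python sketch that splits each one into named fields and lists the port and status of every request whose user agent mentions Baiduspider. The field order is an assumption (the usual W3C layout for Azure website logs), since the log's `#Fields:` header is not shown here, and the names `FIELDS`, `parse_entry`, and `baidu_requests` are made up for illustration.

```python
# Assumed field order: the "#Fields:" header of the log is not shown in the
# question, so this follows the usual W3C layout of Azure website logs.
FIELDS = [
    "date", "time", "s_sitename", "cs_method", "cs_uri_stem", "cs_uri_query",
    "s_port", "cs_username", "c_ip", "cs_user_agent", "cs_cookie",
    "cs_referer", "cs_host", "sc_status", "sc_substatus", "sc_win32_status",
    "sc_bytes", "cs_bytes", "time_taken",
]

# The two entries quoted in the question, one per line.
SAMPLE_LOG = """\
2015-03-19 19:05:21 WWW-OMITTED-COM GET /OMITTED X-ARR-LOG-ID=d6e1b9e7-a035-4cdf-8792-89df3117fc69 80 - 107.184.16.197 Mozilla/5.0+(compatible;+Baiduspider/2.0;++http://www.baidu.com/search/spider.html) - - www.OMITTED.com 301 0 0 589 625 8
2015-03-19 19:05:23 WWW-OMITTED-COM GET /OMITTED X-ARR-LOG-ID=e01cdee8-5587-4c94-a733-ad3f258e6914 443 - 107.184.16.197 Mozilla/5.0+(compatible;+Baiduspider/2.0;++http://www.baidu.com/search/spider.html) - - www.OMITTED.com 200 0 0 39853 747 2626
"""


def parse_entry(line):
    """Split one space-delimited log entry into a dict keyed by FIELDS."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(FIELDS), len(values))
        )
    return dict(zip(FIELDS, values))


def baidu_requests(lines):
    """Return (port, status) pairs for entries whose user agent mentions Baiduspider."""
    pairs = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and W3C header lines
        entry = parse_entry(line)
        if "Baiduspider" in entry["cs_user_agent"]:
            pairs.append((int(entry["s_port"]), int(entry["sc_status"])))
    return pairs


if __name__ == "__main__":
    for port, status in baidu_requests(SAMPLE_LOG.splitlines()):
        print("port %d -> status %d" % (port, status))
```

A request that never completes its TLS handshake would probably not appear in a log like this at all, so a missing port 443 entry after a port 80 redirect may be as informative as the entries that are present.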

HBCondo
  • Can you show a web server log entry for two requests for the same resource but with different user agents (a normal one and Baiduspider)? Are you sure the 404 comes from the web server and not from the web application running on the server? What happens if you fake the user agent of your browser with Baidu's string? – sebix Mar 19 '15 at 12:46
  • Yes, I used Fiddler to compose an HTTP request with the Baiduspider user agent string, and it properly returns a 301 redirect to HTTPS and then a 200 from the web server. I edited the question to include my web server logs, which show Baiduspider accessing over HTTP and being redirected to HTTPS. Even so, I receive server errors (via ELMAH) saying that the request ended with a 404. – HBCondo Mar 19 '15 at 19:18
  • It appears that [Baidu doesn't support SNI](https://www.mnot.net/blog/2014/05/09/if_you_can_read_this_youre_sniing). Why anyone doesn't at this late date is beyond me. If you don't have any significant number of visitors from China, I wouldn't worry about it. – Michael Hampton Mar 19 '15 at 19:23
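One rough way to check the SNI explanation from the last comment is to connect to the site twice, once with the SNI extension and once without it, and compare the certificates the server presents. The Python sketch below does that under a few assumptions: the host name is passed on the command line, certificate validation is deliberately turned off because only the raw certificates are being compared, and the helper names are invented for illustration. It only shows what a client that omits SNI would see; it does not prove how Baiduspider itself behaves.

```python
import socket
import ssl
import sys


def build_context():
    """TLS client context with validation off; we only compare raw certificates."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False       # must be disabled before verify_mode is relaxed
    ctx.verify_mode = ssl.CERT_NONE  # diagnostic use only, never for real traffic
    return ctx


def fetch_cert_der(host, port=443, send_sni=True, timeout=10.0):
    """Return the server certificate (DER bytes), with or without sending SNI."""
    ctx = build_context()
    # Passing server_hostname=None makes the client omit the SNI extension.
    server_hostname = host if send_sni else None
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=server_hostname) as tls:
            return tls.getpeercert(binary_form=True)


def compare(host, port=443):
    """Describe whether a client that omits SNI gets the same certificate."""
    with_sni = fetch_cert_der(host, port, send_sni=True)
    try:
        without_sni = fetch_cert_der(host, port, send_sni=False)
    except (ssl.SSLError, OSError) as exc:
        return "handshake without SNI failed: %s" % exc
    if with_sni == without_sni:
        return "same certificate with and without SNI"
    return "different certificate without SNI"


if __name__ == "__main__":
    # Usage: python sni_check.py www.example.com  (the script name is a placeholder)
    print(compare(sys.argv[1]))
```

If the handshake fails or a different certificate comes back without SNI, that would be consistent with the SNI binding being the cause, and an IP-based SSL binding might be worth trying instead.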
