
Here is a strange one for you. We have a server with multiple VHOSTS that include both SSL and Non-SSL domains.

Domain1 is SSL enabled, while Domain2 doesn't have SSL.

Since all these domains are hosted on the same IP, Apache responds to HTTPS requests for Domain2 by serving the first SSL-enabled vhost. So if you went to https://domain2, the browser would warn you about an SSL certificate mismatch, and you would have to click through the advanced settings before seeing the content of Domain1 (the first SSL domain in Apache).

1) If Chrome is smart enough to understand there is an SSL mismatch, why the heck would Google still index the content of Domain1 under https://domain2.com?

2) We have since fixed the issue by adding a rewrite that returns 404 for all pages of https://domain2.com. We have also used Google Webmaster Tools to remove all entries for https://domain2.com; however, they keep coming back every 4-6 weeks! I went as far as using Google's Fetch URL tool to make sure https://domain2.com results in a 404 from their point of view, and indeed it does.
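A minimal sketch of what such a rewrite might look like, for readers following along. The exact rule used on this server is not shown here, so the hostnames, certificate paths, and the placement inside Domain1's vhost are all assumptions made for illustration:

```apache
# Hypothetical sketch: Domain1's SSL vhost is the first one on port 443,
# so HTTPS requests for domain2.com land here as well.
<VirtualHost *:443>
    ServerName domain1.com
    SSLEngine on
    # Placeholder paths, not taken from the original setup
    SSLCertificateFile    /path/to/domain1.crt
    SSLCertificateKeyFile /path/to/domain1.key

    RewriteEngine On
    # Match requests whose Host header is domain2.com (with or without www)
    RewriteCond %{HTTP_HOST} ^(www\.)?domain2\.com$ [NC]
    # Answer 404 for every path; "-" means no substitution is performed
    RewriteRule ^ - [R=404,L]
</VirtualHost>
```

Requests for domain1.com are untouched by the condition, so only the mismatched hostname gets the 404.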

How the heck is Google still finding content of Domain1 under https://domain2.com? Are they relying on caches even after a removal request?

All I can think of is that Google has the content cached locally and keeps using it to rebuild the index; meaning that once we manually request removal of the content, they do not re-crawl the site to recreate that index but rely on their own locally cached copy.

Gerald Schneider
mamad
  • It takes some time. Google may have other priorities than fulfilling your request on a single domain. Meanwhile, you could adjust your server a little. `404` probably isn't the best solution; you could use `302` _moved permanently_ to show Google that the canonical address should be http. But realize that Google uses [HTTPS as a ranking signal](https://security.googleblog.com/2014/08/https-as-ranking-signal_6.html?m=1). – Esa Jokinen May 25 '17 at 17:50
  • We had a 302 redirect from https to http. That worked in the browser, but the Google index entries kept coming back, which is why we changed to 404 as a failsafe. – mamad May 25 '17 at 17:53
  • Now it may also remove the http results, because it doesn't realize that there might be different content on the other protocol, but that probably doesn't matter to you. – Esa Jokinen May 25 '17 at 17:56
  • You could also get a certificate for every domain, implement SNI and abandon HTTP. That would be what professionals do. – Esa Jokinen May 25 '17 at 17:58
  • We are actually in the process of issuing SSL certs for all domains and abandoning HTTP. The issue remains the same: the content of Domain1 is in Google's index for Domain2, and it keeps coming back. – mamad May 25 '17 at 18:00
  • That's good to hear! It will fade away when you have finished the process. – Esa Jokinen May 25 '17 at 18:01
  • 302 redirects are not useful for updating search engines (use 301s to do that). Also, it sounds like you may need to implement a default virtualhost (i.e. the first VirtualHost encountered on that port). Use `httpd -S` to find out which one Apache will use. Also, if this is Apache httpd 2.2, make sure you have `NameVirtualHost *:443`. – Cameron Kerr May 26 '17 at 07:58
  • Google's documentation says to use 404 for permanent removal. We are also using NameVirtualHost and have verified that the pages result in 404. Somehow they get indexed by Google again! Beats the hell out of me. Read my last paragraph; that's the only logical explanation. – mamad May 26 '17 at 23:15
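
The default-virtualhost suggestion from the comments could be sketched roughly as follows. This is only an illustration under stated assumptions: the `ServerName` and certificate paths are placeholders, and the block must be the first `<VirtualHost>` Apache reads for port 443 (which `httpd -S` can confirm).

```apache
# Apache httpd 2.2 only; 2.4 no longer needs this directive.
NameVirtualHost *:443

# Listed first, so it becomes the default for port 443 and catches
# HTTPS requests for any hostname without its own SSL vhost.
<VirtualHost *:443>
    # Placeholder name, never expected to match a real request
    ServerName default.invalid
    SSLEngine on
    # Placeholder certificate, assumed for illustration
    SSLCertificateFile    /path/to/placeholder.crt
    SSLCertificateKeyFile /path/to/placeholder.key

    # Return 404 for every path instead of serving another site's content
    Redirect 404 /
</VirtualHost>
```

With a dedicated catch-all like this, the real SSL sites keep their own vhosts and no longer double as the fallback for unrelated hostnames.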

0 Answers