
For reasons that aren't worth going into here, Google has been indexing one of my sites with unnecessary query strings in the URLs: wordfence_lh, hid and wordfence_logHuman. I'd like to modify my .htaccess file to remove all of those query strings.

My URLs

example.com/page/111/?wordfence_lh=1&hid=CA2BA660BEFF26B9A17F8F85D7391BD4

example.com/page/80/?wordfence_logHuman=1&hid=647700EBF43600E7BC54103256F1D71B

Expected URLs

example.com/page/111/

example.com/page/80/

I've found a way to remove a single parameter, but I still can't find a regex or something to remove multiple query parameters. Any help is greatly appreciated, thanks so much!

Here's a part of my .htaccess file:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{HTTPS} on [OR]
RewriteCond %{SERVER_PORT} ^555$ [OR]
RewriteCond %{HTTP:X-Forwarded-Proto} https
RewriteRule .* - [E=WPR_SSL:-https]
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteRule .* - [E=WPR_ENC:_gzip]
RewriteCond %{REQUEST_METHOD} GET
RewriteCond %{QUERY_STRING} =""
RewriteCond %{HTTP:Cookie} !(wordpress_logged_in_.+|wp-postpass_|wptouch_switch_toggle|comment_author_|comment_author_email_) [NC]
RewriteCond %{REQUEST_URI} !^(/(.+/)?feed/?.+/?|/(?:.+/)?embed/|/(index\.php/)?wp\-json(/.*|$)|/cantonicalt/)$ [NC]
RewriteCond %{HTTP_USER_AGENT} !^(facebookexternalhit).* [NC]
RewriteCond "%{DOCUMENT_ROOT}/wp-content/cache/wp-rocket/%{HTTP_HOST}%{REQUEST_URI}/index%{ENV:WPR_SSL}%{ENV:WPR_WEBP}.html%{ENV:WPR_ENC}" -f
RewriteRule .* "/wp-content/cache/wp-rocket/%{HTTP_HOST}%{REQUEST_URI}/index%{ENV:WPR_SSL}%{ENV:WPR_WEBP}.html%{ENV:WPR_ENC}" [L]
</IfModule>
    Welcome to SO, please do share your htaccess file in your question, thank you. – RavinderSingh13 Feb 01 '21 at 03:58
  • @RavinderSingh13 I have just updated my current htaccess code. – Tran Duc Nam Feb 01 '21 at 04:41
  • More information: The Wordfence plugin is what caused this situation. I uninstalled it, but these URL parameters did not disappear! – Tran Duc Nam Feb 01 '21 at 04:49
  • Ok so you are hitting URLs `example.com/page/111/` in the browser and you want them to be served by `example.com/page/111/?wordfence_lh=1&hid=CA2BA660BEFF26B9A17F8F85D7391BD4`? – RavinderSingh13 Feb 01 '21 at 05:55
  • @RavinderSingh13 No, I just want to remove the part `?wordfence_lh=1&hid=CA2BA660BEFF26B9A17F8F85D7391BD4` from all of my URLs. Googlebot crawled and indexed them, giving me duplicate content. In other words, I want to get rid of these parameters: wordfence_lh, hid and wordfence_logHuman. The URL should be `example.com/page/111/` – Tran Duc Nam Feb 01 '21 at 06:03
  • Do you have any other URL parameters on these URLs that you want to preserve? Or is it sufficient to simply remove the entire query string if any of these URL params are present (as in your example)? – MrWhite Feb 01 '21 at 18:06
  • @MrWhite I did not see any other than the 3 URL parameters wordfence_lh, hid and wordfence_logHuman. I want to remove them, but I still don't want to interfere with other measurement tools like Google Analytics. – Tran Duc Nam Feb 03 '21 at 07:34
  • Is `/page` literal text in the URL, i.e. "page"? Or is that meant to represent a variable URL-path? – MrWhite Feb 03 '21 at 11:19

2 Answers


It's simple SEO. You need to block these GET requests in robots.txt. Just add this to robots.txt:

Disallow: /*?

and all such GET URLs will be blocked from indexing (noindex). It will take some time for Google to remove the trash pages from search.

You can block any other trash pages; just add:

Disallow: /*?s=
Disallow: /*&s=
  • As Google said in their guide: "You should not use robots.txt as a means to hide your web pages from Google Search results. This is because, if other pages point to your page with descriptive text, your page could still be indexed without visiting the page." That's why I don't want to use this method. – Tran Duc Nam Feb 01 '21 at 08:15
  • Yes, robots.txt is only a recommendation for the Google bot, but it works in 90% of cases. You can additionally add the meta tag `<meta name="robots" content="noindex">` so that the page is 100% blocked from indexing – Nazar Feb 01 '21 at 08:36
  • Since these URLs are already indexed by Google, you should NOT block access with `robots.txt` - this won't drop the URLs quickly from the Google index. – MrWhite Feb 01 '21 at 18:04
  • @MrWhite Totally agreed with you. That is not a solution at all. – Tran Duc Nam Feb 03 '21 at 07:32

I did not see any other than 3 URL parameters wordfence_lh, hid and wordfence_logHuman. I want to remove them

If you don't have any other URL parameters on any other URLs then it would be simplest to just remove the entire query string whenever one is present. For example:

# Remove any query string on all URLs
RewriteCond %{QUERY_STRING} .
RewriteRule ^ %{REQUEST_URI} [QSD,R=301,L]

This needs to go at the top of the .htaccess file, before your existing directives.

The RewriteCond directive checks for the presence of any query string. The QSD flag discards the query string from the redirect response.
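As a rough illustration (a Python sketch of the matching only, not Apache itself; Python's `re` is close enough to Apache's PCRE for this pattern, and the function name is made up):

```python
import re

def has_query_string(query_string: str) -> bool:
    # Mimics: RewriteCond %{QUERY_STRING} .
    # The pattern "." matches any single character, so the condition is
    # true for any non-empty query string.
    return re.search(r".", query_string) is not None

# Any query string at all triggers the redirect...
print(has_query_string("wordfence_lh=1&hid=CA2BA660BEFF26B9A17F8F85D7391BD4"))  # True
# ...while a bare URL (empty query string) is left alone.
print(has_query_string(""))  # False
```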

However, if you have other URL parameters on other URLs that need to be preserved, then check for these specific URL parameters (as first suggested) and remove the entire query string if any of them are present. For example:

# Remove the entire query string if any one of the URL params are present
RewriteCond %{QUERY_STRING} (&|^)(wordfence_lh|hid|wordfence_logHuman)=
RewriteRule ^ %{REQUEST_URI} [QSD,R=301,L]
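To see exactly what that condition matches (again just a Python sketch of the regex, with hypothetical query strings):

```python
import re

# The same pattern as in the RewriteCond above: the param name must appear
# at the start of the query string or immediately after an '&'.
TARGET_PARAMS = re.compile(r"(&|^)(wordfence_lh|hid|wordfence_logHuman)=")

def triggers_redirect(query_string: str) -> bool:
    return TARGET_PARAMS.search(query_string) is not None

print(triggers_redirect("wordfence_lh=1&hid=CA2BA660BEFF26B9A17F8F85D7391BD4"))  # True
print(triggers_redirect("foo=1&hid=2"))        # True (param after '&')
print(triggers_redirect("utm_source=google"))  # False (unrelated param)
print(triggers_redirect("foohid=123"))         # False ('hid' must be a whole param name)
```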

But I still don't want to interfere with other measurement tools like google analytics.

This isn't a problem unless you are using URL parameters on other URLs and these are sometimes mixed with the URL parameters you want to remove.
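To make that caveat concrete, here is a hypothetical Python sketch of the rule's effect (made-up `utm_source` value, made-up helper name): an analytics-only query string never triggers the redirect, but if an analytics parameter is mixed in with one of the three target parameters, the QSD flag discards it along with everything else:

```python
import re

TARGET_PARAMS = re.compile(r"(&|^)(wordfence_lh|hid|wordfence_logHuman)=")

def redirect_target(path: str, query_string: str) -> str:
    # Mimics: RewriteRule ^ %{REQUEST_URI} [QSD,R=301,L]
    if TARGET_PARAMS.search(query_string):
        return path  # QSD drops the entire query string
    # No match: the rule does not fire and the URL is left as-is
    return path + ("?" + query_string if query_string else "")

# Analytics-only query string: no redirect, parameter preserved
print(redirect_target("/page/111/", "utm_source=newsletter"))
# → /page/111/?utm_source=newsletter

# Mixed query string: the redirect fires and the utm param is lost too
print(redirect_target("/page/111/", "utm_source=newsletter&hid=123"))
# → /page/111/
```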


UPDATE:

Recently I have just tested with... Is it the same with your 2nd code? What is the difference?

RewriteCond %{QUERY_STRING} ^(.*)&?wordfence_lh=[^&]+&?(.*)$ [NC]
RewriteRule ^/?(.*)$ /$1?%1%2 [R=301,L]

RewriteCond %{QUERY_STRING} ^(.*)&?wordfence_logHuman=[^&]+&?(.*)$ [NC]
RewriteRule ^/?(.*)$ /$1?%1%2 [R=301,L]

RewriteCond %{QUERY_STRING} ^(.*)&?hid=[^&]+&?(.*)$ [NC]
RewriteRule ^/?(.*)$ /$1?%1%2 [R=301,L]

No, it's not "the same". It is "attempting" to preserve URL parameters that are mixed with the URL parameters you are wanting to remove (as mentioned in my last sentence above) - which does not appear to be a requirement for you.

However, there are a couple of issues with these directives:

  1. It is matching too much and could potentially corrupt the query string. For example, it doesn't just match hid=, it also matches foohid= and will then preserve the foo part, which would potentially "break" the query string. E.g. given a query string like foohid=123&bar=1, the above directive would redirect to foobar=1, which is obviously not correct.

  2. This series of 3 rules potentially triggers 3 external redirects, since a separate redirect is triggered for each occurrence of a URL parameter you want to remove. This should (and can) be avoided. In your example URLs (that contain just two of these URL params), you would get two redirects. Two redirects isn't necessarily too bad, however, it could be reduced to a single redirect (worst case).
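Both issues can be demonstrated with a small Python simulation of the three rules (a hypothetical helper; Python's `re` stands in for Apache's regex engine, which is compatible for these patterns):

```python
import re

# The three RewriteCond patterns from the comment ([NC] -> re.IGNORECASE)
RULES = [
    re.compile(r"^(.*)&?wordfence_lh=[^&]+&?(.*)$", re.IGNORECASE),
    re.compile(r"^(.*)&?wordfence_logHuman=[^&]+&?(.*)$", re.IGNORECASE),
    re.compile(r"^(.*)&?hid=[^&]+&?(.*)$", re.IGNORECASE),
]

def redirect_chain(query_string: str) -> list:
    """Return the query string after each external redirect the rules issue."""
    chain = []
    changed = True
    while changed:
        changed = False
        for rule in RULES:
            m = rule.match(query_string)
            if m:
                query_string = m.group(1) + m.group(2)  # the ?%1%2 substitution
                chain.append(query_string)
                changed = True
                break  # each 301 means the rewritten URL is requested afresh
    return chain

# Issue 1: "foohid=" also matches, corrupting the query string
print(redirect_chain("foohid=123&bar=1"))         # ['foobar=1']
# Issue 2: two target params cost two separate external redirects
print(redirect_chain("wordfence_lh=1&hid=CA2B"))  # ['hid=CA2B', '']
```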

  • Thanks, your 2nd option seems the answer for me. Recently I have just tested with: `RewriteCond %{QUERY_STRING} ^(.*)&?wordfence_lh=[^&]+&?(.*)$ [NC] RewriteRule ^/?(.*)$ /$1?%1%2 [R=301,L]` `RewriteCond %{QUERY_STRING} ^(.*)&?wordfence_logHuman=[^&]+&?(.*)$ [NC] RewriteRule ^/?(.*)$ /$1?%1%2 [R=301,L]` `RewriteCond %{QUERY_STRING} ^(.*)&?hid=[^&]+&?(.*)$ [NC] RewriteRule ^/?(.*)$ /$1?%1%2 [R=301,L]` Is it the same with your 2nd code? What is the difference? – Tran Duc Nam Feb 04 '21 at 10:34
  • That code is not the same and is arguably incorrect. I've updated my answer with an explanation. – MrWhite Feb 04 '21 at 13:40
  • You're welcome. If this answered your question then please mark it as "accepted" by clicking the tick/checkmark next to the answer (to help other readers and to remove the question from the unanswered question queue). Once you have 15 rep you can also upvote answers that you find helpful. Thanks, much appreciated. :) – MrWhite Feb 04 '21 at 13:42
  • Yep, I always wondered whether my own code triggered too many redirects. In general your code is better and it solves all the mentioned problems, right? – Tran Duc Nam Feb 04 '21 at 16:35