
My site was hacked recently and now shows over 3 million pages when it really only has 30 (see screenshot).

How do I implement the correct 410 header in .htaccess?

I think the best tactic is to return 410 for every URL that contains a digit OR .htm OR .html, as none of the real pages have these in the URL. For example:

  • https://example.com/cixc-20050gsakuramar/-b00006.htm
  • https://example.com/sfumato.php?nzlw-21833vetidm4
  • https://example.com/bzmt-5694ceti.html
  • https://example.com/pfks-14602sjp/ucqksti.htm
  • https://example.com/admv-15974mitem/318

Would this code work?

Redirect 410 /*0*
Redirect 410 /*1*
Redirect 410 /*2*
Redirect 410 /*3*
Redirect 410 /*4*
Redirect 410 /*5*
Redirect 410 /*6*
Redirect 410 /*7*
Redirect 410 /*8*
Redirect 410 /*9*
Redirect 410 /*.html*
Redirect 410 /*.htm*

I've also pieced together a rewrite rule which might also work?

RewriteRule ^([0-9]+)$ - [G,L]

I am also thinking of adding Disallow rules to robots.txt like this:

Disallow: /*0*
Disallow: /*1*
Disallow: /*2*
Disallow: /*3*
Disallow: /*4*
Disallow: /*5*
Disallow: /*6*
Disallow: /*7*
Disallow: /*8*
Disallow: /*9*
Disallow: /*.htm
Disallow: /*.html

Screenshot

RavinderSingh13

1 Answer


The Redirect directive of mod_alias doesn't support wildcards; it only matches literal URL-path prefixes. So your rules such as Redirect 410 /*0* would not do what you expect. You could make them into RedirectMatch directives, which support regular expressions. I'd combine all the numbers into one rule, and the .htm/.html suffixes into another:

RedirectMatch Gone ".*[0-9].*" 
RedirectMatch Gone ".*\.html?$" 

From your Google Search Console screenshot, it looks like some of the URLs have query strings in them (the part after the ?). mod_alias doesn't consult the query string at all when matching the URL. If the .html or the number appears in the query string and not in the URL path, RedirectMatch won't be able to match it.
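To make that limitation concrete, here is a small annotated fragment using one of the example URLs from the question. It is only an illustration of which part of the URL RedirectMatch tests, not a rule set to deploy as-is.

```apache
# Request: https://example.com/sfumato.php?nzlw-21833vetidm4
#   URL path     : /sfumato.php          <- the only part RedirectMatch tests
#   query string : nzlw-21833vetidm4     <- never seen by mod_alias

# The path /sfumato.php contains no digit, so this rule does NOT
# return 410 for the request above, even though the query string
# is full of digits.
RedirectMatch Gone "[0-9]"

# The path does not end in .htm or .html either, so this rule
# does not fire for that request.
RedirectMatch Gone "\.html?$"
```

The other example URLs, such as /bzmt-5694ceti.html, carry the digits and suffix in the path itself, so they would be matched by these directives.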

I'd recommend going with mod_rewrite rules, which can match the query string. Another reason to prefer mod_rewrite is that you may already have other rewrite rules in your .htaccess. Keeping everything in mod_rewrite is less likely to cause conflicts than mixing in mod_alias rules.

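I've added a condition to skip wp-content URLs because in the comments, you say you actually have some CSS files with numbers in them. The other exceptions (pagespeed, fontawesome, webfont, and v=/ver= query strings) also come from the comments. Note that a group of RewriteCond lines applies only to the single RewriteRule that directly follows it.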

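RewriteEngine on
# 410 any path containing a digit, except known asset paths and files that exist on disk
RewriteCond %{REQUEST_URI} !^/?wp-content/
RewriteCond %{REQUEST_URI} !pagespeed
RewriteCond %{REQUEST_URI} !fontawesome
RewriteCond %{REQUEST_URI} !webfont
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule [0-9] - [G,L]
# 410 any path ending in .htm or .html (no conditions apply to this rule)
RewriteRule \.html?$ - [G,L]
# 410 any query string containing a digit, unless it also contains v= or ver=
RewriteCond %{QUERY_STRING} !v(er)?=
RewriteCond %{QUERY_STRING} [0-9]
RewriteRule . - [G,L]
# 410 any query string ending in .htm or .html
RewriteCond %{QUERY_STRING} \.html?$
RewriteRule . - [G,L]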

I wouldn't recommend using a Disallow in robots.txt because Google sometimes indexes disallowed URLs anyway even if it can't crawl them. Blocking the crawl would also stop Googlebot from ever seeing the 410 status on those URLs.

Stephen Ostermiller
  • Thanks for your help @Stephen Ostermiller. I've added what you recommended like this to .htaccess - ` RewriteEngine On RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}] RewriteBase / RewriteRule ^index\.php$ - [L] RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_FILENAME} !-d RewriteRule . /index.php [L] RewriteRule [0-9] - [G,L] RewriteRule \.html?$ - [G,L] RewriteRule \.htm?$ - [G,L] ` – wheretheresawill Jul 27 '22 at 12:20
  • You'll need to put the `[G]` rules above your front controller rule. In other words, they need to come before your `index.php` rule. – Stephen Ostermiller Jul 27 '22 at 12:26
  • Yes. You won't be able to remove based on numbers if you have actual CSS files with numbers like that. – Stephen Ostermiller Jul 27 '22 at 12:37
  • I added a rewrite condition that should exempt `wp-content` URLs. – Stephen Ostermiller Jul 27 '22 at 12:42
  • Hi @Stephen Ostermiller I added the new rewrite rule but it still 410s the CSS and JS :( – wheretheresawill Jul 27 '22 at 15:45
  • Try `RewriteCond %{REQUEST_FILENAME} !-f` instead of `RewriteCond %{REQUEST_URI} !^/?wp-content/` (that should prevent that rule from executing if a file exists on disk with that name.) – Stephen Ostermiller Jul 27 '22 at 17:07
  • Hi @Stephen Ostermiller the assets still 410. Here's a screenshot in devtools - https://i.ibb.co/YZXGVMs/410.jpg I will send you an email I don't mind paying for your help and maybe you can help with some other stuff! – wheretheresawill Jul 27 '22 at 17:33
  • It looks like you need to add more exceptions, or abandon your rule that shows 410 for any URL with a number. I edited in exceptions for `pagespeed` and `ver=` but you may find more. – Stephen Ostermiller Jul 27 '22 at 18:04
  • Hi @StephenOstermiller thanks so much for your help. Nearly there!! The only thing that still returns 410 now is fontawesome, so icons aren't loading. I tried adding `RewriteCond %{REQUEST_URI} !fontawesome` and `RewriteCond %{REQUEST_URI} !webfont` but it didn't work. Any ideas how to fix please? Here's a screenshot - https://i.ibb.co/zGbZ9Zz/fa.jpg – wheretheresawill Jul 27 '22 at 18:36
  • Those are going to hit both of the number rules. They need the exceptions you put in plus an exception for `v=` on the query string. I put that into the `ver=` exception with `!v(er)?=` – Stephen Ostermiller Jul 27 '22 at 19:28
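The ordering advice from the comments (the `[G]` rules must come before the `index.php` front controller rule) can be sketched as a single .htaccess file. This is a minimal sketch that simply combines the rules from the answer with the front controller lines quoted in the first comment. It assumes those quoted lines are the site's complete existing rule set, which the thread does not confirm, so test it on a copy of the site first.

```apache
RewriteEngine On
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteBase /

# --- 410 rules: these must run before the front controller below ---

# Paths containing a digit, except known asset paths and real files.
RewriteCond %{REQUEST_URI} !^/?wp-content/
RewriteCond %{REQUEST_URI} !pagespeed
RewriteCond %{REQUEST_URI} !fontawesome
RewriteCond %{REQUEST_URI} !webfont
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule [0-9] - [G,L]

# Paths ending in .htm or .html.
RewriteRule \.html?$ - [G,L]

# Query strings containing a digit, unless they also contain v= or ver=.
RewriteCond %{QUERY_STRING} !v(er)?=
RewriteCond %{QUERY_STRING} [0-9]
RewriteRule . - [G,L]

# Query strings ending in .htm or .html.
RewriteCond %{QUERY_STRING} \.html?$
RewriteRule . - [G,L]

# --- Front controller (as quoted in the comment above) ---
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```

Two things to watch for. The pattern `\.html?$` already covers both .htm and .html, so the extra `\.htm?$` rule from the first comment is not needed. And the query string rule returns 410 for any legitimate URL whose query string contains a digit and no `v=` or `ver=`, so check the pages and admin URLs you actually use before relying on it.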