I posted a question a month ago that received great answers: HTACCESS 403 : How to block URL with a specific character?

The problem is that I have migrated my website from HTTP to HTTPS, and I would like to redirect all URLs except spammy URLs containing a specific character or string, which I want to block with a 410 code.

Example of what I would like:

http://www.example.com/caterory/article-name/?vn/2022-06-24fivhg585.html ==> 410 code, without 301 to HTTPS
http://www.example.com/caterory/article-name/webhook.php?tw3fpage3rjnso530724 ==> 410 code, without 301 to HTTPS
http://www.example.com/caterory/article-name/football.php?fsmkfpagefgdg456 ==> 410 code, without 301 to HTTPS

The problem: today, the spammy URLs return a 301 code and then a 410 code:

http://www.example.com/caterory/article-name/?vn/2022-06-24fivhg585.html ==> 301 to https://www.example.com/caterory/article-name/?vn/2022-06-24fivhg585.html and then ==> 410.
http://www.example.com/caterory/article-name/webhook.php?tw3fpage3rjnso530724 ==> 301 to https://www.example.com/caterory/article-name/webhook.php?tw3fpage3rjnso530724 and then ==> 410.
http://www.example.com/caterory/article-name/football.php?fsmkfpagefgdg456 ==> 301 to https://www.example.com/caterory/article-name/football.php?fsmkfpagefgdg456 and then ==> 410.

I'm using these rules :

RewriteEngine On
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^.*$ https://www.%1%{REQUEST_URI} [L,NE,R=301]

RewriteEngine On
RewriteCond %{QUERY_STRING} ^vn/ [NC]
RewriteRule ^ - [R=410]

RewriteEngine On
RewriteCond %{THE_REQUEST} /webhook.php [NC]
RewriteRule ^ - [R=410]

RewriteEngine On
RewriteCond %{THE_REQUEST} /football.php [NC]
RewriteRule ^ - [R=410]

Do you have any idea how to apply the 301 redirect to everything except URLs containing a specific character or string?

MrWhite
Arnaud

1 Answer

Just reverse the order of the rules, so your blocking directives are first (as they should be).

There is also no need to repeat the RewriteEngine directive.

Instead of using THE_REQUEST server variable (which is perhaps matching too much in the context you are using it), you should just use the RewriteRule pattern (or even combine the rules into one).

For example:

RewriteEngine On

# Blocking the following requests
RewriteCond %{QUERY_STRING} ^vn/ [NC]
RewriteRule ^ - [R=410]

RewriteRule /webhook\.php$ - [NC,R=410]

RewriteRule /football\.php$ - [NC,R=410]


# Canonical redirect
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [L,NE,R=301]

Note also that I simplified the regex ^.*$ in the last rule to just ^.
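
If THE_REQUEST genuinely is needed (for instance, because other rules rewrite the URL internally before the blocking rule is processed), a tighter pattern avoids matching the filename elsewhere in the request line, such as inside the query string. The following is only a sketch, assuming the request line has the usual "METHOD /path?query HTTP/x.y" shape; the pattern is an illustration and not part of the original rules:

```apache
# THE_REQUEST holds the raw request line, for example:
#   GET /caterory/article-name/webhook.php?tw3fpage3rjnso530724 HTTP/1.1
# Anchor at the start, skip the method, and match only within the URL-path
# (everything before the first "?" or space). The dot is escaped so it
# matches a literal "." rather than any character.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^?\s]*/webhook\.php[?\s] [NC]
RewriteRule ^ - [G]
```

For the URLs shown in the question, the plain RewriteRule pattern remains the simpler choice.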

The three blocking rules can be combined into one (although there is no real benefit in doing so). For example:

# Blocking the following requests (combined rule)
RewriteCond %{QUERY_STRING} ^vn/ [OR,NC]
RewriteCond %{REQUEST_URI} /webhook\.php$ [OR,NC]
RewriteCond %{REQUEST_URI} /football\.php$ [NC]
RewriteRule ^ - [G]

# Canonical redirect
:

NB: G (gone) is just shorthand for R=410.
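
To illustrate the shorthand, each pair below should behave the same way (in practice you would keep only one line of each pair). With a status code outside the 3xx range the substitution is ignored and processing stops as if L had been given, which is why no L flag is needed. F is the corresponding shorthand for R=403; the 403 lines are included only for comparison and are not part of the rules above:

```apache
# Equivalent: explicit status code vs. shorthand flag (410 Gone)
RewriteRule /webhook\.php$ - [NC,R=410]
RewriteRule /webhook\.php$ - [NC,G]

# The same relationship for 403 Forbidden (comparison only)
RewriteRule /football\.php$ - [NC,R=403]
RewriteRule /football\.php$ - [NC,F]
```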

As a general rule, the order of your directives should be:

  1. Blocking directives

  2. External redirects

  3. Internal rewrites
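
A skeleton of that ordering might look like the following. Sections 1 and 2 reuse rules from this answer; the front-controller rewrite in section 3 (index.php) is purely a hypothetical placeholder, since the question does not show any internal rewrites:

```apache
RewriteEngine On

# 1. Blocking directives: respond immediately, before any redirect
RewriteCond %{QUERY_STRING} ^vn/ [NC]
RewriteRule ^ - [G]

# 2. External redirects: canonical HTTPS + www
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://www.%1%{REQUEST_URI} [L,NE,R=301]

# 3. Internal rewrites (hypothetical front controller, for illustration only)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^ index.php [L]
```

Because the blocking rule comes first, a spammy request is answered with a 410 straight away and never reaches the redirect.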

> Wrong, today, the spammy urls have a 301 code, and then a 410 code

This doesn't really matter, though, other than potentially using a minuscule amount of additional resources. The response is still ultimately a 410.

MrWhite
  • Many thanks MrWhite! I will try your rules :) But I think it would be nice if the following also returned a 410: https://www.example.com/caterory/article-name/webhook.php?tw3fpage3rjnso530724, https://www.example.com/caterory/article-name/football.php?fsmkfpagefgdg456 and https://www.example.com/caterory/article-name/?vn/2022-06-24fivhg585.html. Do your rules also return a 410 for the spammy URLs over HTTPS? – Arnaud Jan 10 '23 at 11:54
  • @Arnaud The rules above would already do that. They don't check the requested _scheme_ (HTTP or HTTPS) so they naturally block both HTTP and HTTPS requests. – MrWhite Jan 10 '23 at 12:15
  • Ah OK, thank you ! I will delete my new answer ... Our messages crossed. Many, many thanks again, you made my day ! – Arnaud Jan 10 '23 at 12:18
  • @Arnaud You're welcome. If this answered your question then please mark it "accepted" (grey/green checkmark next to my answer above on the left below the voting arrows) to help other readers and remove the question from the unanswered question queue (you also get some "rep" for doing this). Please also consider upvoting answers you find helpful and to show gratitude. Thanks, much appreciated :) – MrWhite Jan 10 '23 at 12:26
  • I just tested the new rules but they don't seem to work: I get a 404 code and also a redirect to HTTPS. – Arnaud Jan 10 '23 at 12:53
  • Any idea what I can do? I'm using Nginx and PHP 8.1.13 – Arnaud Jan 10 '23 at 13:14
  • @Arnaud "Nginx"?! `.htaccess` is an **Apache** config file and won't work on Nginx! (How was this "working" previously?) – MrWhite Jan 10 '23 at 13:49
  • I was wrong: it's an FPM application served by Apache in proxy mode. Nginx proxies requests to Apache, with Nginx caching enabled. Is that correct? – Arnaud Jan 10 '23 at 13:59
  • @Arnaud OK, in that case, if your original rules were working as stated (i.e. 301 followed by 410) then the revised rules in my answer should also "work". Are your rules exactly as I've stated above? And you have no other rules? (I'm now wondering why you used `THE_REQUEST` in your original rules. That would only have been necessary if you have a front-controller pattern or the URL is not as stated in the question?) – MrWhite Jan 10 '23 at 14:09
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/251008/discussion-between-arnaud-and-mrwhite). – Arnaud Jan 10 '23 at 14:20
  • I sent you a chat message. As for THE_REQUEST, I was just trying out some resources I found on Stack Overflow before asking my question. Do you have a better solution? Many thanks! – Arnaud Jan 10 '23 at 14:58
  • Hello @MrWhite, I tried your code again and it works fine! Maybe I was tired ... – Arnaud Jan 16 '23 at 13:26