Really stumped, because form and syntax seem fine.

RewriteCond for REQUEST_URI is not matching the explicit path and filename. When isolated, RewriteCond for REQUEST_FILENAME matches just fine. I have verified using phpinfo() that REQUEST_URI contains the leading slash, and have tested without the leading slash, also.

The goal here is to know that the request is for this file and, if it doesn't exist, then throw a 410.

RewriteCond %{REQUEST_URI} ^/dir1/dir2/dir3/v_9991_0726dd5b5e8dd67a214c0c243436d131_all\.css$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ - [R=410,L]

I don't want to omit the first Cond, because I only want to do this for a handful of files similar to this one.

UPDATE I

Trying to get a definitive test. Test set-up:

  • testmee.txt does not exist
  • request is for testmee.txt in the root
  • verified the request_uri is matching, by redirecting to google
  • cannot get 410 when using only first Cond
  • (when using only first Cond, server serves 404, not 410)
  • (using both Conds, server serves 404, not 410)
  • CAN get 410 when using only second Cond
RewriteCond %{REQUEST_URI} ^/testmee\.txt$
#RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ - [R=410,L]

versus

#RewriteCond %{REQUEST_URI} ^/testmee\.txt$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ - [R=410,L]

UPDATE II

Response for MrWhite:

ughh, same symptom. Might have to live with googlebot hitting 404s instead of a desired 410 for outdated css/js. No biggie in the long run, probably.

Thank you for that request_uri test redirect. Everything is working normally in those tests. Page names, etc. are returned as expected, in the var= rewrite URL.

At this point, I think it must be some internal handling of 404s related to the file type extensions. See clue below. I have Prestashop shopping cart software, and it must be forcing 404s on file types.

This will redirect to google (to affirm pattern match):

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^testmee\.txt$ http://www.google.com/ [L]
(L flag is needed or else other Rules further down will interfere.)

This will continue to return 404 instead of 410:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^testmee\.txt$ - [NC,R=410]

And as a control test, this will return a 410:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.*$ - [NC,R=410]

If file type is css in the above failed test, then my custom 404 controller does not get invoked. I just get a plain 404 Response, w/o the custom 404 that is wrapped with all my site templating.

For example:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^testmee\.css$ - [NC,R=410]

I'm afraid I've wasted some of your time. My apologies. I never imagined that Prestashop's code would be forcing 404 based on file type, but I can't see any other explanation. I could dig into it and maybe find the spot in the Controllers that is doing it. Gotta take a break, though.

zzzaaabbb
  • Do you have any other mod_rewrite directives in this `.htaccess` file? – MrWhite May 30 '19 at 23:14
  • Yes, many. It seemed that if I put this at the top, the Last flag would end processing for that trip through, return the 410, and not do an additional pass through htaccess because no substitution was made. If REQUEST_FILENAME matched fine and immediately worked in that manner, then REQUEST_URI (by itself with the RewriteRule) should also match, 410, stop, should it not? Even in isolation like that, REQUEST_URI won't match. One thing: if I hit a completely empty ordinary file (such as testme.txt in root), server routes it to 404 result. That might be related to this? – zzzaaabbb May 31 '19 at 00:06
  • @MrWhite ok, I think the server may be misconfigured. Not jumping to that conclusion recklessly or without testing. I've done some tests with text file in the root of the site, and I can get it to match with REQUEST_URI. Strange things are happening. With a match, I can redirect to, say, google.com. Then I change the result to be "throw a 410" and it throws a 404. Bluehost recently upgraded to apache 2.4.39. We might have some misconfiguration in there. – zzzaaabbb May 31 '19 at 00:23
  • `"throw a 301" it throws a 304` - When you do this, are you setting a _substitution_ string (which would be mandatory in this case) - you said it redirects to `google.com`, in that case it must be returning a 301/2/3? I'll throw down an answer... not really a solid answer, but things to try (too much for a comment). – MrWhite May 31 '19 at 00:28
  • yes, I understand what you mean about 301 needing a substitution string and not just a flag. I wanted to delete that comment, also, b/c it could have just been a correct not-modified result. I CAN get it to throw a 410 if I only use the second Cond w/ a non-existent filename, but cannot get a 410 with first Cond, when I know that it's matching (because got it to redirect to google with only a matched first Cond). The server is literally not wanting to throw a 410 with only a matched first Cond. Server seems to insist on 404 in that case. – zzzaaabbb May 31 '19 at 00:35

2 Answers

This isn't really a solid answer, more a list of things to try to help debug this and to quash some myths...

I have verified using phpinfo() that REQUEST_URI contains the leading slash

Yes, the REQUEST_URI Apache server variable does indeed contain the leading slash. It contains the full URL-path.

However, the REQUEST_URI Apache server variable is not necessarily the same as the $_SERVER['REQUEST_URI'] PHP superglobal - in fact, they aren't really the same thing at all. There are some significant differences between these variables (in some ways it's perhaps a bit unfortunate they share the same name). Notably, the PHP superglobal contains the initial URL from the request and includes the query string (if any) and is not %-decoded. Whereas the Apache server variable of the same name contains the rewritten URL (not necessarily the requested URL) and does not contain the query string and is %-decoded.

So, that's why I was asking whether you have other mod_rewrite directives. You could very well have had a conflict. If another directive rewrites the URL, then the condition will never match (despite the PHP superglobal suggesting that it should).
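
One way to sidestep that difference, offered as an untested sketch, is to match against THE_REQUEST instead. This variable holds the original request line (for example `GET /testmee.txt HTTP/1.1`) and is not changed by earlier rewrites. The path is the test file from the question. Note that THE_REQUEST is not %-decoded, and that REQUEST_FILENAME may point at the rewritten target if an earlier rule has already fired.

```apache
# THE_REQUEST is the raw request line, e.g. "GET /testmee.txt HTTP/1.1".
# It still reflects the URL the client asked for, even after a rewrite.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/testmee\.txt[?\s]
# Only act when the mapped file does not exist on disk.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ - [R=410]
```

If this matches where the REQUEST_URI condition did not, an earlier directive is probably rewriting the URL before the rule runs.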

It seemed that if I put this at the top, the Last flag would end processing for that trip through, return the 410

This directive should certainly go at the top of the .htaccess file, to avoid the URL being rewritten earlier. The L flag is actually superfluous when used with R=410 (or any status code outside the 3xx range); it is implied in this case.
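
For illustration, the same intent can be written with the G (gone) flag, which is mod_rewrite's shorthand for a 410 response. This is a minimal sketch using the test file name from the question, not a claim that it behaves differently from R=410 on this server.

```apache
# [G] returns "410 Gone"; no L flag is needed because
# processing stops as soon as the status is set.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^testmee\.txt$ - [G]
```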

Then I change the result to be "throw a 410" and it throws a 404.

That can certainly be caused by a server-side override. But you are able to throw a 410 in other situations, so that would seem to rule that out. However, you can reset the error document in .htaccess if in doubt (unless you are already using a custom error document):

ErrorDocument 410 default
RewriteCond %{REQUEST_URI} ^/dir1/dir2/dir3/v_9991_0726dd5b5e8dd67a214c0c243436d131_all\.css$
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ - [R=410,L]

Whilst this doesn't really make a difference to how the rule behaves, you don't need the first RewriteCond directive that checks against the REQUEST_URI. You should be doing this check in the RewriteRule pattern instead (which will be more efficient, since this is processed first). For example:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^dir1/dir2/dir3/v_9991_0726dd5b5e8dd67a214c0c243436d131_all\.css$ - [NC,R=410]

The NC flag should be superfluous.

Still, a conflict with existing directives is the most probable cause. Remove all other directives. Do you still see the same behaviour?


You can test the value of the REQUEST_URI server variable. You could either issue a redirect and pass the REQUEST_URI as a URL parameter, or set environment variables (but you will need to look out for REDIRECT_<var> for each rewrite).

For example, at the top of your .htaccess (or wherever you are trying this):

RewriteCond %{QUERY_STRING} ^$
RewriteRule ^ /test.php?var=%{REQUEST_URI} [NE,R,L]

Create a dummy test.php file to avoid an internal subrequest to an error document.
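
The environment-variable alternative could look something like the sketch below. It assumes mod_headers is available and allowed in .htaccess, and the names DEBUG_URI and X-Debug-Uri are made up for this example. The value then shows up as a response header that can be read in the browser's network tools or with curl.

```apache
# Copy the Apache REQUEST_URI into an environment variable (hypothetical name).
RewriteRule ^ - [E=DEBUG_URI:%{REQUEST_URI}]

# Expose it as a response header; "always" also covers error responses.
Header always set X-Debug-Uri "%{DEBUG_URI}e" env=DEBUG_URI
```

After an internal rewrite, the value from the earlier pass is carried in REDIRECT_DEBUG_URI rather than DEBUG_URI, so both may need checking.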

MrWhite
  • Thank you for your time, effort and thoughtful answer. Yes, I do have a custom ErrorDocument directive and so don't want to tamper with that, really, especially just for this one small issue. Yes, let me test in the Rule for the incoming pattern match. That certainly may solve it. Let me study everything else you said, and also, have a look at the side-by-side test I did with certainty that request_uri was being matched. Strange that server balked at one method of getting 410 and not the other. Be right back. – zzzaaabbb May 31 '19 at 01:11
  • I updated my answer with a bit of code to test the value of an Apache variable. – MrWhite May 31 '19 at 01:11
  • MrWhite, results above. I'm afraid that I am stuck with the behavior, but on the upside, it's not that big of a deal. Thank you again for your time and knowledge. – zzzaaabbb May 31 '19 at 01:50
I was unable to determine why the server configuration or site code was overriding the '410 Gone' response set in .htaccess with a 404, so I had to do something like this to tell Googlebot to stop hunting for CSS/JS files that get purged periodically (and renamed when regenerated).

in .htaccess:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule v_(.*)_(.*)$ /410response.php [L]

in 410response.php placed in root:

<?php header($_SERVER['SERVER_PROTOCOL'].' 410 Gone');

UPDATE I

The 404 response when attempting to return a 410 from .htaccess was being forced by the server, which apparently has a custom 410 document that routes to a 404. Adding a directive to reset that allowed .htaccess to return a 410 for pattern matches in the RewriteRule. (I thought I had already checked this yesterday, since @MrWhite said in his answer above to control for the server possibly having a custom 410 document. Today the check did work, and it showed that the server's 410-to-404 redirection was overriding my 410 directive.)

ErrorDocument 410 default
RewriteRule test\.txt$ - [NC,R=410]

MrWhite! I located this solution in one of your posts on Stack Exchange.

zzzaaabbb
  • Thanks for the feedback. Just a minor point, the `RewriteCond ... !-d` directive in your rule block is probably superfluous, unless you also have directories that match the regex `v_(.*)_(.*)$`? Although you should make sure this regex is as specific as possible (eg. only match `.css` or `.js` requests - if that is the intention), in order to avoid unnecessary filesystem checks (which are relatively _expensive_). – MrWhite Jun 06 '19 at 23:06
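
A tightened version along the lines of that comment might look like the sketch below. It assumes the purged files are always named like the example in the question (v_<number>_<hash>_all.css or the .js equivalent) and that the server default 410 document is acceptable. The exact pattern is a guess and should be adjusted to the real file names.

```apache
# Reset any server-level custom 410 document that ends up as a 404.
ErrorDocument 410 default

# Only check the filesystem for versioned .css/.js file names.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (^|/)v_[^/]+_[^/]+\.(css|js)$ - [R=410]
```

Restricting the pattern to the two extensions means the `!-f` check runs only for requests that could be stale assets, and the `!-d` condition is dropped because no directory should match.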