A client wants to strengthen his main brand, so the content of the separate domains formerly used for the individual sub-brands has been consolidated into subdirectories of his main domain (brand):
www.example.com
|_ .htaccess
|_ brand1
|   |_ files
|   |_ includes/header.inc
|   |_ scripts/functions.php
|_ brand2
|_ brand3
All formerly separate domains now point to www.example.com. In the .htaccess in the document root, their requests are redirected (301) to the respective subdirectories. Some domains are handled individually because domain and brand names do not map one to one, but the pattern is the same:
RewriteCond %{HTTP_HOST} ^(www\.)?(brand1|brand2|brand3)\.com$ [NC]
RewriteRule ^ https://www.example.com/%2%{REQUEST_URI} [R=301,L]
These directives are followed by a RewriteBase / directive and by some rules that complete incomplete file (not directory) names and pass images to a watermarking script.
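The redirect rule above can be modelled as plain string handling to see where a double slash could come from. Because %{REQUEST_URI} always starts with a slash, /%2%{REQUEST_URI} puts exactly one slash after the brand segment, so the rule itself does not create a //; it can only pass on one that was already in the requested path. This is a hypothetical PHP model (brandRedirectTarget is an illustrative name, not part of the site), not how Apache evaluates rules. It assumes REQUEST_URI reaches the rule with its slashes unchanged, which may depend on the Apache version and configuration (for example the MergeSlashes directive in 2.4.39 and later). The pattern is written with the dot after www escaped.

```php
<?php
// Hypothetical model of the brand redirect: string handling only,
// assuming REQUEST_URI arrives with its slashes unchanged.
function brandRedirectTarget(string $host, string $requestUri): ?string
{
    // Same pattern as the RewriteCond, dot after "www" escaped;
    // the "i" modifier plays the role of [NC].
    if (!preg_match('/^(www\.)?(brand1|brand2|brand3)\.com$/i', $host, $m)) {
        return null; // condition fails, the rule does not apply
    }
    // %2 is the second capture group; REQUEST_URI already starts with "/".
    return 'https://www.example.com/' . $m[2] . $requestUri;
}

echo brandRedirectTarget('www.brand1.com', '/path/to/file'), "\n";
// https://www.example.com/brand1/path/to/file
echo brandRedirectTarget('brand1.com', '//path/to/file'), "\n";
// https://www.example.com/brand1//path/to/file
```

With the dot unescaped, (www.)? would also accept a host such as wwwxbrand1.com; escaping it restricts the optional prefix to a literal "www.".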
Generally this works fine. However, sporadic errors, too frequent to ignore, appear in the Apache (2.4) and PHP (7.2) error logs where a PHP include cannot find its file or the watermark script cannot load the image file.
Comparing these with the access log, it turns out that in these cases the requests came with a double slash:
"GET /brand1//path/to/file HTTP/1.1"
            ^^
where path/to/file is the root-relative URI on the respective former domain.
Most, if not all, of these requests come from search bots (Google, Yandex, Bing/MSN). Yet the same bots also issue correct requests, and I cannot reproduce the double slash in the browser for the very same files, with either the old (redirected) or the new (current) URLs.
Obviously this can be remedied with a rule like RewriteRule ^\/?(.*)$ https://www.example.com/$1 [R=301,L] (cf. https://stackoverflow.com/a/4278042 et al.), but I would rather fix the error at its root if possible.
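If the double-slash redirect mentioned above should not live in .htaccess, roughly the same normalisation can be done in PHP before any output is sent. This is a different technique from the mod_rewrite rule, shown as a minimal sketch: collapseSlashes and canonicalRedirect are hypothetical helper names, and the sketch assumes $_SERVER['REQUEST_URI'] holds the raw request path plus query string, as it does under Apache. The query string is split off first so that a // inside a parameter (such as a URL passed as a value) is not touched.

```php
<?php
// Hypothetical helpers: collapse runs of slashes in the path part only.
function collapseSlashes(string $path): string
{
    return preg_replace('#/{2,}#', '/', $path);
}

// Returns the canonical URI if it differs from the request, else null.
function canonicalRedirect(string $requestUri): ?string
{
    // Split off the query string so "//" inside it is left alone.
    $parts = explode('?', $requestUri, 2);
    $path = collapseSlashes($parts[0]);
    if ($path === $parts[0]) {
        return null; // already canonical, nothing to do
    }
    return isset($parts[1]) ? $path . '?' . $parts[1] : $path;
}

// Usage in a web request, before any output (skipped on the CLI):
if (PHP_SAPI !== 'cli') {
    $target = canonicalRedirect($_SERVER['REQUEST_URI'] ?? '/');
    if ($target !== null) {
        header('Location: https://www.example.com' . $target, true, 301);
        exit;
    }
}
```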
Thanks for any insight/suggestion/help.
EDIT:
For requests like the GET /brand1//path/to/file HTTP/1.1 above, the Apache access log shows status 200 or 500 (I could not see a pattern), and 404 if the file is indeed missing. Deliberately entering a URL with double slashes in the browser displays the page up to the point of some include.
For <?php include 'includes/header.inc'; ?> the PHP error log says:
PHP Warning: include(/home/http/htdocs/example/brand1/path/to/file/brand1/path/to/functions.php): failed to open stream: No such file or directory in /mnt/webnnn/htdocs/example/brand1/path/to/header.inc on line XX
The script /home/http/htdocs/example/brand1/path/to/file includes header.inc (this works, as it is a relative path), which in turn includes str_replace($_SERVER['SCRIPT_NAME'], '', $_SERVER['SCRIPT_FILENAME']) . '/brand1/path/to/functions.php'.
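(The hosting server returns a wrong path in $_SERVER['DOCUMENT_ROOT'], hence the str_replace(...). This works fine unless the double slash occurs and breaks the match: str_replace() then leaves the full script path untouched and the include path is appended to it, which yields exactly the path in the warning above. This much I have tracked down.)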
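The failure can be reproduced with plain strings. Judging by the warning, SCRIPT_FILENAME (a filesystem path) has single slashes while SCRIPT_NAME keeps the // from the request; that is an inference from the log, not something verified here. The function names below are hypothetical and the paths are taken from the log entry.

```php
<?php
// Naive variant, as in header.inc: strip the URL path from the filesystem path.
function docRootNaive(string $scriptName, string $scriptFilename): string
{
    return str_replace($scriptName, '', $scriptFilename);
}

// Variant that first collapses repeated slashes in SCRIPT_NAME.
function docRootNormalized(string $scriptName, string $scriptFilename): string
{
    $scriptName = preg_replace('#/{2,}#', '/', $scriptName);
    return str_replace($scriptName, '', $scriptFilename);
}

$filename = '/home/http/htdocs/example/brand1/path/to/file';

echo docRootNaive('/brand1/path/to/file', $filename), "\n";
// /home/http/htdocs/example
echo docRootNaive('/brand1//path/to/file', $filename) . '/brand1/path/to/functions.php', "\n";
// /home/http/htdocs/example/brand1/path/to/file/brand1/path/to/functions.php
echo docRootNormalized('/brand1//path/to/file', $filename), "\n";
// /home/http/htdocs/example
```

Collapsing the slashes restores the match. Where header.inc always sits at the same place relative to functions.php, building the include path from __DIR__ (the directory of the file in which it appears) would avoid depending on $_SERVER altogether.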
Images passed by .htaccess to the watermarking script brand1/watermark.class.php raise
PHP Warning: exif_imagetype(/path/to/image.jpg): failed to open stream: No such file or directory in /mnt/webnnn/htdocs/example/brand1/watermark.class.php on line XX
where the leading slash coincides with the double slash in the Apache log (without it, the call is correctly imagecreatefromjpeg(path/to/image.jpg)).
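In the watermark case the requested path arrives with a leading slash and is then treated as an absolute filesystem path. A minimal sketch of resolving it defensively, assuming image paths are meant to be relative to a fixed base directory (using the directory of watermark.class.php is a guess): strip leading slashes, resolve with realpath(), refuse anything outside the base directory, and require a non-empty regular file, roughly what RewriteCond's -s test checks. resolveImagePath and $requestedPath are hypothetical names.

```php
<?php
// Resolve a requested image path against a fixed base directory.
// Leading slashes are stripped so "/path/to/image.jpg" is treated like
// "path/to/image.jpg" instead of an absolute filesystem path.
function resolveImagePath(string $baseDir, string $requested): ?string
{
    $relative = ltrim($requested, '/');
    $full = realpath(rtrim($baseDir, '/') . '/' . $relative);
    $base = realpath($baseDir);
    if ($full === false || $base === false) {
        return null; // missing file or base directory
    }
    // Refuse anything that resolves outside the base directory (e.g. "../").
    if (strpos($full, $base . DIRECTORY_SEPARATOR) !== 0) {
        return null;
    }
    // Like RewriteCond -s: must be a regular file with size > 0.
    return (is_file($full) && filesize($full) > 0) ? $full : null;
}

// Hypothetical use inside watermark.class.php:
// $path = resolveImagePath(__DIR__, $requestedPath);
// if ($path === null) { http_response_code(404); exit; }
// $type = exif_imagetype($path);
```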
Meanwhile I have revised the error handling in watermark.class.php so that it no longer relies on the -s flag and other conditions in .htaccess. Redirecting double-slashed URLs works, too, but I still have no idea where these requests originate in the first place.