I have a Rails site behind an AWS Application Load Balancer (ALB), and all routes appear to work except one: robots.txt.
Requesting it returns "ERR_TOO_MANY_REDIRECTS"; link to example: https://www.mamapedia.com/robots.txt
After some research I found many sources saying that redirect loops can happen when the load balancer sends traffic to the EC2 instances over HTTP while HTTPS traffic is hitting the load balancer (this scenario is covered in the AWS docs). I have configured Apache as described there and don't believe this is the issue; furthermore, every other route on the site works over both HTTP and HTTPS. Only robots.txt does not.
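For reference, the pattern from the AWS docs keys the HTTP -> HTTPS redirect off the X-Forwarded-Proto header the ALB adds, rather than off the local port. A minimal sketch of that pattern (not my actual vhost, just the generic shape, with this site's hostname filled in):

```apache
# Generic sketch of an X-Forwarded-Proto-aware redirect behind an ALB.
<VirtualHost *:80>
    ServerName www.mamapedia.com
    RewriteEngine On
    # When the ALB terminates TLS and forwards over plain HTTP, the only
    # way the backend can tell the original scheme is this header; an
    # unconditional redirect here would loop for every HTTPS client.
    RewriteCond %{HTTP:X-Forwarded-Proto} !https
    RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
</VirtualHost>
```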
If I take an instance out of the load balancer and access it by IP, the robots.txt page is served as expected.
Strangely, if a trailing slash is added to the URL (https://www.mamapedia.com/robots.txt/), the page renders. There are no wildcard redirects in Apache that should be adding a trailing slash, and again, outside the load balancer robots.txt is accessible without the trailing slash.
- Why would this trailing slash be required when the EC2 instance is behind an application load balancer?
- How can I configure it so the page loads without the trailing slash?
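To see where the loop comes from, each redirect hop can be inspected. A small sketch of the approach (the filter function is my own generic header grep; the curl usage against the live site is shown in a comment since it needs network access):

```shell
# trace_hops: print only the status line and Location header from
# curl -sIL output, so each redirect hop is visible at a glance.
trace_hops() {
  awk 'toupper($1) ~ /^HTTP\// || tolower($1) == "location:" { print }'
}

# Usage against the live site (follows up to 5 redirects):
#   curl -sIL --max-redirs 5 https://www.mamapedia.com/robots.txt | trace_hops
#
# What a single hop of a loop looks like, using a canned response:
printf 'HTTP/1.1 301 Moved Permanently\r\nLocation: https://www.mamapedia.com/robots.txt\r\n\r\n' | trace_hops
```

Comparing that output with a request sent straight to an instance's IP (with a `Host:` header set) should show whether the 301s are coming from Apache or from the ALB listener.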
httpd.conf:
TraceEnable Off
ServerTokens Prod
ServerRoot "/etc/httpd"
PidFile run/httpd.pid
Timeout 600
KeepAlive On
MaxKeepAliveRequests 200
KeepAliveTimeout 600
User apache
Group apache
ServerAdmin support@mamapedia.com
UseCanonicalName Off
DirectoryIndex index.html index.html.var
AccessFileName .htaccess
<Files ~ "^\.ht">
Order allow,deny
Deny from all
</Files>
TypesConfig /etc/mime.types
<IfModule mod_mime_magic.c>
MIMEMagicFile conf/magic
</IfModule>
HostnameLookups Off
LogLevel crit
LogFormat "%a %{X-Forwarded-For}i %t %D %V \"%r\" %>s %b \"%{User-agent}i\"" detailed
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
ServerSignature Off
ServerTokens Prod
AddDefaultCharset UTF-8
AddType application/x-compress .Z
AddType application/x-gzip .gz .tgz
AddHandler php5-script .php
AddType text/html .php
Listen 80
#Listen 443
Include conf.modules.d/*.conf
Include conf.d/*.conf
Edit: Some more information. In AWS, the load balancer has two listeners: one for HTTP (port 80) and one for HTTPS (port 443). Each forwards to a different target group: the HTTP target group is configured for HTTP on port 80, while the HTTPS target group is configured for HTTPS on port 443.
In Apache I then have a Listen directive on port 80, seen in the file above. One of the conf.d/*.conf files holds the SSL config and listens on port 443.
I said earlier that I didn't think this was an issue with the HTTP -> HTTPS redirect, but now I'm thinking that part is misconfigured.
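If that is the problem, the decision the backend has to make behind an ALB reduces to checking X-Forwarded-Proto instead of the local port or scheme. A minimal Ruby sketch of that logic (a hypothetical helper I wrote for illustration, not code from this app):

```ruby
# Hypothetical helper: should this request be redirected to HTTPS?
# Behind an ALB the local connection's scheme is fixed per target group,
# so the client's original scheme must be read from the X-Forwarded-Proto
# header the ALB sets on every forwarded request.
def needs_https_redirect?(headers)
  headers.fetch('X-Forwarded-Proto', 'http') != 'https'
end

needs_https_redirect?('X-Forwarded-Proto' => 'https') # => false: serve it
needs_https_redirect?('X-Forwarded-Proto' => 'http')  # => true: redirect once
needs_https_redirect?({})                             # => true
```

A redirect that ignores this header and looks only at the backend connection will fire on every request the ALB forwards over HTTP, producing exactly an ERR_TOO_MANY_REDIRECTS loop.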
Edit 2: While trying to figure out this issue, new routes were pointed at the Rails robots.txt action. For example, the route /robots.img was used and rendered as expected; a few other file suffixes were tried and all worked. It wasn't just the .txt extension either: human.txt was tested as a route and rendered the page as expected. This shows the issue is specific to robots.txt.
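For context on why the trailing-slash variant and the other suffixes all reach the same Rails action: Rails splits a trailing extension off the path as the request format and normalizes a trailing slash when matching, so /robots.txt, /robots.txt/, and /robots.img can all resolve to the same route. A simplified illustration of that split (my own regex, not Rails internals):

```ruby
# Simplified illustration (my own regex, not Rails internals) of how a
# path is split into a name and a format, with any trailing slash
# ignored -- consistent with /robots.txt/ rendering the same page.
def split_path(path)
  m = path.match(%r{\A/(?<name>[^/.]+)(?:\.(?<fmt>[^/]+?))?/?\z})
  m && [m[:name], m[:fmt]]
end

split_path('/robots.txt')  # => ["robots", "txt"]
split_path('/robots.txt/') # => ["robots", "txt"]
split_path('/robots.img')  # => ["robots", "img"]
```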
When I search my entire Apache directory there are no hits for "robots.txt" or "robots", and only one hit for "txt", in conf.d/autoindex.conf:
AddIcon /icons/text.gif .txt
That hit just assigns an icon to .txt files, and since other .txt files (e.g. human.txt) work, I don't think this is the issue.
How can only robots.txt be in an infinite redirect loop?