
I have a Rails site behind an AWS ALB, and all routes appear to work except one: robots.txt.

I am getting the error "ERR_TOO_MANY_REDIRECTS". Example: https://www.mamapedia.com/robots.txt

After some research I found many sources saying that the load balancer sends traffic to the EC2 instances over HTTP, and that redirect loops can occur when HTTPS traffic hits the load balancer (see the AWS docs). I have configured Apache as described in those docs and don't believe this is the issue. Furthermore, all other routes on the site work over both HTTP and HTTPS. Only robots.txt does not.

If I take an instance out of the load balancer and access it by IP, the robots.txt page is served as expected.

Strangely, if a trailing slash is added to the URL (https://www.mamapedia.com/robots.txt/), the page renders. There are no wildcard redirects in Apache that should be adding a trailing slash, and again, outside the load balancer robots.txt is accessible without the trailing slash.

  1. Why would this trailing slash be required when the EC2 instance is behind an application load balancer?
  2. How can I configure it so the page loads without the trailing slash?

Httpd.config:

TraceEnable Off
ServerTokens Prod
ServerRoot "/etc/httpd"
PidFile run/httpd.pid
Timeout 600
KeepAlive On
MaxKeepAliveRequests 200
KeepAliveTimeout 600

User apache
Group apache
ServerAdmin support@mamapedia.com
UseCanonicalName Off
DirectoryIndex index.html index.html.var
AccessFileName .htaccess
<Files ~ "^\.ht">
    Order allow,deny
    Deny from all
</Files>
TypesConfig /etc/mime.types

<IfModule mod_mime_magic.c>
    MIMEMagicFile conf/magic
</IfModule>
HostnameLookups Off
LogLevel crit
LogFormat "%a %{X-Forwarded-For}i %t %D %V \"%r\" %>s %b \"%{User-agent}i\"" detailed
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
ServerSignature Off
ServerTokens Prod
AddDefaultCharset UTF-8
AddType application/x-compress .Z
AddType application/x-gzip .gz .tgz
AddHandler php5-script .php
AddType text/html .php

Listen 80
#Listen 443

Include conf.modules.d/*.conf
Include conf.d/*.conf

Edit: Some more information. In AWS the load balancer has two listeners, one for HTTP (port 80) and one for HTTPS (port 443). Each forwards to a different target group: the HTTP target group is configured for HTTP on port 80, while the HTTPS target group is configured for HTTPS on port 443.

In Apache I have a Listen directive on port 80, seen in the file above. One of the conf.d/*.conf files, for the SSL config, also listens on port 443.

I said earlier that I didn't think this was an HTTP -> HTTPS redirect issue, but now I'm thinking that part is misconfigured.

Edit 2: While trying to figure out this issue, I pointed new routes at the Rails robots.txt file. For example, the route /robots.img rendered as expected. A few other file suffixes were tried and all worked. The .txt suffix itself is not the problem either: human.txt was tested as the route and it rendered the page as expected. This shows that the issue is specific to robots.txt.

When I search my entire Apache directory, there are no hits for robots.txt or robots, and only one hit for txt, in conf.d/autoindex.conf:

AddIcon /icons/text.gif .txt

The hit for txt just sets an icon for .txt files, and since other .txt files work (e.g. human.txt), I don't think this is the issue.

How can only robots.txt be in an infinite redirect loop?

  • the answer might be in the Apache HTTPd. Care to share its configuration? – danblack Sep 13 '18 at 22:14
  • @danblack I updated the question with the httpd config, although I'm not sure the error is there; it may be in one of the included configuration files. Let me know if you see anything, or if you think the issue is with the AWS configuration. – 6557457iD9e Sep 14 '18 at 14:28

1 Answer

A fairly typical cause of that infinite redirect loop is when you do SSL off-loading or SSL termination at either a load balancer or a CDN, which causes all traffic to the actual webserver to always be plain HTTP.

When you configure a redirect to HTTPS at the webserver, you get a situation like this:

1. Client ---> HTTP ----> load balancer ----> HTTP ----> Your server
                                                                 | 
                         <-------  Response: Redirect to HTTPS <- 

2. Client ---> HTTPS ----> load balancer ----> HTTP ----> Your server
                           does SSL off-loading                  |
                           or SSL termination                    |
                                                                 | 
                         <-------  Response: Redirect to HTTPS <-

3. Client ---> HTTPS ----> load balancer ----> HTTP ----> Your server
                                                                 | 
                         <-------  Response: Redirect to HTTPS <-

4. Client ---> HTTPS ----> load balancer ----> HTTP ----> Your server
                                                                 | 
                         <-------  Response: Redirect to HTTPS <-

5. Client ---> HTTPS ----> load balancer ----> HTTP ----> Your server
                                                                 | 
                         <-------  Response: Redirect to HTTPS <-
... ad infinitum 

The solution is:

  • Don't redirect to HTTPS from your web server! Do that at the load balancer or at the CDN.
  • If you can't do the redirect to HTTPS on the load balancer/CDN, then send traffic that arrives over HTTP to a separate backend server, and let that server do nothing but redirect to HTTPS. That avoids the loop, and you get something like:

    1. Client ---> HTTP  ----> load balancer ----> HTTP ----> Your redirect server
                                                                     | 
                             <-------  Response: Redirect to HTTPS <- 
    
    2. Client ---> HTTPS ----> load balancer ----> HTTP ----> Your application server
                                                                     | 
                             <-------  Response: Application data  <- 
    
  • Possibly the load balancer/CDN sets a header carrying the original protocol the client used, HTTP or HTTPS (AWS load balancers send X-Forwarded-Proto). In that case, use the value or the presence/absence of that header as the condition for generating a redirect to HTTPS.
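
The header-based option can be sketched in Apache roughly as follows. This is a minimal sketch, assuming mod_rewrite is loaded and that the load balancer sets X-Forwarded-Proto to the protocol the client used; where the rules belong (main config or a particular virtual host) depends on your setup.

```apache
# Assumption: the load balancer adds X-Forwarded-Proto with the value
# "http" or "https", reflecting the client's connection to the balancer.
RewriteEngine On

# Redirect only when the client really spoke plain HTTP to the balancer.
# Requests that arrived over HTTPS and were forwarded as HTTP do not match,
# so they are served instead of being redirected again.
RewriteCond %{HTTP:X-Forwarded-Proto} =http
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```

Requests that carry no such header (for example when you hit an instance directly by IP) fail the condition and are never redirected by this rule.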


Also note: an HTTP 301 redirect means "Moved Permanently", so even an incorrectly configured redirect will be cached by web browsers (and possibly also by CDNs and proxy servers). After you have removed the directive from the server config you may still observe the redirect. You may need to test from a new anonymous browser window and/or clear your caches.
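
One way to limit that caching problem while you are still experimenting is to issue a temporary (302) redirect and switch to a permanent one only once the behaviour is confirmed. A minimal sketch of a redirect-only virtual host, assuming mod_alias is available and using a placeholder host name:

```apache
# Hypothetical redirect-only virtual host; www.example.com is a placeholder.
<VirtualHost *:80>
    ServerName www.example.com

    # "temp" sends a 302, which browsers do not cache the way they cache a 301.
    # Change to "permanent" once the redirect is known to be correct.
    Redirect temp / https://www.example.com/
</VirtualHost>
```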

HBruijn
  • Thanks, I have tried using the load balancer header and the same redirect issue happened. However, this is now isolated to just the /robots.txt path: I configured other paths to route exactly the same as robots.txt and they do not have the infinite redirect. Any idea why only robots.txt would behave this way? – 6557457iD9e Sep 19 '18 at 20:26