I am running a Django app on AWS Elastic Beanstalk and I'm getting spammed by bots scanning for vulnerabilities. This results in a flood of errors such as:

(Where xx.xxx.xx.xx is my ec2 instance's ip address.)

DisallowedHost at //www/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php
Invalid HTTP_HOST header: 'xx.xxx.xx.xx'. You may need to add 'xx.xxx.xx.xx' to ALLOWED_HOSTS.

My legitimate users only access the site using the domain name. I've been trying to figure out how to modify my nginx configuration to block all connections that aren't addressed to *.mydomain.com or mydomain.com.

I dynamically add and remove subdomains as needed, so I use a wildcard for the subdomain.

AWS Elastic Beanstalk generates the following default config file for me:

/etc/nginx/nginx.conf

#Elastic Beanstalk Nginx Configuration File

user                    nginx;
error_log               /var/log/nginx/error.log warn;
pid                     /var/run/nginx.pid;
worker_processes        auto;
worker_rlimit_nofile    32788;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    include       conf.d/*.conf;

    map $http_upgrade $connection_upgrade {
        default     "upgrade";
    }

    server {
        listen        80 default_server;
        access_log    /var/log/nginx/access.log main;

        client_header_timeout 60;
        client_body_timeout   60;
        keepalive_timeout     60;
        gzip                  off;
        gzip_comp_level       4;
        gzip_types text/plain text/css application/json application/javascript $

        # Include the Elastic Beanstalk generated locations
        include conf.d/elasticbeanstalk/*.conf;
    }
}

Then I extend it with this file:

.platform/nginx/conf.d/elasticbeanstalk/00_application.conf

location / {
    set $redirect 0;
    if ($http_x_forwarded_proto != "https") {
        set $redirect 1;
    }
    if ($redirect = 1) {
        return 301 https://$host$request_uri;
    }   

    proxy_pass        http://127.0.0.1:8000;
    proxy_http_version  1.1;

    proxy_set_header    Connection         $connection_upgrade;
    proxy_set_header    Upgrade            $http_upgrade;
    proxy_set_header    Host               $host;
    proxy_set_header    X-Real-IP          $remote_addr;
    proxy_set_header    X-Forwarded-For    $proxy_add_x_forwarded_for;
    
    gzip on;
    gzip_comp_level 4;
    gzip_types text/html text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;
    client_max_body_size 2000M;
}

location = /health-check.html {
    set $redirect 0;
    if ($http_x_forwarded_proto != "https") {
        set $redirect 1;
    }
    if ($http_user_agent ~* "ELB-HealthChecker") {
        set $redirect 0;
        return 204;
    }
    if ($redirect = 1) {
        return 301 https://$host$request_uri;
    }   

    proxy_pass        http://127.0.0.1:8000;
    proxy_http_version  1.1;

    proxy_set_header    Connection         $connection_upgrade;
    proxy_set_header    Upgrade            $http_upgrade;
    proxy_set_header    Host               $host;
    proxy_set_header    X-Real-IP          $remote_addr;
    proxy_set_header    X-Forwarded-For    $proxy_add_x_forwarded_for;
}

The purpose of that override file is to make nginx redirect HTTP to HTTPS and respond to ELB health checks.

I'm not overly familiar with nginx or Elastic Beanstalk, but from what I could gather while researching this problem, I need to have my default server return 444 and then have a separate server block with server_name set to my domain.

Is this the correct way to handle this problem and will it work with wildcard subdomains?

Thank you

Del

3 Answers

It seems like your only virtual host is the one with the default_server attribute, which means that if no matching virtual host is found, that block is used to serve the request.

To properly handle your case, you need to have:

  1. server block with default_server in the listen directive. This block should only have return 404; or return 444;. You might want to turn off access_log in this block too.
  2. server block with server_name example.com *.example.com;. This virtual host should contain your actual application.

Note, this is how things are configured when one has complete control of nginx configuration. I don't know if Elastic Beanstalk has some features to automatically generate configuration files in this fashion.
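As a rough sketch of the two blocks (assuming the domain example.com and the backend port from the question; on Elastic Beanstalk this would live in a .platform nginx override, not a hand-edited nginx.conf):

```nginx
# Catch-all: any request whose Host header matches no named server lands here.
server {
    listen 80 default_server;
    server_name _;
    access_log off;   # don't let scanners spam the access log
    return 444;       # close the connection without sending a response
}

# The real application: bare domain plus any subdomain.
server {
    listen 80;
    server_name example.com *.example.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host            $host;
        proxy_set_header X-Real-IP       $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

The wildcard `*.example.com` in server_name matches any single-level subdomain, so dynamically added subdomains are covered. Health-check handling and the HTTPS redirect from the question's override would go inside the second block.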

Tero Kilkanen

The solution that allocates a server with the default_server directive for handling spam is quite incorrect.

Primarily it's incorrect because your website will not work for clients that don't support SNI. Those clients cannot indicate the requested domain name during the TLS handshake, so NGINX will use whichever server you have configured as the default_server. Then, of course, they will fail to get anything, as they will land on your "spam handler" server.

While the prevalence of such clients may be negligible nowadays, you don't want your spam handling to affect your genuine website visitors in any way.

I have covered the proper handling of invalid domains requested in the article "NGINX honeypot – the easiest and fastest way to block bots".

If you don't intend to implement the honeypot approach (triggering a firewall block upon an invalid request) and just want to deny those requests, then it comes down to putting a map with all the valid domain names for your server into the configuration, then adding an if condition to your existing server which has default_server:

In /etc/nginx/nginx.conf, setup a map listing all your website domain names:

map $http_host $default_host_match {
    hostnames;           # required for wildcard masks like *.example.com
    example.com 1;
    *.example.com 1;
    default 0;
}

In the default server's configuration, block requests for invalid domains:

server {
    listen 80 default_server;
    server_name example.com *.example.com;

    if ($default_host_match = 0) {
        return 410;
    }
    # ... rest of the server configuration

The 410 is my personal preference; you may want to use whatever you feel fits your case. The 410 Gone is a:

client error response code indicates that access to the target resource is no longer available at the origin server and that this condition is likely to be permanent.

Such an approach works even for clients without SNI, because NGINX selects the server block only after the TLS connection has been established, and the validity of the requested domain is checked via the Host: HTTP header.

Handling bot traffic is essential and highly recommended for both security and performance. Why performance? Oftentimes these requests cause your backend to be invoked unnecessarily, which consumes CPU time and steals it away from valid clients.

Should the "honeypot" approach be implemented (a firewall block triggered by the first invalid request), no further invalid requests are possible from a given IP address. This reduces both the attack surface and the load it would otherwise cause.
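As a hypothetical sketch (not taken from the linked article, and the log path is an assumption), invalid-host requests can be routed to a dedicated log file that firewall tooling such as fail2ban could watch, using the `if=` parameter of access_log together with the map above:

```nginx
# 1 when the requested Host is not one of ours, 0 otherwise.
map $default_host_match $invalid_host {
    0       1;
    default 0;
}

server {
    listen 80 default_server;
    server_name example.com *.example.com;

    # Invalid-host requests go to their own log; a firewall watcher
    # banning on the first entry implements the "block on first hit" idea.
    access_log /var/log/nginx/invalid_host.log main if=$invalid_host;
    access_log /var/log/nginx/access.log main;

    if ($default_host_match = 0) {
        return 410;
    }
    # ... rest of the server configuration
```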

Danila Vershinin

Tero's answer is both fantastic and correct, but I'd like to add some things.

Firstly, hosting a service with a web server at all signs you up for these attacks: if you allow connections from IPs you don't control, and your domain is published anywhere, discoverable, or guessable, you're going to receive these visitors.

Disallowing connections bearing an incorrect Host header is a great idea, but not necessarily for this particular security reason. Generally you do it so that when you're setting up a new domain pointing to the same host, you don't inadvertently serve the wrong site on the new domain while you configure the web server.

Secondly, I would disagree that this is a "problem". The main reason to fear these attacks is out-of-date software, with a secondary reason being zero-days. The first shouldn't worry any competent sysadmin, and neither vector can be prevented at all by only allowing recognised Host headers.


As noted by TooTea in the comments, if the intent is to stop these spurious requests from reaching an expensive backing service that has to figure out how to reject them, and instead reject them at the cheap reverse-proxy level, domain validation still isn't going to work. These crawlers will try their luck with their attacks because it's basically free for them to do so. As an example, my static HTML website constantly receives WordPress-focussed attacks, and my Django-based mortgage simulator is also barraged with WordPress-focussed attacks. (Don't use WordPress.)

A better solution is to define in your reverse proxy a set of paths/path patterns that are legal for a given service, and return 444 for (or tarpit) requests that don't match the patterns. This works well once it's set up, but it does require some machinery in your service(s) that outputs the legal path patterns, plus something else that grabs that output on deployment, updates the reverse-proxy configuration, and reloads it.
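A rough sketch of the idea (the path prefixes here are hypothetical; the real list would be generated from the service's URL configuration):

```nginx
# Only known path patterns reach the backend.
location ~ ^/(accounts|simulate|static)(/|$) {
    proxy_pass http://127.0.0.1:8000;
}

location = / {
    proxy_pass http://127.0.0.1:8000;
}

# Everything else: close the connection without replying.
location / {
    return 444;
}
```

Since nginx prefers regex locations over the prefix `location /`, any request outside the allowlist is dropped before it ever touches the backend.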

Personally I wouldn't worry about doing something like that until you've calculated the wasted dollar cost of serving those requests with your backing service, compared to the actual dollar cost in developer time of researching/writing the machinery. ... Or if you're on your own time and you just wanna do it for fun, which I also respect.

Adam Barnes
    My reading of the question is that OP is primarily bothered by the resulting spam of irrelevant error messages in the log. It's true that silencing them doesn't have any effect on the security of the site, but that doesn't mean it's not worth doing. Also, handling this on the (computationally cheap) reverse proxy is going to take some load off the (expensive) Python bits. – TooTea Aug 23 '22 at 17:26
  • If the intent is to stop the requests hitting the Python service then there's a better solution. I'll update my answer momentarily. – Adam Barnes Aug 23 '22 at 20:56