11

I have an Airflow web server running on an EC2 instance; it listens on port 8080.

I have an AWS ALB (Application Load Balancer) in front of the EC2 instance. It listens on HTTPS port 80 (internet-facing), and the instance target port is HTTP 8080.

I cannot browse to https://<airflow link> because the Airflow web server redirects me to http://<airflow link>/admin, which the ALB does not listen on.

If I browse to https://<airflow link>/admin/airflow/login?next=%2Fadmin%2F, I see the login page, because this URL does not trigger a redirect.

My question is how to change Airflow so that, when I browse to https://<airflow link>, the web server redirects me to https://... rather than http://..., so that the AWS ALB can process the request.

I tried changing base_url in airflow.cfg from http://localhost:8080 to https://localhost:8080, according to the answer below, but the change made no difference.

In short, how can I access https://<airflow link> through the ALB?

Olaf Kock
user389955

7 Answers

9

Since the Airflow webserver runs on Gunicorn, you can configure the forwarded_allow_ips value through an environment variable instead of having to use an intermediary proxy like Nginx.

In my case I just set FORWARDED_ALLOW_IPS=* and it works perfectly fine.

In ECS you can set this in the webserver task configuration if you're using one docker image for all the Airflow tasks (webserver, scheduler, worker, etc.).
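
Outside ECS, the same idea can be sketched by injecting the variable into the environment of whatever process starts the webserver. This is only an illustration: the helper names `webserver_env` and `launch_webserver` are made up for this example, and the wildcard assumes the instance is reachable only through the load balancer.

```python
import os
import subprocess


def webserver_env(base_env=None, allow_ips="*"):
    """Return a copy of the environment with FORWARDED_ALLOW_IPS set."""
    env = dict(os.environ if base_env is None else base_env)
    # Gunicorn reads this variable as the default for forwarded_allow_ips.
    env["FORWARDED_ALLOW_IPS"] = allow_ips
    return env


def launch_webserver():
    """Start `airflow webserver` with the adjusted environment.

    Hypothetical helper; it is defined here but deliberately not called.
    """
    return subprocess.run(["airflow", "webserver"], env=webserver_env(), check=True)
```

If the load balancer's addresses are known, passing them as a comma-separated string instead of `*` is the narrower choice.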

Nathan Clayton
4

Finally I found a solution myself.

I introduced an nginx reverse proxy between the ALB and the Airflow web server, i.e. HTTPS request -> ALB:443 -> nginx proxy:80 -> web server:8080. The nginx proxy tells the Airflow web server that the original request was HTTPS, not HTTP, by adding the HTTP header "X-Forwarded-Proto https".

The nginx server is co-located with the web server, and its config lives in /etc/nginx/sites-enabled/vhost1.conf (see below). I also deleted the default config file, /etc/nginx/sites-enabled/default.

server {
    listen 80;
    server_name <domain>;
    index index.html index.htm;
    location / {
      proxy_pass_header Authorization;
      proxy_pass http://localhost:8080;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto https;
      proxy_http_version 1.1;
      proxy_redirect off;
      proxy_set_header Connection "";
      proxy_buffering off;
      client_max_body_size 0;
      proxy_read_timeout 36000s;
    }
}
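
To see why the X-Forwarded-Proto header changes the redirect, here is a toy WSGI model (not Airflow's or gunicorn's actual code). It assumes the app builds an absolute redirect URL from `wsgi.url_scheme`; the names `trust_forwarded_proto`, `redirect_app`, and `location_for`, and the host `airflow.example.com`, are invented for the illustration.

```python
def trust_forwarded_proto(app):
    """Wrap a WSGI app so it sees the scheme reported by the proxy."""
    def wrapper(environ, start_response):
        proto = environ.get("HTTP_X_FORWARDED_PROTO")
        if proto in ("http", "https"):
            environ["wsgi.url_scheme"] = proto
        return app(environ, start_response)
    return wrapper


def redirect_app(environ, start_response):
    """Toy app: redirect every request to /admin/ using an absolute URL."""
    location = f"{environ['wsgi.url_scheme']}://{environ['HTTP_HOST']}/admin/"
    start_response("302 Found", [("Location", location)])
    return [b""]


def location_for(app, environ):
    """Call the app once and return the Location header it emitted."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    app(dict(environ), start_response)
    return captured["Location"]


# The backend only ever sees plain HTTP from the proxy in front of it.
base = {"wsgi.url_scheme": "http", "HTTP_HOST": "airflow.example.com"}

print(location_for(redirect_app, base))
# http://airflow.example.com/admin/

print(location_for(trust_forwarded_proto(redirect_app),
                   {**base, "HTTP_X_FORWARDED_PROTO": "https"}))
# https://airflow.example.com/admin/
```

In the real setup the header is interpreted by gunicorn, which by default honors it only for connections from 127.0.0.1; that is presumably why a proxy co-located with the web server works without further configuration.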
user389955
  • Why do you make nginx proxy tell airflow web server that original request is https and not http? Is this necessary? – alex Sep 24 '20 at 21:51
4

user389955's own solution is probably the best approach, but for anyone looking for a quick fix (or wanting a better idea of what is going on), this seems to be the culprit.

In the following file (python distro may differ):

/usr/local/lib/python3.5/dist-packages/gunicorn/config.py

The following setting means forwarded headers are honored only when the request comes from the local host:

class ForwardedAllowIPS(Setting):
    name = "forwarded_allow_ips"
    section = "Server Mechanics"
    cli = ["--forwarded-allow-ips"]
    meta = "STRING"
    validator = validate_string_to_list
    default = os.environ.get("FORWARDED_ALLOW_IPS", "127.0.0.1")
    desc = """\
        Front-end's IPs from which allowed to handle set secure headers.
        (comma separate).

        Set to ``*`` to disable checking of Front-end IPs (useful for setups
        where you don't know in advance the IP address of Front-end, but
        you still trust the environment).

        By default, the value of the ``FORWARDED_ALLOW_IPS`` environment
        variable. If it is not defined, the default is ``"127.0.0.1"``.
        """

Changing 127.0.0.1 to specific IPs, or to * if the IPs are unknown, will do the trick.

At this point, I haven't found a way to set this parameter from within the Airflow config itself. If I find one, I will update my answer.
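
As a rough model of what forwarded_allow_ips controls, the sketch below decides the request scheme from the peer address and the forwarded headers. It is a simplified illustration, not gunicorn's implementation; `parse_allow_ips` and `effective_scheme` are hypothetical names, and the header table mirrors gunicorn's documented default for secure_scheme_headers.

```python
# Header/value pairs that mark a request as HTTPS (gunicorn's documented
# default for secure_scheme_headers).
SECURE_SCHEME_HEADERS = {
    "X-FORWARDED-PROTOCOL": "ssl",
    "X-FORWARDED-PROTO": "https",
    "X-FORWARDED-SSL": "on",
}


def parse_allow_ips(value):
    """Split a comma-separated FORWARDED_ALLOW_IPS value into a list."""
    return [ip.strip() for ip in value.split(",") if ip.strip()]


def effective_scheme(peer_ip, headers, allow_ips):
    """Return the scheme the app will see for a request from peer_ip."""
    trusted = "*" in allow_ips or peer_ip in allow_ips
    if not trusted:
        # Headers from untrusted peers are ignored, so the scheme stays http.
        return "http"
    for name, value in headers.items():
        if SECURE_SCHEME_HEADERS.get(name.upper()) == value:
            return "https"
    return "http"


alb_headers = {"X-Forwarded-Proto": "https"}

# Default setting: the ALB's private address is not 127.0.0.1, so it is ignored.
print(effective_scheme("10.0.1.23", alb_headers, parse_allow_ips("127.0.0.1")))
# http

# Wildcard: the header is honored regardless of the peer address.
print(effective_scheme("10.0.1.23", alb_headers, parse_allow_ips("*")))
# https
```

The address 10.0.1.23 is a placeholder for whatever private IP the load balancer connects from.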

Doug
1

We solved this problem in my team by adding an HTTP listener to our ALB that redirects all HTTP traffic to HTTPS (so we have an HTTP listener AND an HTTPS listener). Our Airflow webserver tasks still listen on port 80 for HTTP traffic, but this HTTP traffic is only in our VPC so we don't care. The connection from browser to the load balancer is always HTTPS or HTTP that gets rerouted to HTTPS and that's what matters.

Here is the Terraform code for the new listener:

resource "aws_lb_listener" "alb_http" {
  load_balancer_arn = aws_lb.lb.arn
  port              = 80
  protocol          = "HTTP"
  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}
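
For teams that script AWS from Python rather than Terraform, the same listener can be described as the keyword arguments one would pass to the boto3 elbv2 client's create_listener call. This is a sketch under that assumption; `http_to_https_listener` is a hypothetical helper, the ARN is a placeholder, and nothing here contacts AWS.

```python
def http_to_https_listener(load_balancer_arn):
    """Build create_listener arguments for an HTTP listener that redirects to HTTPS."""
    return {
        "LoadBalancerArn": load_balancer_arn,
        "Protocol": "HTTP",
        "Port": 80,
        "DefaultActions": [
            {
                "Type": "redirect",
                "RedirectConfig": {
                    "Protocol": "HTTPS",
                    "Port": "443",
                    "StatusCode": "HTTP_301",
                },
            }
        ],
    }


# Assumed usage (requires boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   boto3.client("elbv2").create_listener(**http_to_https_listener("<load balancer ARN>"))
```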

Or, if you're an AWS console kind of place, here's how you set up the default action for the listener:

[Screenshot: AWS console, default action of the listener]

0

I think you have everything working correctly. The redirect you are seeing is expected as the webserver is set to redirect from / to /admin. If you are using curl, you can pass the flag -L / --location to follow redirects and it should bring you to the list of DAGs.

Another good endpoint to test on is https://<airflow domain name>/health (with no trailing slash, or you'll get a 404!). It should return "The server is healthy!".

Be sure you have https:// in the base_url under the webserver section of your airflow config.
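
To look at the redirect itself rather than let a browser or curl follow it, a small check along these lines may help. It is a sketch using only the standard library; `redirect_location` and `downgrades_to_http` are names invented here, and the host you pass in is your own Airflow domain.

```python
import http.client
from urllib.parse import urlsplit


def redirect_location(host, path="/", timeout=10):
    """Request `path` over HTTPS without following redirects.

    Returns (status, Location header or None). Not called in this sketch
    because it needs network access to a real host.
    """
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def downgrades_to_http(location):
    """True if a redirect target is an absolute http:// URL."""
    return location is not None and urlsplit(location).scheme == "http"
```

If `downgrades_to_http` returns True for the Location returned by `redirect_location`, the webserver is building its redirect with the http scheme, which is the situation described in the question.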

Daniel Huang
  • Thanks Daniel. If I browse to https:///admin directly, I do get the page! So the problem is that the Airflow webserver redirects https:// to http:///admin, which my AWS ALB does not listen on. My current airflow.cfg has base_url = http://localhost:8080. Do you mean I should change it to base_url = https://localhost:8080? I do not understand: my webserver listens on HTTP port 8080; it is the ALB in front of it that listens on HTTPS 80. – user389955 Jan 24 '18 at 20:53
  • I think base_url does not affect redirects and is only used for emails. – dstandish Jun 01 '21 at 08:31
0

Digging into the gunicorn documentation, it seems possible to pass any command-line argument (as if given when the gunicorn command is called) via the GUNICORN_CMD_ARGS environment variable.

So what I'm trying out is setting GUNICORN_CMD_ARGS=--forwarded-allow-ips=* since all the traffic will come to my instance from the AWS ALB. I guess the wildcard could be replaced with the actual IP of the ALB as seen by the instance, but that will be the next step.

Since I'm running on ECS, I'm passing it as:

            - Name: GUNICORN_CMD_ARGS
              Value: --forwarded-allow-ips=*

in the Environment of my task's container definition.
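
To make it concrete how a value like this turns into a setting, here is a toy stand-in for the parsing step, assuming the variable is split shell-style and fed to an argument parser. It is not gunicorn's real parser; `parse_gunicorn_cmd_args` is a hypothetical name and only the one flag discussed here is modeled.

```python
import argparse
import shlex


def parse_gunicorn_cmd_args(value):
    """Extract --forwarded-allow-ips from a GUNICORN_CMD_ARGS-style string."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--forwarded-allow-ips", dest="forwarded_allow_ips",
                        default="127.0.0.1")
    # Flags this toy parser does not know about are collected and ignored.
    args, _unknown = parser.parse_known_args(shlex.split(value))
    return args.forwarded_allow_ips


print(parse_gunicorn_cmd_args("--forwarded-allow-ips=*"))
# *
print(parse_gunicorn_cmd_args(""))
# 127.0.0.1
```

One practical note: in an ECS task definition the `*` is passed through literally, whereas in an interactive shell it is safer to quote the value so the shell does not try to expand it.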

PS: according to the docs, this option was added in gunicorn 19.7; for comparison, Airflow 1.10.9 seems to be on gunicorn 19.10, so you should be good to go with any (more or less) recent version of Airflow!

bluu
0

I encountered this issue too when using the official Apache Airflow Helm chart (version 1.0.0).

Problem

Originally I had configured the webserver service with type LoadBalancer.

webserver:
  service:
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-west-2:1234512341234:certificate/231rc-r12c3h-1rch3-1rch3-rc1h3r-1r3ch
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp

This resulted in the creation of a classic elastic load balancer.

This mostly worked but when I clicked on the airflow logo (which links to https://my-domain.com), I'd get redirected to http://my-domain.com/home which failed because the load balancer was configured to use HTTPS only.

Solution

I resolved this by installing the AWS Load Balancer Controller on my EKS cluster and then configuring ingress.

The ingress-related portion of the chart config looks like this:

ingress:
  enabled: true
  web:
    host: my-airflow-address.com
    annotations:
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/subnets: subnet-01234,subnet-01235,subnet-01236
      alb.ingress.kubernetes.io/scheme: internal  # if in private subnets
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
webserver:
  service:
    type: NodePort

Notes

It might be possible to configure the webserver to use an ALB instead of classic ELB and configure it to handle the HTTP routing, but I have not tested it.

dstandish