
I am trying to run a containerized Airflow and Spark setup using the following repository: https://github.com/cordon-thiago/airflow-spark

As given in the steps here, I need to edit the spark_default connection so that my DAGs can be submitted to Spark; however, I cannot seem to do that. This is what I see when I try:

                          ____/ (  (    )   )  \___
                         /( (  (  )   _    ))  )   )\
                       ((     (   )(    )  )   (   )  )
                     ((/  ( _(   )   (   _) ) (  () )  )
                    ( (  ( (_)   ((    (   )  .((_ ) .  )_
                   ( (  )    (      (  )    )   ) . ) (   )
                  (  (   (  (   ) (  _  ( _) ).  ) . ) ) ( )
                  ( (  (   ) (  )   (  ))     ) _)(   )  )  )
                 ( (  ( \ ) (    (_  ( ) ( )  )   ) )  )) ( )
                  (  (   (  (   (_ ( ) ( _    )  ) (  )  )   )
                 ( (  ( (  (  )     (_  )  ) )  _)   ) _( ( )
                  ((  (   )(    (     _    )   _) _(_ (  (_ )
                   (_((__(_(__(( ( ( |  ) ) ) )_))__))_)___)
                   ((__)        \\||lll|l||///          \_))
                            (   /(/ (  )  ) )\   )
                          (    ( ( ( | | ) ) )\   )
                           (   /(| / ( )) ) ) )) )
                         (     ( ((((_(|)_)))))     )
                          (      ||\(|(|)|/||     )
                        (        |(||(||)||||        )
                          (     //|/l|||)|\\ \     )
                        (/ / //  /|//||||\\  \ \  \ _)
-------------------------------------------------------------------------------
Node: 5fce0b10ba4b
-------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 69, in inner
    return self._run_view(f, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 368, in _run_view
    return fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/flask_admin/model/base.py", line 2138, in edit_view
    form = self.edit_form(obj=model)
  File "/usr/local/lib/python3.6/site-packages/flask_admin/model/base.py", line 1340, in edit_form
    return self._edit_form_class(get_form_data(), obj=obj)
  File "/usr/local/lib/python3.6/site-packages/wtforms/form.py", line 208, in __call__
    return type.__call__(cls, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/flask_admin/form/__init__.py", line 16, in __init__
    super(BaseForm, self).__init__(formdata=formdata, obj=obj, prefix=prefix, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/wtforms/form.py", line 274, in __init__
    self.process(formdata, obj, data=data, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/wtforms/form.py", line 126, in process
    if obj is not None and hasattr(obj, name):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/attributes.py", line 356, in __get__
    retval = self.descriptor.__get__(instance, owner)
  File "/usr/local/lib/python3.6/site-packages/airflow/models/connection.py", line 212, in get_extra
    return fernet.decrypt(bytes(self._extra, 'utf-8')).decode()
  File "/usr/local/lib/python3.6/site-packages/cryptography/fernet.py", line 199, in decrypt
    raise InvalidToken
cryptography.fernet.InvalidToken

I do not really know what a FERNET_KEY is and how it applies here. How exactly can I set this up so that my Spark operations will run?

UPDATE

Under the Configuration tab in my Airflow UI, I seem to have the fernet_key configured.

From what I can see, this is generated through the following command,

: "${AIRFLOW__CORE__FERNET_KEY:=${FERNET_KEY:=$(python -c "from cryptography.fernet import Fernet; FERNET_KEY = Fernet.generate_key().decode(); print(FERNET_KEY)")}}"

All of the variables are then exported,

export \
  AIRFLOW__CELERY__BROKER_URL \
  AIRFLOW__CELERY__RESULT_BACKEND \
  AIRFLOW__CORE__EXECUTOR \
  AIRFLOW__CORE__FERNET_KEY \
  AIRFLOW__CORE__LOAD_EXAMPLES \
  AIRFLOW__CORE__SQL_ALCHEMY_CONN \

This seems to be in line with what is available in the documentation. What is the problem here?
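(For context: the `: "${VAR:=default}"` idiom above only assigns the default when the variable is unset, so a brand-new key gets generated on every container start unless one is exported beforehand. A minimal sketch of the idiom, using an illustrative DEMO_KEY variable rather than the real one:)

```shell
# When the variable is unset, the := expansion assigns the fallback:
unset DEMO_KEY
: "${DEMO_KEY:=generated-on-this-start}"
echo "$DEMO_KEY"    # prints "generated-on-this-start"

# When it was already set, the fallback is ignored:
DEMO_KEY="pinned-in-advance"
: "${DEMO_KEY:=generated-on-this-start}"
echo "$DEMO_KEY"    # prints "pinned-in-advance"
```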

Minura Punchihewa

2 Answers


You need to generate a new fernet key and add it to your Airflow config: https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/fernet.html

As the link explains, the fernet key is what Airflow uses to encrypt passwords and other secrets stored in connection information. In the case above, the key used to encrypt the stored connection no longer matches the configured one (e.g. it was unset or regenerated), hence the InvalidToken error.

Fernet is an implementation of symmetric encryption, but the details are out of scope for this issue!
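To make the failure mode concrete, here is a minimal sketch using the cryptography package directly: a value encrypted under one key cannot be decrypted under a freshly generated one, which is what happens when the configured key changes between the time a connection was saved and the time it is read.

```python
from cryptography.fernet import Fernet, InvalidToken

# Encrypt a connection "extra" payload under one key...
key_a = Fernet.generate_key()
token = Fernet(key_a).encrypt(b'{"queue": "root.default"}')

# ...the same key decrypts it fine:
assert Fernet(key_a).decrypt(token) == b'{"queue": "root.default"}'

# ...but a different key (e.g. one regenerated on container restart)
# fails with exactly the InvalidToken from the traceback above:
key_b = Fernet.generate_key()
try:
    Fernet(key_b).decrypt(token)
except InvalidToken:
    print("InvalidToken: wrong key")
```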

floating_hammer
  • How exactly do I do this? And where? I see this in my airflow.cfg, fernet_key = $FERNET_KEY – Minura Punchihewa Oct 26 '21 at 11:24
  • Does not really matter where you generate this fernet key. Generate it on the machine on which you are planning to run your docker container. – floating_hammer Oct 26 '21 at 12:13
  • @floating_hammer Would be great if you expand your answer with a quote from the site you're referring to. Because URLs change... – mazaneicha Oct 26 '21 at 13:24
  • Is it something that I need to generate once? Also, I can understand how it is to be generated, but what I want to know is, what do I assign it to? Is there some variable in the cfg file that I need to assign it to or something? – Minura Punchihewa Oct 26 '21 at 17:42
  • Airflow URLs are stable and they got redirects when they change, so no worry about that @mazaneicha. – Jarek Potiuk Oct 26 '21 at 18:40
  • Yes once. It looks like in your deployment it is somewhat wrongly configured. Usually you do not specify those variables with a $VARIABLE name. Seems you might have some rather old config files (wild guess). I suggest that you take a look at the configuration documentation here: https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html All the different ways you can set Airflow configuration are described there. Specifically, the fernet key config is listed here: https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#fernet-key – Jarek Potiuk Oct 26 '21 at 18:42
  • I honestly cannot find anything useful in any of those pages. I still cannot edit my connection. Is there any other way to do it? Or are there any other steps that I could follow? – Minura Punchihewa Oct 27 '21 at 04:37
  • I made an update to the question. Please take a look. – Minura Punchihewa Oct 27 '21 at 06:10
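Putting the advice from these comments together, a minimal sketch of generating one key and pinning it for every container start (the env-file approach is an assumption; wire the variable into docker-compose however your setup expects):

```shell
# Generate one fernet key, once, on the host. A fernet key is 32
# url-safe-base64-encoded random bytes; the Airflow docs generate it
# with Fernet.generate_key(), which yields the same format.
FERNET_KEY=$(python3 -c "import base64, os; print(base64.urlsafe_b64encode(os.urandom(32)).decode())")

# Reuse that same value on every container start, e.g. by putting it
# in an env file that docker-compose passes to the containers:
echo "AIRFLOW__CORE__FERNET_KEY=$FERNET_KEY"
```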

The quick fix for this is to just delete the existing connection in the UI and then create a new one with the required parameters.

A better way would be to use airflow.models.Connection along with airflow.settings to manage these connections programmatically.

This is explained in detail here: Is there a way to create/modify connections through Airflow API

The code snippet given below has been derived from an answer to the above question. It checks whether a connection with the given ID already exists, deletes it if so, and then creates a new connection from the given connection details:

import logging

from airflow import settings
from airflow.models import Connection

def create_conn(conn_id, conn_type, host, login, password, port):
    conn = Connection(
        conn_id=conn_id,
        conn_type=conn_type,
        host=host,
        login=login,
        password=password,
        port=port
    )
    session = settings.Session()

    # Delete any existing connection with the same conn_id first
    existing = session \
        .query(Connection) \
        .filter(Connection.conn_id == conn_id) \
        .first()
    if existing is not None:
        logging.info(f"Connection {conn_id} already exists; replacing it")
        session.delete(existing)

    session.add(conn)
    session.commit()
    logging.info(Connection.log_info(conn))
    logging.info(f"Connection {conn_id} is created")

To create a new connection to Spark with the parameters we want:

create_conn(conn_id='spark_default', conn_type='', host='spark://spark', login='', password='', port=7077)

Minura Punchihewa