
I recently added a few new DAGs to our production Airflow and, as a result, decided to scale up the number of nodes in the Composer pool. After doing so I started getting the error: Can't decrypt _val for key=<KEY>, invalid token or value. This now happens for every DAG that uses Variables. It is not always the same key, either: which key fails depends on the variables the DAG needs.

I immediately scaled Composer back down to 3 nodes, but the problem persisted.

I have tried re-saving all of the Variables and recreating them both in the UI (which says they are all valid) and in the CLI (which reports every single one as invalid).

I have also tried updating the configuration to force the server to restart, and manually stopping the VM instances.

Composer also seems to block updating the Fernet key, so I can't try a new one. For some reason, the permanent key that Composer assigned now appears to be invalid.
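For context: Airflow stores Variable values encrypted with a Fernet key, and Fernet verifies an HMAC-SHA256 tag before decrypting, so a value saved under one key cannot be read under another. As far as I know, "Can't decrypt _val ... invalid token or value" is what Airflow reports when that check fails. The sketch below is a simplified, stdlib-only stand-in (an HMAC tag check with no actual encryption), not Fernet itself, and both keys are random placeholders. It only illustrates one possible reading of the symptom, which the thread does not confirm: if some processes ended up with a different key than the one the variables were saved under, every lookup there would fail, while a process still holding the original key would report them as valid (compare the UI vs. CLI behaviour above).

```python
import hashlib
import hmac
import secrets


class InvalidToken(Exception):
    """Stand-in for cryptography.fernet.InvalidToken (toy example only)."""


def seal(key: bytes, value: str) -> bytes:
    """Prefix `value` with an HMAC-SHA256 tag computed under `key`.

    Toy scheme: unlike Fernet, the value is NOT encrypted, only authenticated.
    """
    data = value.encode("utf-8")
    tag = hmac.new(key, data, hashlib.sha256).digest()  # 32-byte tag
    return tag + data


def unseal(key: bytes, token: bytes) -> str:
    """Return the value only if its tag verifies under `key`, else raise InvalidToken."""
    tag, data = token[:32], token[32:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise InvalidToken("invalid token or value")
    return data.decode("utf-8")


# Hypothetical keys: the one the variables were saved under, and a different one
# that some process might end up using after the environment changes.
old_key = secrets.token_bytes(32)
new_key = secrets.token_bytes(32)

stored = seal(old_key, "my-secret")
print(unseal(old_key, stored))  # prints: my-secret

try:
    unseal(new_key, stored)
except InvalidToken as exc:
    # The same stored value is unreadable under the other key.
    print("lookup failed:", exc)
```

Re-saving a value through a process that holds the "wrong" key would simply re-encrypt it under that key, which is why re-saving alone cannot fix a key mismatch.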

Is there anything else I can do to fix this, short of recreating the environment?

KJS
  • I've never used Composer, but is there a way to view the scheduler logs separately from the webserver's, and if so, does the error happen in both? – joebeeson Nov 14 '18 at 02:11
  • @joeb Stackdriver does split out the scheduler logs; unfortunately the scheduler throws the same error as it tries to load the DAGs. Nothing has changed as of this morning, so I think my best course of action is to recreate the environment. – KJS Nov 14 '18 at 16:48

1 Answer


I managed to fix this problem by adding a new Python package. Adding a package seems to be the only way to really "reboot" the environment. Once the reboot had finished, all of my variables and connections were invalidated, but I was able to just add them back in rather than recreate the entire environment.
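Because the stored copies no longer decrypt, re-adding the variables is easiest from a copy kept outside Airflow. A minimal sketch, assuming you keep the values somewhere else (the names and values below are hypothetical): it writes them to a flat `{key: value}` JSON object, which is the shape Airflow 1.x's `airflow variables -i FILE` import reads (dict values are stored as JSON). How you run that import on Composer, for example via `gcloud composer environments run`, and where the file must live are not covered here.

```python
import json
from pathlib import Path

# Hypothetical variables; load these from your own source of truth,
# NOT from Airflow, whose stored copies can no longer be decrypted.
VARIABLES = {
    "gcs_bucket": "my-example-bucket",
    "job_config": {"retries": 3, "region": "us-central1"},
}


def write_variables_file(variables: dict, path: Path) -> Path:
    """Write a flat {key: value} JSON object suitable for `airflow variables -i`."""
    path.write_text(json.dumps(variables, indent=2, sort_keys=True))
    return path


if __name__ == "__main__":
    out = write_variables_file(VARIABLES, Path("variables.json"))
    print(f"wrote {len(VARIABLES)} variables to {out}")
```

The file holds plaintext values, so treat it as a secret and delete it after importing.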

I heard back about this issue. According to Google, Composer builds a custom image for the environment and passes it to each node; if that image gets corrupted during scaling, the only fix is to add a new Python package so that the image is rebuilt. Incidentally, Composer version 1.3.0 is much better, because the scheduler is restarted every 10 minutes, which should solve some of the later issues I experienced.
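Google's workaround boils down to changing the environment's PyPI package list so the image is rebuilt. A minimal sketch in Python that builds the corresponding `gcloud composer environments update ... --update-pypi-package` call; the environment name, region and package are hypothetical placeholders, and by default it only prints the command. The thread only says "adding a new python package" works, so whether re-pinning an already installed package would also trigger the rebuild is unconfirmed.

```python
import shlex
import subprocess
from typing import List


def rebuild_command(environment: str, location: str, package_spec: str) -> List[str]:
    """Build a gcloud call that adds one PyPI package to a Composer environment."""
    return [
        "gcloud", "composer", "environments", "update", environment,
        "--location", location,
        "--update-pypi-package", package_spec,
    ]


def trigger_rebuild(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command, or run it (requires an authenticated gcloud) if dry_run is False."""
    if dry_run:
        print(shlex.join(cmd))
        return
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical environment, region and package; substitute your own.
    trigger_rebuild(rebuild_command("my-composer-env", "us-central1", "tenacity"))
```

The update can take a while to finish, and per the answer above, variables and connections may need to be re-added afterwards.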

KJS
  • What version of Composer did you see this problem on? And did you attempt to override the fernet key manually (not recommended, just curious)? – Trevor Edwards Nov 14 '18 at 19:28
  • @TrevorEdwards Yes, I did. I'm using the `composer-1.1.1-airflow-1.9.0` image. It didn't work, though: Google's docs state that the key is set permanently when the environment is created and is managed by Google, and any attempt to override the fernet key through the configuration or an environment variable is denied. – KJS Nov 14 '18 at 20:54