When importing pywikibot in a DAG file (or a module the DAG file imports), the DAG becomes broken, and the webserver UI throws the error:
Broken DAG: [/path/to/airflow/dags/dag.py] encode() argument 1 must be str, not bool
I have tried to find a stack trace, but a search in airflow/logs turned up nothing. Running airflow list_dags (as recommended by this question) completes successfully and doesn't help debug the problem, even with --report.
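In the meantime, one way to dig for the swallowed traceback is to import the failing module directly in a plain Python session. A small helper like the one below (the helper name and approach are my own, not part of Airflow) prints the full traceback instead of Airflow's one-line "Broken DAG" banner:

```python
import importlib
import traceback

def try_import(module_name):
    """Try to import `module_name`; return (module, None) on success,
    or (None, full_traceback_string) on failure."""
    try:
        return importlib.import_module(module_name), None
    except Exception:
        return None, traceback.format_exc()

# e.g. in a REPL on the Airflow host:
#   import airflow                      # set up Airflow's logging first
#   mod, err = try_import("pywikibot")
#   print(err)                          # full stack trace of the encode() error
```

Importing airflow before pywikibot in the REPL should reproduce the scheduler's parsing environment more closely than importing pywikibot alone.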
My question is thus: how can I use Pywikibot in a task of an Airflow DAG?
I've added additional information below to show what I've tried so far. Once an answer is found, this can be deleted to make the question more concise.
Here is the code for an example DAG:
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from datetime import timedelta, datetime

#### this part is typically imported from another module ####
# to import, pywikibot requires a `user-config.py` file or this env variable
import os
os.environ['PYWIKIBOT_NO_USER_CONFIG'] = '1'
import pywikibot

def do_nothing():
    pass
#############################################################

dag = DAG('try_pywikibot', schedule_interval=timedelta(days=1))

default_args = {
    'start_date': datetime(2019, 1, 1),
}

task1 = PythonOperator(
    python_callable=do_nothing,
    task_id='do_nothing',
    dag=dag,
    default_args=default_args,
)
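A workaround worth trying (a sketch, not something I have verified against this error) is to defer the pywikibot import into the task callable, so the scheduler and webserver never execute it while parsing the DAG file. The module_name parameter below exists only to make the sketch illustrative:

```python
import os
import importlib

def do_pywikibot_work(module_name="pywikibot"):
    # Lazy import: this body only runs inside the worker process when the
    # task executes, not when Airflow parses the DAG file, so a failing
    # import can no longer break DAG parsing.
    os.environ.setdefault("PYWIKIBOT_NO_USER_CONFIG", "1")
    return importlib.import_module(module_name)
```

The PythonOperator would then use python_callable=do_pywikibot_work, and the module-level import pywikibot could be dropped from the DAG file.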
Note on importing pywikibot:
Pywikibot demands a config file, user-config.py, in the working directory, unless the PYWIKIBOT_NO_USER_CONFIG environment variable is set (e.g. to '1', as in the code above). Example of such a file:
family = 'wikipedia' # required
mylang = 'en' # required
# verbose_output = 0 # optional
I thought it might be because of pywikibot's logging (sending a bool?). Verbose logging can be turned off by adding verbose_output = 0 to pywikibot's config file (user-config.py), but this does not resolve the matter.
Oddly, running this simple script
import pywikibot
import airflow
with a user-config.py file containing verbose_output = 0 still outputs verbose logs from pywikibot. When Airflow is not imported, though, it runs without any logging output. I also tried completely disabling pywikibot's logging by modifying the library's logging.logoutput(); that silences the logging even with Airflow imported, but the DAG is still regarded as broken by Airflow.
The DAG does "start" when triggered manually, but the tasks are never queued; they remain stuck in state None.