
I am a newbie to Airflow. I have some .jar jobs generated with Talend Open Studio for Big Data, and I want to schedule and manage those with Airflow. My question is: does Airflow support a .jar file generated by TOS as a DAG? If it does, how? Or is there an alternative way to run a .jar on Airflow?

I'm using Airflow v1.10.3. The jobs mainly extract and process data from a MongoDB database, then update the database with the newly processed data.

Thanks !

Yassin Abid

2 Answers


Airflow does support running jar files. You do this through the BashOperator.

Quick example:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # import path for Airflow 1.10.x
from datetime import datetime

args = {
    'owner': 'you',
    'start_date': datetime(2019, 4, 24),
}

dag = DAG(
    dag_id='runjar',  # DAG takes a dag_id, not a task_id
    schedule_interval=None,  # manually triggered
    default_args=args)

run_jar_task = BashOperator(
    task_id='runjar',
    dag=dag,
    bash_command='java -cp /path/to/your/jar.jar your.main.Class param1 param2')
Zack
  • But what if the jar file is connecting to a database? It would need a JDBC driver, and how can I configure that in Airflow? – user679530 Jan 21 '23 at 07:57
  • @user679530 This may be what you are looking for: https://airflow.apache.org/docs/apache-airflow-providers-jdbc/stable/operators.html – Zack Jan 22 '23 at 03:09
  • Hi Zack, but that is for when I want to write the code in an Airflow DAG, correct? In my case I have this code written in Java already. – user679530 Jan 27 '23 at 07:00
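Following up on the driver question in the comments: with the BashOperator approach there is nothing Airflow-specific to configure. The driver jar just has to be on the classpath of the `java` command, alongside the job jar. A minimal sketch of building that command string (the paths and the main class are hypothetical):

```python
import os

# Hypothetical locations of the Talend job jar and the database driver jar
job_jar = "/opt/jobs/mongo_etl.jar"
driver_jar = "/opt/jobs/lib/mongodb-driver.jar"

# Join the jars with the platform's classpath separator (":" on Linux)
classpath = os.pathsep.join([job_jar, driver_jar])

# This string is what you would hand to BashOperator's bash_command
bash_command = "java -cp {} com.example.MongoEtl param1 param2".format(classpath)
print(bash_command)
```

Connection details (host, credentials) can then be passed as the `param1 param2` arguments, or read by the jar from its own configuration, exactly as it would when run outside Airflow.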

Airflow will happily run .jar files. There are a few examples kicking about for you to have a look at.

Running a standard .jar file: run_jar.py

Running a "built" Talend job: loan_application_data.py

Obviously with both these examples the .jar or Talend file(s) will need to be on the server Airflow is executing on (as well as Java).
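To make that point concrete, here is a quick pre-flight check you could run on the machine where the task actually executes (the jar path is hypothetical):

```python
import os
import shutil

jar_path = "/path/to/your/jar.jar"  # hypothetical

# The executing worker needs a Java runtime on PATH and the jar on its filesystem
java_ok = shutil.which("java") is not None
jar_ok = os.path.isfile(jar_path)
print("java on PATH: {} | jar present: {}".format(java_ok, jar_ok))
```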

Tomme
  • Thank you for your answer @Tomme. When you said they need to be on the Airflow server, do you mean the .py file needs to be under `AIRFLOW_HOME/dags`? – Yassin Abid May 02 '19 at 09:41
  • No worries. Yes you will need your DAG file in the `dags` folder, but I also mean the "worker" that is executing the task will need to be able to see the `.jar` file. If you are only running Airflow locally (`LocalExecutor`) then you don't need to worry, but if you plan on executing jobs remotely (`CeleryExecutor`/`KubernetesExecutor`) then this is something you will need to think about. – Tomme May 02 '19 at 10:10
  • I see, thank you for the information! Right now I need to be able to execute my DAGs locally, but I still have some difficulties importing the libraries generated by Talend to execute my DAG (.jar) – Yassin Abid May 02 '19 at 10:33
  • @Tomme I've seen your code, can you say how to manage the libraries in this case? Where should they be located? – pm1359 May 12 '23 at 11:33
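On the library question raised in the comments: Talend's "Build Job" export typically unzips to a folder holding the job jar plus a `lib/` directory containing every dependency jar. Rather than listing those jars one by one, a `/*` wildcard classpath entry loads them all. A sketch of the resulting command (the paths and the Talend main-class name are hypothetical):

```python
# Hypothetical location where the Talend build was unzipped; alongside the
# job jar it contains a lib/ directory of generated dependency jars.
job_dir = "/opt/talend/my_job"

# The "/lib/*" wildcard entry puts every jar in lib/ on the classpath,
# so the Talend-generated libraries don't need listing individually.
bash_command = (
    "java -cp '{d}/my_job.jar:{d}/lib/*' "
    "my_project.my_job_0_1.my_job".format(d=job_dir)
)
print(bash_command)
```

Alternatively, the export usually also contains a generated `*_run.sh` launcher that sets this classpath itself, so pointing `bash_command` at that script can be simpler; note that Airflow treats a `bash_command` ending in `.sh` as a Jinja template file, so a trailing space after the script path is needed to run it directly.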