
I have a server running Pentaho BI Server v6 Community Edition. We've developed a Kettle job to extract data from one database to another and exported it as a KJB file. I would like to run this job every 12 hours or so.

I noticed that the BI server already included Kettle, and has the ability to upload and schedule jobs. Do I need to install the DI server if the BI server already has Kettle installed?

If not, how can I publish the KJB file to the BI server? I'd like to use a file system repository. If I upload the file directly through the user console, the log shows that the import was a success, but I cannot select or run the job anywhere.

adelphospro

3 Answers


I use Pentaho BI Server 5, but it should work the same on Pentaho BI 6.

My Kettle job runs many sub-transformations. The transformation files are stored in a file system directory, e.g. /opt/etl.

So let's say I have one job (daily_job.kjb) with two sub-transformations.

To run a Kettle job on Pentaho BI CE I use these steps:

  1. set the sub-transformation locations properly in the job file
  2. upload the sub-transformations to the proper directory on the server (/opt/etl)
  3. create an xaction file which executes the Kettle job on the BI server (daily.xaction)
  4. upload the daily.xaction and daily_job.kjb files to the Pentaho BI server (same folder)
  5. schedule the daily.xaction file on the Pentaho BI server

Job settings in daily_job.kjb:

(screenshot: transformation entry settings in daily_job.kjb)
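
For reference, the relevant entry in daily_job.kjb might look roughly like the sketch below when the sub-transformations are referenced by file path; the entry and file names here are hypothetical:

<!-- Excerpt from daily_job.kjb: a job entry that runs one sub-transformation.
     The <filename> path must match the directory the .ktr files were
     uploaded to on the server (step 2), here /opt/etl. -->
<entry>
  <name>run_sub_transformation_1</name>
  <type>TRANS</type>
  <filename>/opt/etl/sub_transformation_1.ktr</filename>
  <!-- Repository-style references stay empty when pointing at a file on disk. -->
  <transname/>
  <directory/>
</entry>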

Xaction code for daily.xaction (it simply executes daily_job.kjb, located in the same BI server folder as the xaction):

<?xml version="1.0" encoding="UTF-8"?>
<action-sequence> 
  <title>My scheduled job</title>
  <version>1</version>
  <logging-level>ERROR</logging-level>
  <documentation> 
    <author>mzy</author>  
    <description>Sequence for running daily job.</description>  
    <help/>  
    <result-type/>  
    <icon/> 
  </documentation>

  <inputs> 
  </inputs>

  <outputs> 
    <logResult type="string">
      <destinations>
        <response>content</response>
      </destinations>
    </logResult>
  </outputs>

  <resources>
    <job-file>
      <solution-file> 
        <location>daily_job.kjb</location>  
        <mime-type>text/xml</mime-type> 
      </solution-file>     
    </job-file>
  </resources>

  <actions> 
    <action-definition>
      <component-name>KettleComponent</component-name>
      <action-type>Pentaho Data Integration Job</action-type>
      <action-inputs>   
      </action-inputs>
      <action-resources>
        <job-file type="resource"/>
      </action-resources>
      <action-outputs> 
        <kettle-execution-log type="string" mapping="logResult"/>  
        <kettle-execution-status type="string" mapping="statusResult"/> 
      </action-outputs>   
      <component-definition>
        <kettle-logging-level><![CDATA[info]]></kettle-logging-level>           
      </component-definition>
    </action-definition>

  </actions> 
</action-sequence>

Scheduling Kettle job (xaction file) on Pentaho BI CE:

(screenshot: the Schedule dialog in the Pentaho User Console)

mzy
    Worked!! In order to view uploaded jobs, click View > Show hidden files. – JRichardsz Dec 12 '17 at 01:53
  • If I want to upload the transformation to the Pentaho server as well, in the same directory as the .kjb and .xaction files, how do I have to configure the path location of the sub-transformation in the .kjb file? Also: if the output of the transformation is a file, is it possible to publish it automatically in the Pentaho server folders by setting up a proper output path location? I ask this because, in terms of security, it is not best practice to read and write files in a folder outside the Tomcat directories. – fl4l May 24 '18 at 08:54

Proceed with the following steps:

  1. Log in to the Pentaho console as Administrator after starting the Pentaho BI server.

  2. Click the Browse Files button and a new page will open. On this page, select a folder under the Folders section and then click Upload in the right-hand pane.

  3. Select the file and click OK.

  4. Now refresh the page and the file will appear in your respective folder.

  5. Now schedule the job: click your respective folder in the left pane, select your main job file in the middle pane, and then click Schedule in the right pane.

  6. In the new pop-up, select your generated file path and click Next. Then select the recurrence schedule, job time, and job start date.

  7. Select Yes in the next pop-up and you will be redirected to the Manage Schedules page, where you can see the job you just scheduled. It will run at the scheduled time.

  8. You can check the logs of your job in the pentaho.log file in the pentaho-server/tomcat/logs directory:

    tail -1000f /Users/kv/pentaho-server/tomcat/logs/pentaho.log
    
KayV

You can deploy the .kjb file as a Kettle endpoint as part of a Sparkl plugin and then call it with a simple API request. This should help:

http://fcorti.com/pentaho-sparkl/kettle-endpoint-sparkl-pentaho/

There are probably other ways to do this but that's the one I'm most familiar with. As for scheduling, you could just schedule a cronjob that makes the API request?
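
For instance, a minimal sketch of that cron approach, assuming a Sparkl plugin named JobRunner with a kettle endpoint named myJob (as in the comment below); the host, port, credentials, and system user are placeholders to adapt:

# /etc/cron.d/pentaho-myjob -- call the Kettle endpoint every 12 hours.
# JobRunner/myJob follow the Sparkl naming used in this answer; host,
# port, credentials, and the 'etluser' system account are placeholders.
0 */12 * * *  etluser  curl -s -u admin:password \
  "http://localhost:8080/pentaho/plugin/JobRunner/api/myJob" \
  >> /var/log/pentaho-myjob.log 2>&1

curl's -u flag sends HTTP basic authentication, which Pentaho's web endpoints generally accept; since the credentials sit in plain text in the crontab, a dedicated low-privilege Pentaho user is advisable.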

pedrogfp
  • Can you provide a little code with your answer rather than simply linking to an external resource? – Suever Feb 29 '16 at 19:54
  • You don't really need to write _any_ code. Just create a Sparkl plugin called, say, 'JobRunner', and create a kettle endpoint of the 'job' type (named myJob, for instance). The plugin will automatically create a myJob.kjb file at _/pentaho-solutions/system/JobRunner/endpoints/kettle_. You just replace it with the job you want and execute it with a simple HTTP request to _http://<host>:<port>/pentaho/plugin/JobRunner/api/myJob_. This is probably not the ideal solution (it doesn't use the internal job scheduler, for instance) but it works – pedrogfp Feb 29 '16 at 20:34
  • Would it be best to generate the KJB using this method then modify the file via Spoon instead of replacing it? Looks like there are some custom properties that must be retained. – adelphospro Feb 29 '16 at 22:21
  • I don't think there's anything important in the default file other than the name of the output step (which is irrelevant unless you intend to use the job as a datasource), but if you want to play it safe, sure – pedrogfp Mar 01 '16 at 19:27