I am new to Pentaho, and would like to automate my processes as much as possible.

This is what my workflow looks like: an upstream system deposits files in S3 at random times. Pentaho needs to pick up and process these files, then deposit the results back in S3 for a downstream system to consume.

Pentaho PDI is running in AWS on ec2 instances.

What's the best way to approach this sort of workflow?

I can see that Pentaho has built-in support for S3 CSV input and output, but I'm not sure how to automate detecting these files once they've been deposited in S3 and triggering a job or transformation to read them.

Metro

1 Answer

You can create a job in Pentaho to check if files exist.

If there are files, launch the processing for each one; if there are none, end the job.

Then schedule this job to run every five minutes, every hour, or at another convenient interval.
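
If you would rather drive the polling from outside Pentaho, the check-then-process loop could be sketched as a small Python script that a scheduler such as cron runs at the chosen interval. This is only a rough sketch under several assumptions: the install path, the job file `process_file.kjb`, and the job parameters `S3_BUCKET` and `S3_KEY` are hypothetical names, and the list of keys is assumed to come from whatever S3 listing you already have (for example boto3's `list_objects_v2`). The only Pentaho piece it relies on is the Kitchen command line with its `-file`, `-param:` and `-level` options.

```python
import subprocess

# Assumed locations; adjust to your own install and job.
KITCHEN = "/opt/pentaho/data-integration/kitchen.sh"
JOB_FILE = "/opt/etl/process_file.kjb"  # hypothetical job that handles one file


def pending_keys(listed_keys, processed):
    """Return the keys not yet processed, skipping folder placeholders."""
    return sorted(
        k for k in listed_keys
        if k not in processed and not k.endswith("/")
    )


def kitchen_command(key, bucket):
    """Build the Kitchen call for one file.

    S3_BUCKET and S3_KEY are hypothetical named parameters that the
    job would have to declare.
    """
    return [
        KITCHEN,
        f"-file={JOB_FILE}",
        f"-param:S3_BUCKET={bucket}",
        f"-param:S3_KEY={key}",
        "-level=Basic",
    ]


def run_once(listed_keys, processed, bucket, runner=subprocess.run):
    """One polling pass: launch the job for each new file.

    A key is marked as processed only when Kitchen exits with code 0,
    so failed files are retried on the next pass. If nothing is new,
    the loop body never runs and the pass simply ends.
    """
    for key in pending_keys(listed_keys, processed):
        result = runner(kitchen_command(key, bucket), check=False)
        if result.returncode == 0:
            processed.add(key)
    return processed
```

The `processed` set would need to be persisted between runs (a small file or database table), or you could sidestep it by having the job move each file to a different prefix once it is done, so that a plain listing only ever shows unprocessed files.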

Ana GH