I'm looking for a tool or process for exporting Facebook Insights data for a Facebook page and a Facebook app. Currently I just download CSV files manually from the Insights interface, but ideally I want to automate this process and load the data into Pentaho Kettle so I can perform some operations on it.

Is there some way to automate the downloading and input of CSV files? Or will I have to use the Facebook Graph API Explorer? I am currently looking at a setup where I use NetBeans and RestFB to pull the data I want, and then access that data using Pentaho Kettle. I am not sure whether this will work, or whether it is the best approach.

  • Not sure exactly how the interface works, but I'd have thought Pentaho Kettle could get it directly. If it can't, you could always write a plugin, which will probably end up using RestFB. There's lots of good documentation on writing plugins, and if you contribute back to the community you'll probably get lots of help too. – Codek Feb 19 '13 at 16:25

1 Answer

As Codek says, a Kettle plugin is a very good idea, and would be very helpful to the Kettle project. However, it's also a serious effort.

If you don't want to put in that kind of effort, you can certainly download files with a Kettle job, as long as the files are available through a standard transfer method (FTP, SFTP, SSH, etc.). I've never used RestFB, so I don't know what's available there. You might be able to get the data directly from a web service with the REST Client transformation step.
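If neither route fits, a small script can write the CSV files for Kettle to pick up. The following is a minimal sketch, not a tested integration: it assumes the Graph API exposes insights at `/{id}/insights` with a `metric` parameter, and that the response has a `data` list whose entries carry `name`, `period`, and a `values` list of `value`/`end_time` pairs. The helper names, the unversioned base URL, and the output columns are all made up for illustration.

```python
import csv
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GRAPH_URL = "https://graph.facebook.com"  # assumption: unversioned base URL


def build_insights_url(object_id, metrics, access_token):
    """Build the insights URL for a page or app ID (hypothetical helper)."""
    query = urlencode({"metric": ",".join(metrics), "access_token": access_token})
    return f"{GRAPH_URL}/{object_id}/insights?{query}"


def flatten_insights(payload):
    """Flatten the assumed JSON shape into one row per metric value."""
    rows = []
    for metric in payload.get("data", []):
        for point in metric.get("values", []):
            rows.append({
                "metric": metric.get("name"),
                "period": metric.get("period"),
                "end_time": point.get("end_time"),
                "value": point.get("value"),
            })
    return rows


def write_csv(rows, path):
    """Write the flattened rows to a CSV file with a header line."""
    fields = ["metric", "period", "end_time", "value"]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)


def export_insights(object_id, metrics, access_token, path):
    """Fetch insights over HTTP and save them as CSV; returns the row count."""
    with urlopen(build_insights_url(object_id, metrics, access_token)) as response:
        payload = json.load(response)
    rows = flatten_insights(payload)
    write_csv(rows, path)
    return len(rows)
```

Keeping the flattening separate from the HTTP call means the CSV layout can be checked against a saved sample response before any access token is involved.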

After downloading the files, you can send them to a transformation to be loaded. You can do this either with the "Execute for every input row?" option on the Transformation job step, or by getting the filenames from the job's result set inside the transformation with Get files from result.

Then you can archive the files after loading with Copy or Move result filenames. In one job, I pick out only the files that are not yet in my archive using Get File Names and Merge Join, followed by a Set files in result step in a transformation, so that can be done too if need be.
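The "only files not in the archive" check is just a comparison of two filename lists. Here is a rough sketch of that logic outside Kettle, in case it helps to see it spelled out. The function name, the two directory arguments, and the `*.csv` pattern are hypothetical, and it assumes archived files keep their original names.

```python
from pathlib import Path


def files_not_in_archive(incoming_dir, archive_dir, pattern="*.csv"):
    """Return incoming files whose names do not yet appear in the archive."""
    # Names already archived (assumes archiving does not rename files).
    archived = {path.name for path in Path(archive_dir).glob(pattern)}
    # Keep only incoming files with no same-named counterpart in the archive.
    return sorted(
        path for path in Path(incoming_dir).glob(pattern)
        if path.name not in archived
    )
```

In Kettle the same comparison corresponds to two Get File Names inputs joined on the short filename, keeping the rows that have no match on the archive side.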

To automate it, you can run your job from a scheduler using Kitchen.bat/Kitchen.sh. Since I use PostgreSQL a lot, I use PGAgent as my scheduler, but the Windows scheduler or cron work too.
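If the scheduler needs a wrapper around Kitchen, something like the following would do. It is only a sketch: the install path and job file in the usage note are placeholders, the parameter name is invented, and it relies on Kitchen's `-file=`, `-level=` and `-param:NAME=value` options and on the assumption that a nonzero exit code signals a failed job.

```python
import subprocess


def kitchen_command(kitchen_path, job_file, params=None, level="Basic"):
    """Build the Kitchen argument list for a job stored in a .kjb file."""
    command = [kitchen_path, f"-file={job_file}", f"-level={level}"]
    # Named job parameters are passed as -param:NAME=value.
    for name, value in sorted((params or {}).items()):
        command.append(f"-param:{name}={value}")
    return command


def run_job(kitchen_path, job_file, params=None):
    """Run the job and return Kitchen's exit code (assumed 0 on success)."""
    result = subprocess.run(kitchen_command(kitchen_path, job_file, params))
    return result.returncode
```

For example, `run_job("/opt/pdi/kitchen.sh", "/etl/load_insights.kjb", {"EXPORT_DIR": "/data/in"})` could be called from a script that cron, PGAgent, or the Windows scheduler starts, with the return value used to decide whether to raise an alert.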

Hope that helps.

Brian.D.Myers
  • Writing a Kettle plugin is a piece of cake and there are lots of examples. Given that the original poster is already talking about coding this particular piece anyway, I'm sure they could handle it! – Codek Feb 24 '13 at 18:53
  • I'm sure they can too. I just meant that a plugin is more work than a transformation or job. I agree, though, that having a step would be a great addition to Kettle. – Brian.D.Myers Feb 27 '13 at 23:43