4

I'm trying to split a table into multiple tables based on the value of a given column using Talend Open Studio. Let's say this column can contain any of the integer values of 1, 2, 3, etc. then according to this value, these rows should go to table_1, table_2, table_3 etc.

It would be best if I could solve this when the number of different values in that column is not known in advance, but for now we can assume that all these output tables exists already. The bottom line is that the number of different values and therefore the number of different tables are high enough that setting up the individual filters manually is not an option.

Is this possible to solve this using Talend Open Studio or any similiary open source ETL tools like Pentaho Keetle?

Of course, I could just write a simple script myself, but I would prefer to use a proper ETL tool since the complete ETL process is quite complex.

3 Answers3

2

In PDI or Pentaho Kettle you could do this with partitioning. (A right click option on the step IIRC) Partitioning in PDI is designed for exactly this sort of problem.

Codek
  • 5,114
  • 3
  • 24
  • 38
  • 1
    Thanks, this worked out. There is a an option called 'Is the name of the table defined in a field?' for 'Table output'. – user1765796 Oct 24 '12 at 14:08
1

Yes that's Possible to do and split the data on the basis of single column to different table, but for that you need to create table dynamically :-

tFileInputDelimited->tFlowtoIterate ->tFixedFlowInput->and the can use globalMap() to get the column values and use the same to seperate the data to different tables. -> And the can use globalMap(Columnused to seperate data) in table name.

enter image description here

0

The first solution that came to my mind was using the replicator to transport the current row to three filters which act as guard and only let rows through with either 1 2 or 3 in the given column. pic: https://i.stack.imgur.com/Rgjrh.png

But you could also build the table name dynamically, if that is what you want, pic: https://i.stack.imgur.com/uiTST.png

rretzbach
  • 744
  • 1
  • 6
  • 16
  • Thanks, that was my original idea as well. But the problem that this is only applicable for a small number of values/tables, and in my case there could be up to thousands of tables so setting up all these filters individually is not doable. – user1765796 Oct 23 '12 at 09:59