
I have a question about PDI and my transformation flow. When I run the transformation I get the error GC Overhead Limit Exceeded. I have already searched for solutions, such as increasing the memory for spoon.bat with -Xms -Xmx2g, but that doesn't work. As an alternative, I think I may need to change the flow, since the Sorted Merge step uses a lot of memory and seems to cause the error. If you have another solution for my problem, please share it; I really need it.

Below is a picture of my transformation flow. Thanks. #SorryForMyGrammar

My Transformation Flow

Rio Odestila

2 Answers

Are you sure you want a merge, and not a Dummy or an Append streams step to append the rows?

The merge rows will in fact perform a join. If you have many hits where keys from one stream match keys from another (or, even worse, if you have no keys, which results in a cross join), you’ll be creating a nightmare in terms of cardinality.
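To get a feel for how quickly this grows, here is a minimal Python sketch (not PDI code) that counts the rows an inner join on a single key would emit. The function name and the sample keys are hypothetical, chosen only for illustration.

```python
from collections import Counter


def join_row_count(left_keys, right_keys):
    """Count the rows an inner join on one key would produce.

    Each key contributes (occurrences on the left) * (occurrences on the right).
    """
    left = Counter(left_keys)
    right = Counter(right_keys)
    # Counter returns 0 for keys missing on the right, so unmatched keys add nothing.
    return sum(count * right[key] for key, count in left.items())


# Hypothetical streams: 1,000 rows per side, all sharing the same key.
left_keys = ["A"] * 1000
right_keys = ["A"] * 1000
print(join_row_count(left_keys, right_keys))
```

With only 1,000 rows on each side sharing one key, the join already emits a million rows, which is the kind of growth that exhausts the heap.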

To make matters worse, the merge join won’t be able to finish while there are input rows coming from either data stream out of the filter rows, keeping a huge data set in memory.

If a merge is in fact what you need, you should carefully analyse the cardinality of the result set, and also add two Sort rows steps ahead of the Merge join. They must sort each stream by the join keys, and they also have the benefit of decoupling the data stream flow, allowing the merge join to run the whole stream in one go without causing potential deadlocks in your transformation.

nsousa
  • Well, my transformation is supposed to do ETL: combine data from a CSV and a database (PostgreSQL) using Stream Value Lookup; concatenate fields to make a new field; use a Java Filter to split the data by condition and then calculate values (like an IF/ELSE statement in programming); and use Sorted Merge to combine the data into one stream again. – Rio Odestila Jun 06 '18 at 01:29
  • My error occurs when the data arrives at the Sorted Merge, and the solutions I have tried don't work. I'll try your suggestion and report the result. As I understand it, I have to add Sort rows steps ahead of the Merge Join, which means I also need to change my Sorted Merge into a Merge Join. – Rio Odestila Jun 06 '18 at 01:43
  • Preview the data with a limited number of rows and see if that’s what you really want. I find it unlikely. Regardless of your goals, that transformation won’t scale. – nsousa Jun 07 '18 at 07:50

You can skip the whole split and merge operations by including that logic in the Formula step.

Use IF(condition;A;B), where condition is the test you defined in the filter rows step and A and B are the existing calculations from the respective formula steps. That way each row gets the right calculation and the stream never needs to be joined.
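The idea can be sketched outside PDI in a few lines of Python. This is only an illustration: the field name `amount`, the threshold, and both calculations are hypothetical stand-ins for whatever your filter and formula steps actually do.

```python
def calculate(row):
    """Apply the branch-specific calculation to a single row.

    Mirrors IF(condition; A; B): the condition and both formulas
    below are placeholders, not the original transformation's logic.
    """
    if row["amount"] >= 100:      # the test from the filter step (assumed)
        return row["amount"] * 2  # calculation A (assumed)
    return row["amount"] + 1      # calculation B (assumed)


def transform(rows):
    """Stream rows through one at a time; nothing is buffered or re-joined."""
    for row in rows:
        yield {**row, "result": calculate(row)}


for out in transform([{"amount": 200}, {"amount": 50}]):
    print(out)
```

Because each row is handled independently and in order, there is no second stream to wait for and no sorted data set to hold in memory.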

Cyrus