0

I have to design ETL to load data into a table. But, I shouldnt load values matching keys in a lookup table. Eg.

*Input Table*
Cab Ride
Ride in Cab
Booked Cab
Self drive
Car pooling

*Lookup Table*
Cab
Taxi


*Destination Table*
Self drive
Car pooling

As we can see in destination table we are not loading data that contains Cab

Please let me know if this possible in Pentaho or SQL

Arjun
  • 1,049
  • 6
  • 23
  • 37
  • 1
    So the requirement is "load the row if it does NOT contain any string found in the lookup table", correct? This could be done with the `Regex Evaluation` step an a parent Job, but it won't be very fast. It's probably better to do it in JavaScript with the `Modified Java Script Value` step. – Brian.D.Myers Mar 09 '16 at 20:01
  • @Brian.D.Myers that is correct, load if not contains a match in lookup table. I will try with Regex Evaluator. – Arjun Mar 10 '16 at 05:43

1 Answers1

1

The simplest way is building a regex

  • Your lookup table feeds the various strings that you want to filter out (or in);
  • Using a group by step you concatenate all those strings separated by |; the result is "Cab|Taxi".
  • Prepend .( and append ). with a calculator step, ending up with .(Cab|Taxi).; call this field "regex_filter"
  • Cross join this single row with the main data stream;
  • Now you can use a filter rows with condition "NOT input_field REGEXP regex_filter". (you may want to prepend (?i) to the the regex to make it case insensitive).

See attached example: Regex filter in PDI 5.4

nsousa
  • 4,448
  • 1
  • 10
  • 15
  • thank you for the solution. I will try this. But my look table has lot may values. almost 10K records. – Arjun Mar 10 '16 at 12:45