
I am using Talend to check data quality by comparing person names across two databases. One database holds the correct names and the other holds corrupted names. For each corrupted name, I need to find the matching correct name.

I am using the tFuzzyMatch component to match the names.

The database which has the correct names has 212000 records.

The database which has the incorrect names has 50000 records.

tFuzzyMatch takes a long time to look up the correct name for each corrupted name.

Can anyone help me optimize tFuzzyMatch to reduce the execution time?
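
For a sense of scale, here is a minimal Python sketch of what a brute-force fuzzy lookup costs at these row counts. It assumes a Levenshtein distance (one of the matching types tFuzzyMatch offers) and assumes every corrupted name is scored against every correct name; whether tFuzzyMatch works exactly this way internally is not confirmed here. The function names `levenshtein` and `best_match` are illustrative only and are not part of Talend.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance, O(len(a) * len(b))."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def best_match(corrupted, reference):
    """Brute-force lookup (assumed behaviour): score every reference name."""
    scored = ((name, levenshtein(corrupted, name)) for name in reference)
    return min(scored, key=lambda pair: pair[1])


# Row counts taken from the question.
REFERENCE_ROWS = 212_000
CORRUPTED_ROWS = 50_000

# One distance computation per (corrupted, correct) pair.
pairwise_comparisons = REFERENCE_ROWS * CORRUPTED_ROWS
```

Under that assumption the job performs about 10.6 billion distance computations, so any step that shrinks the candidate set per corrupted name (for example, comparing only names that share a first letter or a similar length) cuts the work roughly in proportion.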

My job looks like this:

[Screenshot: Talend job layout]

Please take a look at the fuzzy match lookup. It has 3124340 rows.

I would like to speed up the fuzzy match lookup.

Prakki
  • Can you post screenshots of your job layout and any pertinent components such as the tFuzzyMatch and any other key components in your job? It's pretty hard to see what to optimise without seeing what you've done. – ydaetskcoR Sep 09 '14 at 08:01
  • I've edited your job layout into the question. Could you also include a screenshot of the configuration of the tFuzzyMatch? It would also be useful to know roughly what speed you're getting out of the job (e.g. rows/second) and what you are aiming for. – ydaetskcoR Sep 11 '14 at 11:35
