I am using Talend for a data quality check in which I compare person names across two databases. One database holds the correct names and the other holds corrupted names. For each corrupted name, I need to find the matching correct name.
I am using the tFuzzyMatch component to match the names.
The database which has the correct names has 212000 records.
The database which has the incorrect names has 50000 records.
tFuzzyMatch takes a long time to look up the correct name for each corrupted name.
Can anyone help me optimize tFuzzyMatch to reduce the execution time?
My job looks like this:
Please note the tFuzzyMatch lookup flow: it has 3124340 rows.
I would like to speed up this tFuzzyMatch lookup.
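
To show where I think the time goes, here is a minimal Java sketch of what I assume the lookup does: compare every corrupted name against every lookup row using Levenshtein distance. This is only my assumption about the behaviour, not the actual tFuzzyMatch code. The class name, method names and sample names are made up for illustration; only the row counts come from my job.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Talend's implementation.
public class FuzzyLookupSketch {

    // Two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Brute force: scan every reference name for each corrupted name.
    static String bestMatch(String corrupted, List<String> reference) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : reference) {
            int d = levenshtein(corrupted, candidate);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the lookup flow.
        List<String> reference = Arrays.asList("Jonathan", "Katherine", "Michael");
        System.out.println(bestMatch("Jonathon", reference));

        // Pair count if all 50000 corrupted names are compared
        // against all 3124340 lookup rows (assumed behaviour).
        long comparisons = 50_000L * 3_124_340L;
        System.out.println("Comparisons: " + comparisons);
    }
}
```

If that assumption holds, the job performs roughly 156 billion distance computations, which would explain the run time, so I am mainly looking for a way to cut down the number of lookup rows each name is compared against.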