Fuzzy matching Informatica vs SQL

Question

We are currently debating whether to implement pairwise matching functions in SQL to perform fuzzy matching on invoice reference numbers, or go down the route of using Informatica.

Informatica is a great solution (so I've heard); however, I'm not familiar with the software.

Has anybody got any experience of its fuzzy matching capabilities, and of the advantages it may offer over building some matching logic in SQL?

Thanks

No Zane, it is our preference to run custom matching algorithms on the basic tables within SQL. SSIS will not be used in the system flow. — user3933946, Aug 21 '14 at 15:27
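For comparison, here is a minimal sketch of the kind of pairwise logic the SQL route would involve. It is written in Python for readability rather than as SQL. The normalization rule (uppercase, keep only letters and digits), the choice of Levenshtein edit distance, the 0.8 threshold and the sample references are all assumptions for illustration. The same steps would typically be ported to a SQL user-defined function.

```python
import re
from itertools import combinations


def normalize_ref(ref: str) -> str:
    """Assumed cleanup rule: uppercase and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", ref.upper())


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical normalized refs, falling towards 0.0 as edits accumulate."""
    na, nb = normalize_ref(a), normalize_ref(b)
    longest = max(len(na), len(nb))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(na, nb) / longest


def candidate_pairs(refs, threshold=0.8):
    """Score every pair of references and keep those at or above the (assumed) threshold."""
    matches = []
    for a, b in combinations(refs, 2):
        score = similarity(a, b)
        if score >= threshold:
            matches.append((a, b, score))
    return matches


# Hypothetical sample: the first three refs pair up, "PO-77881" is dropped.
print(candidate_pairs(["INV-00123", "inv 00123", "INV-0O123", "PO-77881"]))
```

Scoring every pair grows quadratically with the number of invoices, so in practice candidate pairs are usually narrowed first (for example by supplier or amount) before the edit distance is computed.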

score 0 · Answer 1 · answered Sep 22 '14 at 15:47

The Parser transformation in Informatica can be used to do the job. You can create reference data objects in Informatica, which the Parser then uses to search your input string. The reference data objects are of the following types: Pattern Sets, Probabilistic Models, Reference Tables, Regex and Token Sets.

Pattern Sets - A pattern set contains the logic to identify data patterns, e.g. separating out initials from a name.

Probabilistic Models - A probabilistic model identifies tokens by the type of information they contain and by their position in an input string. A probabilistic model contains the following columns. First, an input column that represents the data on the input port: you populate the column with sample data from the input port, and the model uses the sample data as reference data in parsing and labeling operations. Second, one or more label columns that identify the types of information in each input string: you add these columns to the model and assign labels to the tokens in each string, using the label columns to indicate the correct position of the tokens in the string.

When you use a probabilistic model in a Parser transformation, the Parser writes each input value to an output port based on the label that matches the value. For example, the Parser writes the string "Franklin Delano Roosevelt" to the FIRSTNAME, MIDDLENAME and LASTNAME output ports.

The Parser transformation can infer a match between the input port data values and the model data values even if the port data is not listed in the model. This means that a probabilistic model does not need to list every token in a data set to correctly label or parse the tokens in that data set. The transformation uses probabilistic or fuzzy logic to identify tokens that match tokens in the probabilistic model. You update the fuzzy logic rules when you compile the probabilistic model.

Reference Table - This is a db table for searching
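The Parser transformation is configured in Informatica's own tools rather than written as code, so the following is only a plain-Python analogy of how token-set and regex reference data can route each token of an input string to a labelled output. The token set, the pattern and the label names are hypothetical. The sketch does not attempt the probabilistic (fuzzy) inference described above: a token that is not covered by the reference data simply ends up unlabelled.

```python
import re

# Hypothetical reference data standing in for an Informatica token set and regex.
CURRENCY_TOKEN_SET = {"GBP", "USD", "EUR"}
INVOICE_REF_PATTERN = re.compile(r"^INV-?\d{4}-?\d+$", re.IGNORECASE)


def label_tokens(text: str) -> dict:
    """Write each whitespace-separated token to the output 'port' whose label it matches."""
    ports = {"INVOICE_REF": [], "CURRENCY": [], "UNLABELED": []}
    for token in text.split():
        if token.upper() in CURRENCY_TOKEN_SET:      # token-set lookup
            ports["CURRENCY"].append(token.upper())
        elif INVOICE_REF_PATTERN.match(token):       # regex match
            ports["INVOICE_REF"].append(token)
        else:                                        # no reference data matched
            ports["UNLABELED"].append(token)
    return ports


print(label_tokens("Payment inv-2014-0042 gbp 150.00"))
```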

score 0 · Answer 2 · answered Oct 02 '14 at 05:45

It seems that your data is unstructured and you want to extract meaningful data from it. The Informatica Data Transformation (DT) tool is a good fit if your data follows some pattern. It is used with the UDT transformation inside Informatica PowerCenter. With DT you can create a parser to parse your data and a serializer to write it out in any form you want; you can then aggregate and otherwise transform that data using Informatica PowerCenter's ETL capabilities. DT is well known for its ability to parse PDFs, forms and invoices. I hope it serves the purpose.
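DT parsers and serializers are built in Informatica's own tooling, so the following is only a plain-Python analogy of the parse-then-serialize flow described above: a parser pulls labelled fields out of semi-structured invoice text, and a serializer writes them out in another format. The "Label: value" layout, the field names, the choice of JSON as output and the sample invoice are all hypothetical.

```python
import json
import re

# Hypothetical layout: invoice text containing "Label: value" lines.
FIELD_PATTERN = re.compile(r"^\s*(Invoice No|Date|Total)\s*:\s*(.+?)\s*$")


def parse_invoice(text: str) -> dict:
    """Parser step: pull the known labelled fields out of semi-structured text."""
    record = {}
    for line in text.splitlines():
        match = FIELD_PATTERN.match(line)
        if match:
            label, value = match.groups()
            record[label.lower().replace(" ", "_")] = value
    return record


def serialize(record: dict) -> str:
    """Serializer step: write the parsed record out in another format (JSON here)."""
    return json.dumps(record, sort_keys=True)


sample = "ACME Ltd\nInvoice No: INV-2014-0042\nDate: 2014-09-01\nTotal: 150.00 GBP\n"
print(serialize(parse_invoice(sample)))
```

Lines that match no known label (such as "ACME Ltd") are ignored, which mirrors how a parser only extracts the fields it has been told to look for.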
