
My source data contains the same tokens as the reference record, but in a different order. E.g.: 0.42345795, test address client #12 order; token@, token@ client #12 order; address, For the same input and lookup records, SSIS gave a similarity of 0.4, while Python's confidence value was 89. Is there any way to make the SSIS Fuzzy Lookup transformation ignore token order so that the similarity value increases?

Mythri
  • Can you provide a [mcve] for your problem? – A. Kootstra Jun 23 '17 at 19:11
  • Based on the information in the question I'm unable to understand, let alone reproduce your problem. Please provide a simple example of the Excel sheet, the data and formula used. Also provide the Python code you used for the comparison. – A. Kootstra Jun 23 '17 at 19:24
  • Source address: test address client #12 order; token@. Lookup address: token@ client #12 order; address test. I ran a fuzzy lookup on this pair in both SSIS and Python. SSIS result: SIMILARITY = 0.33312657, CONFIDENCE = 0.98750001. Python token_sort_ratio result for the same input: CONFIDENCE = 89. Is there a parameter that makes SSIS ignore the token order so that the similarity goes up? – Mythri Jun 23 '17 at 19:36
  • Python code: import pyodbc; from fuzzywuzzy import fuzz; from fuzzywuzzy import process; connection = pyodbc.connect('DRIVER={SQL Server};SERVER=server;DATABASE=database'); cursor = connection.cursor(); cursor.execute("SELECT address FROM CLIENT"); row1 = cursor.fetchall(); print(row1); cursor.execute("SELECT lookup_address FROM LOOKUP"); row2 = cursor.fetchall(); print(row2); print("\nConfidence"); print(fuzz.token_sort_ratio(row1, row2)) – Mythri Jun 23 '17 at 19:41
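
To illustrate why a token-sorted comparison scores this pair higher than an order-sensitive one, here is a minimal sketch using only the Python standard library. It is not fuzzywuzzy's actual implementation and says nothing about SSIS internals; the helper names `sort_tokens` and `token_sort_similarity` are hypothetical, and `difflib.SequenceMatcher` stands in for whatever string metric each tool really uses.

```python
import re
from difflib import SequenceMatcher


def sort_tokens(text: str) -> str:
    """Lowercase, keep only alphanumeric tokens, and sort them alphabetically."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(sorted(tokens))


def token_sort_similarity(a: str, b: str) -> float:
    """Compare two strings after token sorting, so word order no longer matters."""
    return SequenceMatcher(None, sort_tokens(a), sort_tokens(b)).ratio()


# The pair from the comment above.
source = "test address client #12 order; token@"
lookup = "token@ client #12 order; address test"

# Order-sensitive comparison of the raw strings.
raw_similarity = SequenceMatcher(None, source, lookup).ratio()

# Order-insensitive comparison: both strings normalize to the same token sequence.
sorted_similarity = token_sort_similarity(source, lookup)

print(sort_tokens(source))
print(raw_similarity, sorted_similarity)
```

Under the assumption that no built-in option exists, the same normalization could be applied to both the source and the reference column before the lookup runs, so that an order-sensitive matcher sees identical token order on each side. Whether that fits a given SSIS package is untested here.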

0 Answers