0

Consider a scenario where I have the data about same entity from two different sources. As an example, the camera Nikon D3200, Nikon mentions the dimensions as 5.0 in. (125 mm) x 3.8 in. (96 mm) x 3.1 in. (76.5 mm) where as on amazon website its 3.1 x 3.8 x 5 inches. Now if I want to combine the data from both sources in a single table, how can I go about it? How can I confirm that the these two dimensions are actually for the same product? Any help with the strategies for record linkage will be appreciated. T

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
trailblazer
  • 1,421
  • 5
  • 20
  • 43
  • Sort the dimension in descending order before comparing. You're going to have to convert metric to English (or English to metric) and get the dimension numbers from several different string formats. – Gilbert Le Blanc May 09 '14 at 18:41

1 Answers1

0

What I would do is create 2 columns in a new table for each source. For example:

productID_s1  | productID_s2  |   name                     | dimensions_source1                                     | dimensions_source2
CAM110          CAM101            Nikon Camera Model 82      5.0 in. (125 mm) x 3.8 in. (96 mm) x 3.1 in. (76.5 mm)   3.1 x 3.8 x 5 inches

With Query:

CREATE TABLE Destination(
productID INT IDENTITY PRIMARY KEY,
productID_s1 NVARCHAR(40),
productID_s2 NVARCHAR(40),
name NVARCHAR(40),
dimensions_source1 NVARCHAR(255),
dimensions_source2 NVARCHAR(255)
)

SELECT s1.productID, s1.name, s1.dimensions
FROM Source1 s1

UPDATE Destination
SET productID_s2 = productID_fromsource2
FROM Source2 WHERE name_fromsource2 LIKE name

UPDATE Destination
SET dimensions_source2 = dimensions_fromsource2
FROM Source2 WHERE name_fromsource2 LIKE name

This way you know it is the same product...I don't know if this simple answer is in any way helpful.