I have table_a and table_b. In table_a is a list of new SKUs with shipping weight and dimensions (separate fields for length, width, height; see table_a below). In table_b is a list of existing SKUs with the same columns as table_a, plus estimated shipping cost (see table_b below).
For each SKU in table_a, I need to find the SKU in table_b whose numeric value is closest to the same column's value in table_a.
For example, if in table_a I have the following records
item_id | item_wght | item_length | item_height | item_width |
---|---|---|---|---|
2587329866 | 9 | 15 | 14 | 10 |
2587829450 | 7 | 12 | 17 | 11 |
and in table_b I have the following records
item_id | item_wght | item_length | item_height | item_width | item_ship_est |
---|---|---|---|---|---|
2587837947 | 11 | 16 | 16 | 13 | 7.36 |
2587854109 | 8 | 14 | 13 | 11 | 6.83 |
I want to select the record in table_b where a particular column's value is the closest numerically to that in table_a.
So, say I want to select the record from table_b where item_wght is the closest in numeric value to item_wght for each record in table_a. Given the data above, the query should return row 2 (item_id 2587854109) of table_b for both records in table_a, because that record's item_wght is closest to the item_wght of both records in table_a.
The end goal is to end up with all the records in table_a, with only the corresponding item_ship_est record in table_b, as follows:
item_id | item_wght | item_length | item_height | item_width | item_ship_est |
---|---|---|---|---|---|
2587329866 | 9 | 15 | 14 | 10 | 6.83 |
2587829450 | 7 | 12 | 17 | 11 | 6.83 |
I'm not sure how to start on this problem, so I don't really have any useful queries to show what I've already tried. I've searched for similar questions on here, some of which seemed promising, but I was unable to achieve my desired result from any of the existing questions I found.
I'd appreciate any help I can get.
Update: After Mikhail kindly submitted an answer to the above, it made me realize I forgot to mention that table_b contains millions of rows, and table_a contains from hundreds to tens of thousands of rows, depending on the day, so a cross join on the entire data set isn't really feasible here.
I don't know if there if a way to perhaps parse down the number of records from table_b being cross joined to each record of table_a, by segmenting the tables by total dim volume or something. It's a tough one.