0

I have a table that has user a user_id and a new record for each return reason for that user. As show here:

| user_id   | return_reason |
|---------  |-------------- |
| 1         | broken        |
| 2         | changed mind  |
| 2         | overpriced    |
| 3         | changed mind  |
| 4         | changed mind  |

What I would like to do is generate a foreign key for each combination of values that are applicable in a new table and apply that key to the user_id in a new table. Effectively creating a many to many relationship. The result would look like so:

Dimension Table ->

| reason_id     | return_reason     |
|-----------    |---------------    |
| 1             | broken            |
| 2             | changed mind      |
| 2             | overpriced        |
| 3             | changed mind      |

Fact Table ->

| user_id   | reason_id     |
|---------  |-----------    |
| 1         | 1             |
| 2         | 2             |
| 3         | 3             |
| 4         | 3             |

My thought process is to iterate through the table with a cursor, but this seems like a standard problem and therefore has a more efficient way of doing this. Is there a specific name for this type of problem? I also thought about pivoting and unpivoting. But that didn't seem too clean either. Any help or reference to articles in how to process this is appreciated.

Anthony
  • 1
  • 1
  • Why is there one row less in the resultset that in the original data? – GMB Sep 26 '20 at 17:59
  • The result set would have a mapping of attributes to users. Therefore the table should have one record per user. The `reason_id` is a foreign key in that table. In another table I will have a dimension table that expands on each foreign key. So for example In the dimension table, `reason_id` = 2 , would have 2 rows one for 'changed mind' and one for 'overpriced'. But they would share the same `reason_id`. Thus capturing an instance of these two reasons together – Anthony Sep 26 '20 at 18:04
  • You desired results shows just one table. If you want to generate two tables (or more?), then please show us these tables too. – GMB Sep 26 '20 at 18:07
  • Good point, I updated the post. Thank you. – Anthony Sep 26 '20 at 18:15
  • Huh? You have repeated ids in the `reason` table. I just don't follow the logic. And why doesn't the "fact" table have five rows like the original data? – Gordon Linoff Sep 27 '20 at 00:58

1 Answers1

0

The problem concerns data normalization and relational integrity. Your concept doesn't really make sense - Dimension table shows two different reasons with same ID and Fact table loses a record. Conventional schema for this many-to-many relationship would be three tables like:

  • Users table (info about users and UserID is unique)
  • Reasons table (info about reasons and ReasonID is unique)
  • UserReasons junction table (associates users with reasons - your existing table). Assuming user could associate with same reason multiple times, probably also need ReturnDate and OrderID_FK fields in UserReasons.

So, need to replace reason description in first table (UserReasons) with a ReasonID. Add a number long integer field ReasonID_FK in that table to hold ReasonID key.

To build Reasons table based on current data, use DISTINCT:
SELECT DISTINCT return_reason INTO Reasons FROM UserReasons

In new table, rename return_reason field to ReasonDescription and add an autonumber field ReasonID.

Now run UPDATE action to populate ReasonID_FK field in UserReasons.
UPDATE UserReasons INNER JOIN UserReasons.return_reason ON Reasons.ReasonDescription SET UserReasons.ReasonID_FK = Reasons.ReasonID

When all looks good, delete return_reason field.

June7
  • 19,874
  • 8
  • 24
  • 34