I have several tables in a SQL Server database, two of them (Table1
and Table2
) I would like to select a specific subset from, to fill in a third table (Table3
).
In Table1
there are 25 columns, from which I am only interested in three, let's call them Col1
, Col2
and Col3
. All three are not unique in this table, but I would like to extract the unique pairs, as follows:
Col1
+Col2
= Unique Key forTable3
.Col3
+Col2
= Optional, foreign key intoTable2
.
To extract the unique keys for Table3
from Table1
the following SQL works fine:
SELECT Col1, Col2
FROM Table1
GROUP BY Col1, Col2
However this is missing Col3
. The first problem is that Col3
can't simply be added as part of the GROUP BY
as there can be different values for it, which causes duplicate combinations of Col1
+ Col2
to be returned.
This is where Table2
comes into play; Col3
+ Col2
form a unique key into Table2
, but not every combination is present (which is helpful), as a JOIN
can be used to filter away the invalid combinations:
SELECT a.Col1, a.Col2, a.Col3
FROM Table1 a
JOIN Table2 b ON b.Col3 = a.Col3 AND b.Col2 = a.Col2
GROUP BY a.Col1, a.Col2, a.Col3
Now my final problem, unfortunately there are some (very few) combinations that do result in duplicate Col1
+ Col2
keys for Table3
.
If we assume it is okay to lose some Col3
values, how can I write a SELECT
to extract the three columns, ensuring that the combination Col1
+ Col2
is unique? And if possible keeping a Col3
value that provides a valid key combination in Table2
.
I've messed about with adding TOP 1
but I've failed in getting anything to work to my liking...
EDIT: Example data as requested.
Table1
| Col1 | Col2 | Col3 |
| 100 | 00 | 010 |
| 100 | 10 | 020 |
| 200 | 00 | 030 |
| 300 | 00 | 040 |
| 300 | 00 | 040 |
| 400 | 10 | 050 |
| 400 | 10 | 060 |
| 400 | 10 | 070 |
Table2
| Colx | Col2 | Col3 |
| car | 00 | 010 |
| cat | 10 | 030 |
| dog | 00 | 040 |
| bee | 10 | 040 |
| eye | 10 | 060 |
| bit | 10 | 070 |
Table3
| Col1 | Col2 | Col3 |
| 100 | 00 | 010 |
| 100 | 10 | 020 |
| 200 | 00 | 030 |
| 300 | 00 | 040 |
| 400 | 00 | 060 |
The third table shows the result I am looking for - the table only contains unique combinations of Col1
+ Col2
and also contains a Col3
values, preferably one that provides a value combination with Col2
in the second table (ie. the last entry, 400, 00, 060).
I hope this provides a little more clarity.