SQL: Remove duplicate columns when joining tables with large number of columns

Question

Objective: I want to be able to remove the common/ duplicate columns, from the output of a JOIN between 2 tables.

Problem: Most of the other questions that I have checked here on this subject, talk about explicitly mentioning the column names after select

For Instance: SQL Server : removing duplicate column while joining tables

The tables that I am dealing with usually have 20+ columns each, and it is not optimal to manually write each column name separately.

So if TableA has A columns and TableB has B columns, with one common column between them, then the final number of non-suplicate columns from a JOIN would be A+B-1, where A, B > 20

Existing Query:

-- Following query includes the duplicate column in the result
SELECT *
FROM TABLEA A
INNER JOIN TABLEB B ON (A.ID = B.ID)
WHERE <SOME_CONDITION>

Other solutions that I have come across:

-- Following query omits the duplicate column in the result
SELECT A.ID, A.A1, A.A2, A.A3, A.A4, A.A5, A.A6, A.A7, A.A8, A.A9, A.A10,
    B.A1, B.B2, B.B3, B.B4, B.B5, B.B6, B.B7, B.B8, B.B9, B.B10
FROM TABLEA A
INNER JOIN TABLEB B ON (A.ID = B.ID)
WHERE <SOME_CONDITION>

Your best option here is to just list out the column names explictly. You would need dynamic SQL to implement verbatim what you have in mind. — Tim Biegeleisen, Nov 28 '19 at 13:15
@Strawberry: I felt the same prima facie. The DB architect that I spoke with says that these tables carry multiple columns to avoid more number of joins which in turn would increase the (query execution + data fetching) time for real time dashboards. — Arsenic_33, Nov 29 '19 at 08:45

score 3 · Accepted Answer · answered Nov 28 '19 at 13:14

3

If the only duplicate columns are the JOIN columns, you can fix this with USING:

SELECT *
FROM TABLEA A INNER JOIN
     TABLEB B 
     USING (id)
WHERE <SOME_CONDITION>;

answered Nov 28 '19 at 13:14

Gordon Linoff

1,242,037
58
646
786

In this case, column_name for common columns would have to be same? – Arsenic_33 Nov 29 '19 at 08:36
@Arsenic_33 . . . For this to work, the column names used for the `JOIN` need to be the same. – Gordon Linoff Nov 29 '19 at 18:22

score 0 · Answer 2 · answered Nov 28 '19 at 14:00

If you don't want to write down all column names then you need to use dynamic SQL to select distinct column names from table schema and then execute your query string.

Example:

SELECT GROUP_CONCAT(DISTINCT(COLUMN_NAME) SEPARATOR ', ')
INTO   @ditinct_columns
FROM   information_schema.columns
WHERE  table_name IN ('A', 'B');

SET @q = CONCAT("SELECT ", @ditinct_columns, " FROM A AS a INNER JOIN B AS b ON <SOME_CONDITION>");

SELECT @q;

/* execute*/
PREPARE stmt FROM @q;
EXECUTE stmt;
DEALLOCATE PREPARE stmt

Able to understand this logically. GROUP_CONCAT is having trouble with Vertica (via DBeaver out of box). For instance, I've replaced information_schema.columns with v_catalog.columns — Arsenic_33, Nov 29 '19 at 08:34

SQL: Remove duplicate columns when joining tables with large number of columns

2 Answers2

Linked

Related