I want to join two tables with a common column in Big query?

Question

To join the tables, I am using the following query.

SELECT *
FROM(select user as uservalue1 FROM [projectname.FullData_Edited]) as FullData_Edited 
JOIN (select user as uservalue2 FROM [projectname.InstallDate]) as InstallDate 
ON FullData_Edited.uservalue1=InstallDate.uservalue2;

The query works but the joined table only has two columns uservalue1 and uservalue2. I want to keep all the columns present in both the table. Any idea how to achieve that?

Mikhail Berlyant · Accepted Answer · 2018-01-09T20:01:27.773

#legacySQL
SELECT <list of fields to output>
FROM [projectname:datasetname.FullData_Edited] AS FullData_Edited
JOIN [projectname:datasetname.InstallDate] AS InstallDate
ON FullData_Edited.user = InstallDate.user

or (and preferable)

#standardSQL
SELECT <list of fields to output>
FROM `projectname.datasetname.FullData_Edited` AS FullData_Edited
JOIN `projectname.datasetname.InstallDate` AS InstallDate
ON FullData_Edited.user = InstallDate.user

Note, using SELECT * in such cases lead to Ambiguous column name error, so it is better to put explicit list of columns/fields you need to have in your output

The way around it is in using USING() syntax as in example below.
Assuming that user is the ONLY ambiguous field - it does the trick

#standardSQL
SELECT *
FROM `projectname.datasetname.FullData_Edited` AS FullData_Edited
JOIN `projectname.datasetname.InstallDate` AS InstallDate
USING (user)

For example:

#standardSQL
WITH `projectname.datasetname.FullData_Edited` AS (
  SELECT 1 user, 'a' field1
),
`projectname.datasetname.InstallDate` AS (
  SELECT 1 user, 'b' field2
)
SELECT *
FROM `projectname.datasetname.FullData_Edited` AS FullData_Edited
JOIN `projectname.datasetname.InstallDate` AS InstallDate
USING (user)

returns

user    field1  field2   
1       a       b

whereas using ON FullData_Edited.user = InstallDate.user gives below error

Error: Duplicate column names in the result are not supported. Found duplicate(s): user

And what about if it is not the only ambiguous filed? – amarchin Jan 30 '19 at 14:05 — amarchin, Jan 30 '19 at 14:05

score 1 · Answer 2 · answered Jan 09 '18 at 19:44

1

Don't use subqueries if you want all columns:

SELECT *
FROM [projectname.FullData_Edited] as FullData_Edited JOIN
     [projectname.InstallDate] as InstallDate 
     ON FullData_Edited.uservalue1 = InstallDate.uservalue2;

You may have to list out the particular columns you want to avoid duplicate column names.

While you are at it, you should also switch to standard SQL.

answered Jan 09 '18 at 19:44

Gordon Linoff

1,242,037
58
646
786

@VSR . . . Note the part of the answer that reads: "You may have to list out the particular columns you want to avoid duplicate column names." – Gordon Linoff Jan 10 '18 at 02:01

I want to join two tables with a common column in Big query?

2 Answers2