-1

I'm trying to left join multiple large tables into one large table using Google BigQuery so that it will be processed more easily in Tableau.

I've ordered the files by size (131 MB to 20.2MB and one 3.96KB) in the query and changed my left join from 'LEFT JOIN' to 'LEFT JOIN EACH'

SELECT * FROM 
[Table 1 location] A
LEFT JOIN EACH [Table 2 Location] B
ON A.SUBNUM = B.SUBNUM 
AND A.VISITSEQ = B.VISITSEQ
LEFT JOIN EACH [Table 3 Location] C 
ON A.SUBNUM = C.SUBNUM 
AND A.VISITSEQ = C.VISITSEQ
LEFT JOIN EACH [Table 4 Location] D
ON A.SUBNUM = D.SUBNUM 
AND A.VISITSEQ = D.VISITSEQ
LEFT JOIN EACH [Table 5 Location] E 
ON A.SUBNUM = E.SUBNUM 
AND A.VISITSEQ = E.VISITSEQ
LEFT JOIN EACH [Table 6 Location] F
ON A.SUBNUM = F.SUBNUM
LEFT JOIN EACH[Table 7 Location] G 
ON A.SUBNUM = G.SUBNUM 
AND A.VISITSEQ = G.VISITSEQ
LEFT JOIN EACH[Table 8 Location] H 
ON A.SUBNUM = H.SUBNUM 
AND A.VISITSEQ = H.VISITSEQ
LEFT JOIN EACH [Table 9 Location] I 
ON A.SUBNUM = I.SUBNUM 
AND A.VISITSEQ = I.VISITSEQ
LEFT JOIN EACH[Table 10 Location] J
ON A.SUBNUM = J.SUBNUM 
AND A.VISITSEQ = J.VISITSEQ
LEFT JOIN EACH[Table 11 Location] K 
ON A.SITENUM = K.SITENUM

I'm expecting this to join all of the tables into one massive table but instead I receive the error "Resources exceeded during query execution. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors (error code: resourcesExceeded) "

Hatim
  • 1
  • 1
  • You don't clearly say what you want the output to be as a function of input. Are you sure you don't want a UNION? Use enough words, sentences & references to parts of examples to clearly & fully say what you mean. To describe a result: Say enough that someone could go away & come back with a solution. When giving a (base or query result) relation(ship)/association or table, say what a row says about the business situation in terms of its column values when it is in the table. – philipxy Jun 14 '19 at 16:47

3 Answers3

1

Google Cloud support here!

The error you're getting says that your query is using too many resources and can't be processed properly. I'd suggest that you split it into multiple, simpler subqueries. Instead of joining 11 large tables at a time, try joining 4, 4, and 3, for example. Also, the use of EACH is no longer recommended in BigQuery, since it's part of legacy SQL, and BigQuery uses Standard SQL. Try removing it from your query.

I hope that helps.

Hyperion
  • 156
  • 11
0

This is too long for a comment.

First, write the query for standard SQL. This is the version of SQL you should be using.

Second, I suspect your JOIN conditions are not correct. So, you need to debug your query.

Build your query one step at a time, by running:

SELECT COUNT(*)
FROM [Table 1 location] A LEFT JOIN
     [Table 2 Location] B
     ON A.SUBNUM = B.SUBNUM 

Check if the count is reasonable. If so, add the next JOIN. If not, figure out why the right JOIN keys are.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks Gordon, what do you mean by if the count is reasonable? – Hatim Jun 10 '19 at 16:30
  • @Hatim . . . You should have a sense of how many rows should match each join condition. – Gordon Linoff Jun 10 '19 at 16:38
  • Thank you so much @Gordon, I took a clue; rewrote my query for standard SQL and looked at the counts. I noticed that each join was multiplying the rows exponentially. (Each table only has ~700K rows and I was reaching into billions). I'm happy with the left join and having the nulls, but it looks like I'm missing a key concept here. I also realized that the db I'm extracting the data from creates a new record everyday, so I've limited the data to just a specific day. – Hatim Jun 11 '19 at 00:53
  • Although I do need all the columns for this master table, from my research it seems that I need to specify each column name and alias to avoid the error 'Duplicate column names in the result are not supported'. Will update once resolved. – Hatim Jun 11 '19 at 16:58
0

Being this was my first time using Google BQ I was not aware of the distinction between Legacy and Standard SQL. After a bit more research and some guidance from Gordon Linoff and Alejandro I was able to get my query to produce the results I needed.

I selected 504 columns and aliased them with SELECT ______ AS ________, joined each table on a few more columns, and then finished it with a WHERE clause which limited my data to a specific date in which the data was abstracted from the database.

Thank you everyone for your help and support!

Hatim
  • 1
  • 1