0

I'm afraid that if a bunch of folks start running my actual code I'll be billed for the queries so my example code is for a fake database.

I've successfully established my connection to BigQuery:

con <- dbConnect(
  bigrquery::bigquery(),
  project = 'myproject',
  dataset = 'dataset',
  billing = 'myproject'
)

Then performed a LEFT JOIN using the coalesce function:

    dbGetQuery(con,
"SELECT 
  `myproject.dataset.table_1x`.Pokemon,
  coalesce(`myproject.dataset.table_1`.Type_1,`myproject.dataset.table_2`.Type_1) AS Type_1,
  coalesce(`myproject.dataset.table_1`.Type_2,`myproject.dataset.table_2`.Type_2) AS Type_2,
  `myproject.dataset.table_1`.Total,
  `myproject.dataset.table_1`.HP,
  `myproject.dataset.table_1`.Attack,
  `myproject.dataset.table_1`.Special_Attack,
  `myproject.dataset.table_1`.Defense,
  `myproject.dataset.table_1`.Special_Defense,
  `myproject.dataset.table_1`.Speed,
FROM `myproject.dataset.table_1`
LEFT JOIN `myproject.dataset.table_2`
  ON `myproject.dataset.table_1`.Pokemon = `myproject.dataset.table_2`.Pokemon
ORDER BY `myproject.dataset.table_1`.ID;")

The JOIN produced the table I intended and now I'd like to query that table but like...where is it? How do I connect? Can I save it locally so that I can start working my analysis in R? Even if I go to BigQuery, select the Project History tab, select the query I just ran in RStudio, and copy the Job ID for the temporary table, I still get the following error:

Error: Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Run `rlang::last_error()` to see where the error occurred.

And if I follow up:

Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
 1. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
 2. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
 3. DBI:::.local(conn, statement, ...)
 5. bigrquery::dbSendQuery(conn, statement, ...)
 6. bigrquery:::BigQueryResult(conn, statement, ...)
 7. bigrquery::bq_job_wait(job, quiet = conn@quiet)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
    x
 1. +-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
 2. \-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
 3.   \-DBI:::.local(conn, statement, ...)
 4.     +-DBI::dbSendQuery(conn, statement, ...)
 5.     \-bigrquery::dbSendQuery(conn, statement, ...)
 6.       \-bigrquery:::BigQueryResult(conn, statement, ...)
 7.         \-bigrquery::bq_job_wait(job, quiet = conn@quiet)

Can someone please explain? Is it just that I can't query a temporary table with the bigrquery package?

1 Answers1

0

From looking at the documentation here and here, the problem might just be that you did not assign the results anywhere.

local_df = dbGetQuery(...

should take the results from your database query and copy them into local R memory. Take care as there is no check for the size of the results, so it is easy to run out of memory in when doing this.


You have tagged the question with dbplyr, but it looks like you are just using the DBI package. If you want to be writing R and have it translated to SQL, then you can do this using dbplyr. It would look something like this:

con <- dbConnect(...) # your connection details here

remote_tbl1 = tbl(con, from = "table_1")
remote_tbl2 = tbl(con, from = "table_2")

new_remote_tbl = remote_tbl1 %>%
  left_join(remote_tbl2, by = "Pokemon", suffix = c("",".y")) %>%
  mutate(Type_1 = coalesce(Type_1, Type_1.y),
         Type_2 = coalesce(Type_2, Type_2.y)) %>%
  select(ID, Pokemon, Type_1, Type_2, ...) %>% # list your return columns
  arrange(ID)

When you use this approach, new_remote_tbl can be thought of as a new table in the database which you can query and manipulate further. (It is not actually a table - no data was saved to disc - but you can query it and interact with it as if it were and the database will produce it for you on demand).

There are some limitations of working with a remote table (the biggest is you are limited to commands that dbplyr can translate into SQL). When you want to copy the current remote table into local R memory, use collect:

local_df = remote_df %>%
  collect()
Simon.S.A.
  • 6,240
  • 7
  • 22
  • 41
  • Hi! Thanks so much for this documentation and for pointing me toward the local df solution. Concerning dbplyr/DBI - I actually do not want to be writing R and having it translated to SQL. The reasons are twofold: first that I'm not terribly proficient in R outside the ggplot package, and second I'm creating a portfolio project in an attempt to land an analyst-level job, so I really want to share SQL. Am I perhaps using the wrong tools? I thought writing SQL in an RMD file would be the easiest way to share SQL on Github. Should I just learn to write SQL in R? Will employers have a preference? – watermelon_diet Feb 09 '22 at 12:24
  • If you want to execute SQL via R, then the DBI package (`dbGetQuery`, `dbExecute` etc.) are a good way to go. You don't need dbplyr for this. I assumed you were using it because of how the question was tagged. – Simon.S.A. Feb 09 '22 at 19:49
  • Regarding demonstrating SQL for a portfolio - I am not a recruiter, so I can't tell you what will be more effective. If I was trying to do something like this, I would save separate SQL files and R files. The R code would read the SQL files instead of containing the SQL code. This would makes the different languages you are using clearer (because readers will be able to tell from the file extension). – Simon.S.A. Feb 09 '22 at 19:55