3

We have a working prototype in Google Cloud DataLab where we have made extensive use of the %%bq 'magics' to build up a parameterized query using various UDFs.

So something like:

customFn1 = bq.UDF("customFn1",theJsCode1,theDataTypeDef1,params=paramDefs1)
customFn2 = bq.UDF("customFn2",theJsCode2,theDataTypeDef2,params=paramDefs2)

then:

%%bq query --name finalQry --udfs customFn1 customFn2
SELECT 
     customFn1(bell, whistle) AS Output1,
     customFn2(rank,serialNo) AS Output2
FROM
     myImaginaryTable
WHERE
    id < @param1 AND id > @param2

then:

%%bq sample -q finalQry 
parameters:
   - name: param1
     type: STRING
     value: "ab1"
   - name: param2
     type: STRING
     value: "ab3"

We'd like to start working towards deploying this prototype and need to access the final query text to use in our production code (which will, in due course, be called through the API).

I know that GC datalab is doing some work behind the scenes to compose this query (our real world example is much more complex) which I need to replicate in our code.

it would be really useful if I could get access the actual query string GC datalab generates after it has composed the udfs, replaced the parameters, etc. ? I could do it manually with a shedload of copying and pasting, etc. but it would be super-useful if I could automatically access this final query string somewhere?

I have had a read at the docs here: http://googledatalab.github.io/pydatalab/datalab.bigquery.html but cannot seem to find what I need. Can anyone help?

Stewart_R
  • 13,764
  • 11
  • 60
  • 106

1 Answers1

3

Have you tried finalQry.sql? This will give you the query's SQL string. However, this will be before any parameter substitution, since this happens at execution time (your sample call in this case).

See http://googledatalab.github.io/pydatalab/datalab.bigquery.html#datalab.bigquery.Query.sql

If you think getting the SQL string after substitution is a useful feature, you can open a feature request on the Github repo.

yelsayed
  • 5,236
  • 3
  • 27
  • 38
  • this is what i was missing (not sure how I missed it in the docs - my bad). Not terribly important to get the query after parameter substitution. After composition of udfs, etc gets me 99% there. Thank you – Stewart_R Sep 19 '17 at 11:17
  • @yelsayed is the documentation that you linked to the active/up-to-date documentation for Cloud Datalab? the reason I ask is because at the bottom it says "Copyright 2015" -- also the fact that the front page mentions that the "datalab" namespace is going to be phased out, but the "google.datalab" documentation pages do not seem to include everything that is/was under the "datalab" documentation pages also concerns me. Can someone confirm that that is indeed the best source for Cloud Datalab documentation? – SheRey Apr 27 '18 at 04:06
  • @SheRey yes, these are the latest docs. Copyright year should never be changed beyond the first publication date, so this is correct. The `google.datalab` namespace is what should be used now, the deprecated `datalab` namespace is also there for reference. – yelsayed Apr 27 '18 at 18:58