
I'm trying to query a large amount of data in BigQuery and then upload the resulting table to the desired dataset (datasetxxx) using "datalab" in PyCharm as the IDE. Below is my code:

from google.datalab import bigquery as bq

query = bq.Query(sql=myQuery)
job = query.execute_async(
        output_options=bq.QueryOutput.table('datasetxxx._tmp_table', mode='overwrite', allow_large_results=True))
job.result()

However, I ended up with "No project ID found". The project ID is imported through a .json file via os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path to the file. I also tried to declare the project ID explicitly, as follows.

self.project_id = 'xxxxx'
query = bq.Query(sql=myQuery, context = self.project_id)

This time I ended up with the following error:

TypeError: __init__() got an unexpected keyword argument 'context'

The datalab package is also up to date. Thanks for your help.

Re: The project ID is specified in the "FROM" clause, and I can also see the path to the .json file using the "echo" command. Below is the stack trace:

Traceback (most recent call last):
  File "xxx/Queries.py", line 265, in <module>
    brwdata._extract_gbq()
  File "xxx/Queries.py", line 206, in _extract_gbq
    , allow_large_results=True))
  File "xxx/.local/lib/python3.5/site-packages/google/datalab/bigquery/_query.py", line 260, in execute_async
    table_name = _utils.parse_table_name(table_name, api.project_id)
  File "xxx/.local/lib/python3.5/site-packages/google/datalab/bigquery/_api.py", line 47, in project_id
    return self._context.project_id
  File "xxx/.local/lib/python3.5/site-packages/google/datalab/_context.py", line 62, in project_id
    raise Exception('No project ID found. Perhaps you should set one by running'
Exception: No project ID found. Perhaps you should set one by running "%datalab project set -p <project-id>" in a code cell.
user3000538

2 Answers


So, if you run "echo $GOOGLE_APPLICATION_CREDENTIALS" you can see the path to your JSON key. Could you make sure that the "FROM" clause in your query specifies the right external project? Also, if your QueryOutput destination is in your very same project, you are doing it right with

table('dataset.table', ...)

but in the other case you should specify:

table('project.dataset.table', ...)

I don't know exactly how you are running the query, but the error might be there.
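For instance, applied to the snippet from the question, it would look roughly like this (just a sketch; 'your-project-id' and the SQL string are placeholders, since the real values are not shown):

from google.datalab import bigquery as bq

# Placeholder SQL; in the question this is myQuery
sql = 'SELECT * FROM `your-project-id.some_dataset.some_table` LIMIT 1000'

query = bq.Query(sql=sql)
job = query.execute_async(
        output_options=bq.QueryOutput.table('your-project-id.datasetxxx._tmp_table',
                                            mode='overwrite',
                                            allow_large_results=True))
job.result()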

I reproduced this and it worked fine for me:

import google.datalab
from google.datalab import bigquery as bq
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] ="./bqauth.json"

myQuery="SELECT * FROM `MY_EXAMPLE_PROJECT.MY_EXAMPLE_DATASET.MY_EXAMPLE_TABLE` LIMIT 1000"
query = bq.Query(sql=myQuery)
job = query.execute_async(
        output_options=bq.QueryOutput.table('MY_EXAMPLE_PROJECT.MY_EXAMPLE_DATASET2.MY_EXAMPLE_TABLE2', mode='overwrite', allow_large_results=True))
job.result()
Temu
  • Did you check whether you spelled your project ID correctly in both the FROM clause and the output_options? Could you show the error trace, or do you know exactly in which line it is raised? – Temu Aug 20 '18 at 13:53
  • The stack trace has been added to the question, and yes, there is no typo in the project ID. – user3000538 Aug 20 '18 at 14:07
  • If you are sure there is no typo and you tried to run a minimal piece of code and it's still not working... since it's not reproducible, I think you should open an issue in the [Google Issue Tracker](https://issuetracker.google.com/issues/new?component=187164) – Temu Aug 21 '18 at 15:35

Here's the updated way, in case anyone needs it:

Now you can use the Context in the latest version as follows:

from google.datalab import bigquery as bq
from google.datalab import Context as ctx

# Set the project ID on the Context so datalab knows which project to use
ctx.project_id = 'PROJECT_ID'
df = bq.Query(query).execute().result().to_dataframe()
...
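If setting the attribute directly on the Context class feels too implicit, a similar effect can probably be achieved through the default context. This is only a sketch under the assumption that google.datalab.Context.default() and its set_project_id() method behave this way in your version; 'PROJECT_ID' and the query are placeholders:

import google.datalab
from google.datalab import bigquery as bq

# Assumption: Context.default() returns the context used by the BigQuery
# wrapper and set_project_id() stores the project on it.
context = google.datalab.Context.default()
context.set_project_id('PROJECT_ID')  # placeholder project id

# Placeholder query, just to show the round trip to a DataFrame
query = bq.Query(sql='SELECT 1 AS x')
df = query.execute().result().to_dataframe()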
Abdul Rehman