
I was trying to load a BigQuery external table using the bq load command through the bq command line. The command I executed was:

bq load --source_format=NEWLINE_DELIMITED_JSON {provided dataset_name}.{provided bq external_table_name} gs://{provided bucket_name}/{provided folder_name}/{provided folder_name}/{provided folder_name}/20220107/*

The error I got is:

Error processing job '*:bqjob_r6bde3e8976b407bd_0000017e4295db78_1': bq_project_name:bq_dataset_name.bq_external_table_name is not allowed for this operation because it is currently a EXTERNAL.

Has anyone encountered this error? I have not found any parameter in Google's bq load documentation that tells bq the destination is an external table. Any insight on this would really help.

I also tried loading the external table using GoogleCloudStorageToBigQueryOperator with external_table=True, but that also failed, raising the exception below:

Exception: BigQuery job failed. Error was:

{
  "error": {
    "code": 409,
    "message": "Already Exists: Table project_name:dataset_name.Bq_Externaltable_name",
    "errors": [
      {
        "message": "Already Exists: Table project_name:dataset_name.Bq_Externaltable_name",
        "domain": "global",
        "reason": "duplicate"
      }
    ],
    "status": "ALREADY_EXISTS"
  }
}

The Airflow task log shows the same error:

[2022-01-09 17:10:20,995] {base_task_runner.py:113} INFO - Job 230862: Subtask {subtask_name} [2022-01-09 17:10:20,993] {taskinstance.py:1147} ERROR - BigQuery job failed. Error was: (same 409 ALREADY_EXISTS body as above)
This error also threw me off, because I had already created the external table with Terraform, using the code block below:
resource "google_bigquery_table" "external_table_name" {
  project             = local.project
  dataset_id          = google_bigquery_dataset.{provided_dataset_name}.dataset_id
  table_id            = local.{variable defined for Bq external table}
  schema              = file("${path.module}/../../../schema/{folder which holds schema json}/schemajsonforexternaltable.json")
  depends_on          = [google_bigquery_dataset.{provided_dataset_name}]
  deletion_protection = false

  external_data_configuration {
    autodetect    = false
    source_format = "NEWLINE_DELIMITED_JSON"
    source_uris = [
      "gs://{bucket_name}-${var.environment}/{folder_name}/{folder_name}/{folder_name}/*"
    ]
  }
}
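For reference, if I understand the docs correctly, this Terraform block is roughly equivalent to the following SQL DDL (same placeholder names as above) -- the external table is only a definition pointing at the GCS files, nothing is loaded into it:

CREATE EXTERNAL TABLE `gcp_project_name.{dataset_name}.{Bq_External_Table_name}`
OPTIONS (
  format = 'NEWLINE_DELIMITED_JSON',
  uris = ['gs://{bucket_name}-{environment}/{folder_name}/{folder_name}/{folder_name}/*']
);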
So why am I doing all this, and what is my end goal? I want to retrieve the file name, as in the query below; Google exposes it on external tables as a pseudo column (_FILE_NAME):
SELECT
  p_num,
  _FILE_NAME AS file_loc /* use this column to know the file name used to build the row in the Bq External table*/
FROM
  `gcp_project_name.{dataset_name}.{Bq_External_Table_name}`;
If there is any alternative other than a BigQuery external table for getting the file name used to build each row, that is also fine; I can switch to that approach as well.

@MikeKarp - I have two questions from my post above. First: loading a BigQuery external table using the bq load command failed; from this attempt, my question is whether it is possible to load a BigQuery external table using bq load at all. Second: I was trying to load the external table created through Terraform (which provides the source URI path the external table requires) using GoogleCloudStorageToBigQueryOperator with external_table=True, and that failed with "code": 409, "message": "Already Exists: Table ...". On this second one, I am not sure why GoogleCloudStorageToBigQueryOperator tries to create the table again when the external table has already been created in my GCP project through Terraform.

  • If I'm following correctly, it looks like you've got some concepts mixed. An external table isn't loaded; you just define it, and then when you query it in BigQuery, it reads the files directly. If you want to load data directly into BigQuery and store it as a static table, then you can use load, but then don't try to define the table as external. Does that make sense? – Mike Karp Jan 11 '22 at 00:22
  • I added my two questions to the original post because there is a limit on the number of characters in the comments section. Please refer to the last block in the original post after @Mike Karp – PyAI Jan 11 '22 at 12:46
  • Got it -- answered as an answer below. Let me know if I'm not following correctly. – Mike Karp Jan 11 '22 at 20:03

1 Answer


Ahhhh, I believe I follow now: you have an existing external table that you can query, and you want to load it into a new static table, is that right? The easiest way to do that is with SQL directly.

You can create a new table from an existing external table via a CREATE TABLE ... AS SELECT statement in SQL:

CREATE TABLE `gcp_project_name.{dataset_name}.{new_standard_table_name}` AS
SELECT *,
       _FILE_NAME AS file_loc
FROM
  `gcp_project_name.{dataset_name}.{Bq_External_Table_name}`;

You can run this directly in the BigQuery SQL editor after updating the table names above.
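One caveat: if you re-run this statement after the table exists, a plain CREATE TABLE will fail with the same kind of "Already Exists" error you saw from the operator. If you expect to re-run it, a CREATE OR REPLACE TABLE variant should overwrite the previous result instead:

CREATE OR REPLACE TABLE `gcp_project_name.{dataset_name}.{new_standard_table_name}` AS
SELECT *,
       _FILE_NAME AS file_loc
FROM
  `gcp_project_name.{dataset_name}.{Bq_External_Table_name}`;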

As per your question above, I also left the additional _FILE_NAME column in as you noted, which is fine -- you can add as many new derivations etc. as you need in the SELECT component and those will make it into the new table.
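For instance, here is a sketch of one such derivation -- assuming your file paths keep the dated folder (e.g. 20220107) from your question, you could pull it out of the pseudo column with REGEXP_EXTRACT:

SELECT
  p_num,
  _FILE_NAME AS file_loc,
  REGEXP_EXTRACT(_FILE_NAME, r'/(\d{8})/') AS file_date  -- hypothetical: extracts the YYYYMMDD folder from the file path
FROM
  `gcp_project_name.{dataset_name}.{Bq_External_Table_name}`;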

Mike Karp