I was trying to load a BigQuery external table using the bq load command from the bq command line. The command I executed:

bq load --source_format=NEWLINE_DELIMITED_JSON {provided dataset_name}.{provided bq external_table_name} gs://{provided bucket_name}/{provided folder_name}/{provided folder_name}/{provided folder_name}/20220107/*

The error I got:

Error processing job '*:bqjob_r6bde3e8976b407bd_0000017e4295db78_1': bq_project_name:bq_dataset_name.bq_external_table_name is not allowed for this operation because it is currently a EXTERNAL.

Has anyone encountered this error? I have not found any parameter in Google's bq load documentation that I could pass to tell bq that the target is an external table. Any insight on this would really help.

I also tried loading the external table using GoogleCloudStorageToBigQueryOperator with external_table=True, but that fails too, raising "BigQuery job failed. Error was: {}".format(err.content):

[2022-01-09 17:10:20,995] {base_task_runner.py:113} INFO - Job 230862: Subtask {subtask_name} [2022-01-09 17:10:20,993] {taskinstance.py:1147} ERROR - BigQuery job failed. Error was: b'{\n "error": {\n "code": 409,\n "message": "Already Exists: Table project_name:dataset_name.Bq_Externaltable_name",\n "errors": [\n {\n "message": "Already Exists: Table project_name:dataset_name.Bq_Externaltable_name",\n "domain": "global",\n "reason": "duplicate"\n }\n ],\n "status": "ALREADY_EXISTS"\n }\n}\n'
This error also threw me off, because I created the external table through Terraform using the code block below:
resource "google_bigquery_table" "external_table_name" {
  project             = local.project
  dataset_id          = google_bigquery_dataset.{provided_dataset_name}.dataset_id
  table_id            = local.{variable defined for Bq external table}
  schema              = file("${path.module}/../../../schema/{folder which holds schema json}/schemajsonforexternaltable.json")
  depends_on          = [google_bigquery_dataset.{provided_dataset_name}]
  deletion_protection = false

  external_data_configuration {
    autodetect    = false
    source_format = "NEWLINE_DELIMITED_JSON"
    source_uris = [
      "gs://{bucket_name}-${var.environment}/{folder_name}/{folder_name}/{folder_name}/*"
    ]
  }
}
So why am I doing all this, and what is my end goal? I want to retrieve the file name, as in the query below; Google exposes it on external tables as a pseudo column (_FILE_NAME):
SELECT
  p_num,
  _FILE_NAME AS file_loc /* use this column to know the file name used to build the row in the Bq external table */
FROM
  `gcp_project_name.{dataset_name}.{Bq_External_Table_name}`;
If there is any alternative other than a BQ external table for getting the file name used to build each row, that is also fine; I can switch to that approach as well.
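For instance, one direction I could imagine (just a sketch with placeholder names, not a confirmed pattern) is keeping the external table purely as a query-time view of GCS and periodically materializing it into a native table from Airflow, carrying _FILE_NAME along as a regular column:

from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

dag = DAG(
    dag_id="materialize_external_table",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
)

# Select from the external table (which reads GCS at query time, so no load
# job is needed) and write the result, including the _FILE_NAME pseudo
# column, into a native table. All names are placeholders.
materialize_external = BigQueryOperator(
    task_id="materialize_external",
    sql="""
        SELECT *, _FILE_NAME AS file_loc
        FROM `gcp_project_name.dataset_name.Bq_External_Table_name`
    """,
    destination_dataset_table="gcp_project_name.dataset_name.materialized_table",
    write_disposition="WRITE_TRUNCATE",
    use_legacy_sql=False,
    dag=dag,
)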
@MikeKarp - I have two questions in my post above. The first is about loading a BQ external table using the bq load command, which failed; from that attempt, my question is whether it is possible to load a BQ external table with bq load at all. The second is that I tried to load the external table created through Terraform (which already provides the source URI path the external table needs) using GoogleCloudStorageToBigQueryOperator with external_table=True, and that failed with "code": 409, "message": "Already Exists: Table ...". On this second one, I am not sure why GoogleCloudStorageToBigQueryOperator tries to create the table again when the external table already exists in my GCP project through Terraform.