
I am using bq version 2.0.58 and trying to create a hive-partitioned BigQuery external table. However, the hive partition key column is missing when I create the table with the bq command. I used the following commands to create the table.

Storage:

gs://<bucket>/myfolder/size=1000/file_1k.csv
gs://<bucket>/myfolder/size=10000/file_10k.csv

The mkdef command,

bq mkdef --project_id=my-project --autodetect --source_format=CSV --hive_partitioning_mode=AUTO --hive_partitioning_source_uri_prefix=gs://<bucket>/myfolder gs://<bucket>/myfolder/* >my_table_def.json

The mk command,

bq mk --external_table_definition=my_table_def.json my-project:my-dataset.mytable

The command above creates a table that includes the hive partition key column. But when I also supply a schema, the partition key column (size) is missing from the created table.

The following bq command fails to create the partition key column:

bq mk --schema=ip:STRING,fraudype:STRING,probability:FLOAT --external_table_definition=my_table_def.json my-project:my-dataset.mytable

my_table_def.json,

{
   "csvOptions": {
     "encoding": "UTF-8",
     "quote": "\""
   },
   "hivePartitioningOptions": {
     "mode": "AUTO",
     "sourceUriPrefix": "gs://<bucket>/myfolder"
   },
   "sourceFormat": "CSV",
   "sourceUris": [
     "gs://<bucket>/myfolder/*"
   ]
 }
SST
  • Have you followed the [official documentation](https://cloud.google.com/bigquery/docs/hive-partitioned-loads-gcs)? Could you please share error message? – aga Jul 16 '20 at 12:50
  • yes, there is no error, the column size is not present in the created table to execute the following query, "SELECT * FROM `my-project:my-dataset.mytable` where size=1000 LIMIT 100" – SST Jul 16 '20 at 12:58
  • Could you add the "--require_hive_partition_filter=True" option to the bq mkdef command and let me know the results? – aga Jul 21 '20 at 08:49

1 Answer


The --schema flag should not be combined with bq mk for hive-partitioned external tables; instead, pass the schema to bq mkdef along with CUSTOM partitioning mode, so the partition key is declared in the source URI prefix.

Solution:

bq mkdef --source_format=CSV \
--hive_partitioning_mode=CUSTOM \
--hive_partitioning_source_uri_prefix=gs://<bucket>/myfolder/{size:INTEGER} \
gs://<bucket>/myfolder/* \
ip:STRING,fraudype:STRING,probability:FLOAT >my_table_def.json

bq mk --external_table_definition=my_table_def.json my-project:my-dataset.mytable
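As a sanity check (a sketch, assuming you have access to the same project and dataset names used above), you can confirm that the size partition key now appears in the table schema and can be used as a filter:

```shell
# Inspect the created table: the schema should list the three declared
# columns plus the INTEGER partition key "size" from the URI prefix.
bq show --format=prettyjson my-project:my-dataset.mytable

# Query with a filter on the partition key so only the matching
# size=1000 partition is scanned.
bq query --use_legacy_sql=false \
  'SELECT * FROM `my-project.my-dataset.mytable` WHERE size=1000 LIMIT 100'
```

Note that standard SQL table references use dots (project.dataset.table), while the bq CLI's own table arguments use the colon form (project:dataset.table).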
SANN3