Recently I face an issue while writing the dataframe data into BigQuery using pyspark. Here it was:
pyspark.sql.utils.IllegalArgumentException: u'Temporary or persistent GCS bucket must be informed
After research the issue I found that Temporary GCS bucket to be mentioned spark.conf
.
bucket = "temp_bucket"
spark.conf.set('temporaryGcsBucket', bucket)
I think there is no concept to have a file for a table in Biquery like Hive.
I would like to know more about it, why we need to have temp-gcs-bucket to write the data into bigquery?
I was searching for the reason behind this but I couldn't.
Please clarify.