We use the Google BigQuery Spark Connector to import data stored in Parquet files into BigQuery. Using custom tooling, we generate the schema file that BigQuery needs and reference it in our (Scala) import code.
However, our data does not strictly adhere to a fixed, well-defined schema; individual datasets may contain additional columns. When experimenting with BigQuery via the bq command-line tool, we therefore almost always passed --ignore_unknown_values, since many imports would otherwise fail.
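For context, our bq invocations looked roughly like the sketch below (the dataset, table, bucket path, and schema file name are placeholders, and we assume newline-delimited JSON here, since --ignore_unknown_values applies to CSV and JSON loads):

```shell
# Load newline-delimited JSON from GCS into BigQuery.
# --ignore_unknown_values drops fields not declared in the schema file
# instead of failing the whole load job.
# my_dataset.my_table, gs://my-bucket/..., and schema.json are placeholders.
bq load \
  --source_format=NEWLINE_DELIMITED_JSON \
  --ignore_unknown_values \
  my_dataset.my_table \
  "gs://my-bucket/data/*.json" \
  ./schema.json
```

It is this lenient behavior that we would like to reproduce through the connector.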
Unfortunately, we could not find an equivalent configuration option in the BigQuery Spark Connector (com.google.cloud.bigdataoss:bigquery-connector:0.10.1-hadoop2). Does such an option exist?