I am trying to push data from Google Cloud Storage to Bigquery. I am getting this error :

Provided Schema does not match Table Field tags.list.item has changed type from STRING to INTEGER

I suspect the cause is that when all rows of a column are null, the column is inferred as INTEGER, but when BigQuery loads the next file, where that column contains string data, the schemas conflict.

How can we fix the schema in the Parquet files while pushing to BigQuery?

Can you please help me overcome this issue ?

Thanks in Advance

  • could you share sample file and schema in order to understand your issue better? – Sakshi Gatyan Dec 01 '22 at 07:51
  • @SakshiGatyan when I convert the data to a dataframe, it infers column _1 as int when the whole column is blank, and as string when the column has values in the next file – learningtocode Dec 01 '22 at 16:04

1 Answer

The question "Bigquery parquet file treats list&lt;string&gt; as list&lt;int32&gt; when empty array is passed" covers one way of writing out the data so that all Parquet files have a consistent schema.

Your hypothesis is correct: BigQuery doesn't currently honor the null logical type annotation in Parquet files for schema-adaptation purposes, which would be necessary here since the physical types differ.

Micah Kornfield
  • I am using python code to write files to local . I am not using bigquery export option as per use case. can we somehow hardcode the schema meaning all columns to string while exporting data to local using python. – learningtocode Dec 01 '22 at 15:57
  • Please click through to the answer on the linked question – Micah Kornfield Dec 02 '22 at 20:26