I'm getting the following cryptic error message when trying to import an AVRO file created with fastavro into BigQuery:

Error while reading data, error message: The Apache Avro library failed to read data with the following error: Invalid branch index: 18446744073709551567, the leaves size is: 2 File:

I've searched all over the Internet, but I have no idea what this error actually means. Anyone have any ideas what could be the problem?

TBoneATL
  • Could you provide more information on where you are trying to import AVRO file from and how into BigQuery? – Prajna Rai T Feb 04 '23 at 07:40
  • Hi Prajna. I'm using the Google Cloud console to create a new table. I specify the source as a Google Cloud Storage bucket, specify the Avro file, name the table, and create the table. BQ infers the schema from the Avro file. I've also tried the Cloud Shell, but I get the same error. I'm not sure what the error is actually trying to say, so any pointers to that would be appreciated. – TBoneATL Feb 04 '23 at 16:56
  • @PrajnaRaiT I appreciate your response, but I'm having an issue with the Avro file on import. My question was more about what the error message is and what it is saying. Thank you though. – TBoneATL Feb 05 '23 at 21:58
  • Hi @TBoneATL, Can you provide more information on how you are importing avro file? – Prajna Rai T Feb 08 '23 at 07:53

1 Answer

You can use the code below to load Avro data from Cloud Storage into a new BigQuery table.

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
table_id = "project.dataset.table"

job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.AVRO)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.avro"

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))

You can refer to this document for more information.
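As for what the error message itself may be saying, here is a rough sketch rather than a diagnosis. In Avro's binary encoding, a union value is written as a long (a zigzag-encoded varint) giving the zero-based index of the branch, followed by the value. "The leaves size is: 2" would then correspond to a two-branch union such as ["null", "string"], so only indexes 0 and 1 are valid. The reported index, 18446744073709551567, equals 2^64 - 49, which is -49 printed as an unsigned 64-bit number. The helper names below are hypothetical and use only the standard library; they do not call the Avro library or BigQuery.

```python
# Sketch only: interprets the number from the error message.
# to_signed64, zigzag_encode and read_long are illustrative helpers,
# not part of fastavro, the Avro library, or the BigQuery client.

MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    value &= MASK64
    return value - (1 << 64) if value >= (1 << 63) else value


def zigzag_encode(n):
    """Avro's zigzag mapping for a signed long: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def read_long(buf, pos=0):
    """Decode one Avro long (varint + zigzag) from buf; return (value, next_pos)."""
    shift = 0
    raw = 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:            # high bit clear marks the last byte
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1), pos


reported = 18446744073709551567
print(to_signed64(reported))   # -49
print(zigzag_encode(-49))      # 97, i.e. the single byte 0x61 (ASCII "a")
print(read_long(b"a"))         # (-49, 1)
```

If this reading is right, the reader expected a union index (0 or 1) but found a byte such as 0x61, which decodes to -49. That would suggest the bytes of a record do not line up with the schema stored in the file header, for example a record written against a different schema than the one declared. This is an assumption based only on the number in the message; reading the file back with the same library that wrote it is one way to check it.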

Prajna Rai T