Questions tagged [fastavro]

fastavro is a Python implementation of Avro for data serialization and deserialization.

38 questions
1
vote
0 answers

WriteToAvro not writing data to file after reading from BigQuery using Dataflow Template

I have been banging my head over this for a month now, but I am unable to write the data to a GCS bucket using WriteToAvro. from __future__ import absolute_import from __future__ import division from datetime import datetime, timedelta, date import…
1
vote
0 answers

unable to install apache-beam on macOS

I'm trying to install apache-beam in my Python virtual environment, but it didn't work. I followed the steps provided by the Apache Beam project [Apache Beam Python SDK Quickstart], but when executing pip install apache-beam I got this error: Building…
Shahad
  • 81
  • 4
1
vote
2 answers

Trouble installing packages googleclient and fastavro

I'm trying to write the file names of my GDrive into an Avro file. To connect to GDrive I followed these instructions: https://developers.google.com/drive/api/v3/quickstart/python and for the output I use the writer from fastavro. While using the…
1
vote
1 answer

fastavro - Convert json file into avro file

I'm a bit new to Avro and Python. I am trying to do a simple conversion to Avro using the fastavro library, as the speed of the native Apache Avro library is just a bit too slow. I want to: 1. Take a JSON file 2. Convert the data to Avro. My problem is…
joec273
  • 13
  • 1
  • 5
0
votes
0 answers

How to execute fastavro shell commands using only Python code?

I have a working Avro converter that outputs JSON-formatted schema and records files and a CSV records file. To generate the JSON schema and records files I'm calling the following fastavro shell commands. Schema: subprocess.run("fastavro --schema…
Tarlak333
  • 1
  • 1
0
votes
0 answers

fastavro.schemaless_reader performance loss when profiling is enabled

I am attempting to profile a Python app that uses the fastavro library. I am profiling using the Datadog Profiler. I run the application using the command ddtrace-run python -m app.main. I enable the profiler using the environment variable…
0
votes
1 answer

How can I create an Avro schema from a python class?

How can I transform a simple Python class like the following into an Avro schema? class Testo(SQLModel): name: str mea: int This is the Testo.schema() output: { "title": "Testo", "type": "object", "properties": { "name":…
feder
  • 1,849
  • 2
  • 25
  • 43
0
votes
1 answer

How can I auto-generate a pulsar AvroSchema class from an existing model?

I'm running Apache Pulsar schemaless, where the content changes often. Now, there is some specific data for which I've written "data" classes (derived from SQLModel, which doesn't really matter in this case). Since these models (data classes) are…
feder
  • 1,849
  • 2
  • 25
  • 43
0
votes
1 answer

Issue importing an Avro file into Google BigQuery

I'm getting the following cryptic error message when trying to import an Avro file created with fastavro into BigQuery: Error while reading data, error message: The Apache Avro library failed to read data with the following error: Invalid branch…
TBoneATL
  • 33
  • 1
  • 5
0
votes
1 answer

Changing schema of avro file when writing to it in append mode

I'm looking for a way to modify the schema of an Avro file in Python. Take the following example, which uses the fastavro package. First, write out some initial records with the corresponding schema: from fastavro import writer, parse_schema schema = { …
0
votes
0 answers

Partial match file headers to Avro schema headers

I am building a data pipeline that uses Avro as the file format. The pipeline is built in Python. The source data is received as CSV and converted to Avro using a defined Avro schema, as expected. During the conversion, there is…
0
votes
1 answer

schema mismatch converting data between 2 schemas using aliases in fastavro

I'm trying to convert some data that matches schema old_schema to the field names used in new_schema using aliases. I've been at it for too long and can't see what is wrong with this code: from fastavro import writer, reader, json_writer from…
jamzsabb
  • 1,125
  • 2
  • 18
  • 40
0
votes
1 answer

Remove Avro type keys from JSON message format

I'm trying to create a script to deserialize some Avro messages that come from Kafka. The messages have a format like: { "value": { "value1": { "string": "AAAA" } } } and I need them to look something like this: { "value": { …
Mister
  • 1
  • 1
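
As a rough illustration of the question above: Avro's JSON encoding wraps each non-null union value in a single-key object named after its type, such as { "string": "AAAA" }. The sketch below is plain Python, not a fastavro API, and assumes the goal is simply to drop those wrapper keys. The function name unwrap_unions and the set of recognized type names are hypothetical, and the heuristic would wrongly unwrap a real record whose only field is named like an Avro type.

```python
# Type names that Avro's JSON encoding can use as the single key of a
# union wrapper. Named types (records, enums, fixed) are NOT handled here;
# this set is an assumption chosen for illustration.
AVRO_TYPE_KEYS = {
    "null", "boolean", "int", "long", "float",
    "double", "bytes", "string", "array", "map",
}


def unwrap_unions(value):
    """Recursively replace {"<avro type>": x} wrappers with x (heuristic)."""
    if isinstance(value, dict):
        if len(value) == 1:
            key, inner = next(iter(value.items()))
            if key in AVRO_TYPE_KEYS:
                # Single-key dict keyed by a type name: treat as a union wrapper.
                return unwrap_unions(inner)
        return {k: unwrap_unions(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unwrap_unions(v) for v in value]
    return value


message = {"value": {"value1": {"string": "AAAA"}}}
cleaned = unwrap_unions(message)
```

Applied to the sample message from the excerpt, the inner { "string": "AAAA" } wrapper collapses to the bare string while the surrounding keys are left alone.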
0
votes
2 answers

In Apache Beam/Dataflow's WriteToBigQuery transform, how do you enable the deadletter pattern with Method.FILE_LOADS and Avro temp_file_format?

In this document, Apache Beam suggests the deadletter pattern when writing to BigQuery. This pattern allows you to fetch rows that failed to be written from the transform output with the 'FailedRows' tag. However, when I try to use…
0
votes
2 answers

AvroSerializer: schema for orderbook snapshots

I have a Kafka cluster running and I want to store L2 order book snapshots in a topic. Each snapshot has a dictionary of {key:value} pairs whose keys are of type float, as in the following example: { 'exchange': 'ex1', 'symbol': 'sym1', 'book':…
CarloP
  • 99
  • 1
  • 12
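
One point relevant to the question above: Avro map keys are always strings, so a dictionary with float keys cannot be stored as an Avro map as-is. The sketch below shows two possible reshapings in plain Python: an array of records, or a map with stringified keys. The schema, the field names "price" and "size", and both helper functions are hypothetical, since the excerpt does not show the actual contents of the book.

```python
# Hypothetical Avro schema for the array-of-records shape. The record name
# "Level" and the fields "price" and "size" are illustrative assumptions.
BOOK_LEVELS_SCHEMA = {
    "type": "array",
    "items": {
        "type": "record",
        "name": "Level",
        "fields": [
            {"name": "price", "type": "double"},
            {"name": "size", "type": "double"},
        ],
    },
}


def book_to_levels(book):
    """Turn a {price: size} dict into a list of records sorted by price."""
    return [{"price": float(p), "size": float(s)} for p, s in sorted(book.items())]


def book_to_string_map(book):
    """Alternative: keep a map shape but convert the float keys to strings."""
    return {repr(float(p)): float(s) for p, s in book.items()}


levels = book_to_levels({101.5: 2.0, 100.0: 1.0})
as_map = book_to_string_map({100.0: 1.0})
```

The array-of-records shape keeps prices numeric, so consumers do not have to parse strings; the string-keyed map stays closer to the original dictionary but pushes the parsing onto readers.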