Highest Voted 'spark-avro' Questions

3

votes

1 answer

Read Avro in Azure HDI4.0

I'm trying to read an Avro file using a Jupyter notebook in Azure HDInsight 4.0 with Spark 2.4. I'm not able to properly provide the .jar file. I've tried the approach suggested in How to use Avro on HDInsight Spark/Jupyter? and in…

asked Oct 25 '19 at 16:07

MDP89

306
1
9

3

votes

1 answer

Cannot read Avro in Dataproc Spark with spark-avro

I have a cluster on Google Dataproc (with image 1.4) and I want to read Avro files with Spark from Google Cloud Storage. I followed this guide: Spark read avro. The command I ran is: gcloud dataproc jobs submit pyspark test.py \ --cluster…

pyspark google-cloud-dataproc spark-avro

asked Apr 17 '19 at 06:30

user2830451

2,126
5
25
31

3

votes

2 answers

Spark Read multiple paths with automatic partitions discovery

I'm trying to read some Avro files into a DataFrame from multiple paths. Let's say my path is "s3a://bucket_name/path/to/file/year=18/month=11/day=01". Under this path I have two more partitions, let's say country=XX/region=XX. I want to read multiple…

scala apache-spark spark-avro

asked Dec 03 '18 at 08:17

R.Peretz

71
2
10

3

votes

0 answers

Spark avro predicate pushdown

We are using the Avro data format, and the data is partitioned by year, month, day, hour, and min. I see the data stored in HDFS as /data/year=2018/month=01/day=01/hour=01/min=00/events.avro. We load the data using val schema = new…

scala apache-spark apache-spark-sql predicate spark-avro

asked Aug 08 '18 at 13:46

Vijay Muvva

1,063
1
17
31

3

votes

2 answers

Spark sql saveAsTable create table append mode if new column is added in avro schema

I am using a Spark SQL Dataset to write data into Hive. It works perfectly if the schema is the same, but if I change the Avro schema by adding a new column in between, it shows an error (the schema is provided from the schema registry): Error running job streaming…

apache-spark spark-avro spark-hive

asked Feb 22 '18 at 09:19

Sumit G

436
8
21

3

votes

1 answer

Setting Values in nested field in Avro Schema

I am trying to produce Avro data into Kafka using GenericData.Record, but I am getting the following exception: Exception in thread "main" org.apache.avro.AvroRuntimeException: Not a valid schema field: emailAddresses.email Here is my schema: { …

java apache-kafka avro spark-avro

asked Feb 06 '18 at 13:19

Sumit G

436
8
21

3

votes

0 answers

Reading Avro messages from Kafka using Structured Streaming in Spark 2.1

I followed @Ralph Gonzalez's message on this thread reading Avro messages from Kafka using Structured Streaming in Spark 2.1, but am getting the following error. org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -40 at…

apache-spark-sql spark-structured-streaming spark-avro

asked May 11 '17 at 06:00

winterfresh

95
1
6

3

votes

2 answers

NoSuchMethodError using Databricks Spark-Avro 3.2.0

I have a Spark master & worker running in Docker containers with Spark 2.0.2 and Hadoop 2.7. I'm trying to submit a job from pyspark from a different container (same network) by running df =…

apache-spark avro databricks spark-avro

asked Apr 03 '17 at 04:14

arinarmo

375
1
11

3

votes

1 answer

How to convert parquet file to Avro file?

I am new to Hadoop and big data technologies. I would like to convert a Parquet file to an Avro file and read that data. I searched a few forums, and they suggested using AvroParquetReader. AvroParquetReader reader = new…

hadoop apache-spark parquet spark-avro

asked Dec 23 '16 at 01:41

PrinceChamp

41
1
3

3

votes

0 answers

Complex json log data transformation using?

I am new to data science tools and have a use case to transform JSON logs into flattened columnar data, which could be treated as normal CSV. I looked into a lot of alternative tools to approach this problem and found that I can easily solve this…

apache-spark apache-spark-sql avro spark-avro

asked Sep 05 '16 at 22:12

fireants

191
1
11

2

votes

0 answers

Pyspark + Avro type conversion problems after transformation

I use Structured Streaming to read Avro records from a Kafka topic A, do some transformations and write as Avro to another Kafka topic B. I use those functions for serializing and deserializing the Avro records. I faced another exception (parsing…

apache-spark pyspark spark-avro

asked Apr 05 '22 at 14:21

JayKay

152
11

2

votes

0 answers

"Failed to find data source: avro" exception while writing Spark Dataframe to redshift

I am following the community URL https://github.com/spark-redshift-community/spark-redshift#python to connect with Redshift, and it seems to use Avro dependencies although I am not using Avro as the input source data format. My Scala version is 2.12 and dependencies…

maven apache-spark amazon-redshift spark-avro

asked Mar 20 '22 at 14:09

Priyanshu Sharma

93
6

2

votes

0 answers

Circular Reference in Bean Class While Creating a Dataset from an Avro Generated Class

I have a class RawSpan.java that is Avro generated from the corresponding avdl definition. I am trying to use this class to convert a DataFrame to a Dataset in Spark as: val ds = df.select("value").select(from_avro($"value", "topic",…

apache-spark apache-spark-sql avro spark-avro

asked Feb 02 '22 at 05:42

Prashant Pandey

4,332
3
26
44

2

votes

0 answers

Databricks: Provide schema in dataframe column as a parameter for from_avro

I'm trying to use the function from_avro in a DataFrame. This DataFrame originates from a streamRead from Kafka, and at some point I create a column with the schemaId (related to the schema registry) and the message. I then have a UDF that grabs the…

apache-kafka apache-spark-sql azure-databricks spark-avro

asked Aug 23 '21 at 15:25

FEST

813
2
14
37

2

votes

1 answer

Pyspark writing dataframe to avro maintaining the sequence of key values

I am trying to read an Avro file using pyspark and sort one of the columns based on certain keys. One of the columns in my Avro file contains MapType data, which I need to sort based on keys. The test Avro file contains only one row with the entities…

python pyspark avro spark-avro

asked Jun 11 '20 at 08:04

ArinCool

1,720
1
13
24
