Questions tagged [spark-avro]

A library for reading and writing Avro data from Spark SQL.

The GitHub page is here.

227 questions
0
votes
1 answer

Unable to access deserialized nested Avro generic record elements in Scala

I am using Structured Streaming (Spark 2.4.0) to read Avro messages through Kafka, using the Confluent Schema Registry to receive/read the schema, but I am unable to access the deeply nested fields. The schema looks like this in compacted avsc…
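A frequent cause of this kind of failure with Confluent-produced topics: the Schema Registry serializer prepends a 5-byte header (one magic byte `0x00` plus a 4-byte big-endian schema ID) to every message, and Spark's `from_avro` expects raw Avro bytes. A minimal sketch of stripping that header, in Python for illustration (the question itself is Scala; the byte layout is the same either way):

```python
import struct

def strip_confluent_header(payload: bytes) -> tuple[int, bytes]:
    """Split a Confluent-framed Kafka message into (schema_id, raw_avro_bytes).

    Confluent Schema Registry serializers prepend one magic byte (0x00)
    and a 4-byte big-endian schema ID before the Avro body.
    """
    if len(payload) < 5 or payload[0] != 0:
        raise ValueError("not a Confluent-framed Avro message")
    schema_id = struct.unpack(">I", payload[1:5])[0]
    return schema_id, payload[5:]

# Example: schema ID 42 followed by two Avro body bytes.
framed = b"\x00" + struct.pack(">I", 42) + b"\x02\x06"
schema_id, body = strip_confluent_header(framed)
print(schema_id, body)  # 42 b'\x02\x06'
```

In Spark the equivalent is slicing the Kafka `value` column to drop the first five bytes before handing it to `from_avro`.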
0
votes
1 answer

Issue building Apache Spark with Avro

I am trying to build Spark from the master branch with ./build/sbt clean package. I want to test something specific to the spark-avro submodule. However, when I run ./bin/spark-shell and try: scala> import org.apache.spark.sql.avro._ I receive object avro…
irrelevantUser
  • 1,172
  • 18
  • 35
0
votes
4 answers

Spark Avro throws exception on file write: NoSuchMethodError

Any attempt to write a file in Avro format fails with the stack trace below. We are using Spark 2.4.3 (with user-provided Hadoop) and Scala 2.12, and we load the Avro package at runtime with either spark-shell: spark-shell --packages…
0
votes
0 answers

How to use a file date automatically in Scala?

I am reading an Avro file from Azure Data Lake using Databricks, and I am using this path to read the current date's file for a daily run. The code to derive the file date looks like this, and it gets the current date fine. val pfdtm =…
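The usual pattern for a daily run is to format the run date into the path rather than hard-coding it. A minimal sketch in Python for illustration (the question is Scala/Databricks; the base path and layout here are hypothetical):

```python
from datetime import date

def daily_avro_path(base: str, run_date: date) -> str:
    """Build a date-partitioned glob like <base>/2019/07/15/*.avro."""
    return f"{base}/{run_date:%Y/%m/%d}/*.avro"

print(daily_avro_path("adl://lake/raw/events", date(2019, 7, 15)))
# adl://lake/raw/events/2019/07/15/*.avro
```

In Scala the same idea is a `java.time.LocalDate` formatted with a `DateTimeFormatter` and interpolated into the read path.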
HaiY
  • 145
  • 1
  • 5
  • 15
0
votes
1 answer

Getting exception reading from avro table using Spark or Hive console - Failed to obtain maxLength value for varchar field from file schema: "string"

I have created two tables in Hive: CREATE external TABLE avro1(id INT, name VARCHAR(64), dept VARCHAR(64)) PARTITIONED BY (yoj VARCHAR(64)) STORED AS avro; CREATE external TABLE avro2(id INT, name VARCHAR(64), dept VARCHAR(64)) PARTITIONED BY (yoj…
0
votes
1 answer

How to read all columns from Avro when newer partitions have more columns than older ones?

I've got data in Avro format, partitioned by date and time, and I receive new data every hour. Newer partitions can contain more columns than older ones. When I read it with Spark 2.4.3 I get a DataFrame with the schema of the first (oldest) partition and…
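Spark's Avro source infers the schema from one file it lists, so columns added in newer partitions are silently dropped. A common workaround is to pass a superset schema explicitly via the `avroSchema` read option. A hedged pure-Python sketch of computing that superset from parsed `.avsc` record schemas (the helper name is hypothetical; new fields are made nullable so old files still decode):

```python
import json

def union_record_fields(*schemas: dict) -> dict:
    """Merge Avro record schemas by taking the union of their fields.

    Fields that only appear in later schemas are wrapped as
    ["null", type] with a null default so older files still decode.
    """
    merged, seen = dict(schemas[0], fields=[]), set()
    for schema in schemas:
        for field in schema["fields"]:
            if field["name"] in seen:
                continue
            seen.add(field["name"])
            if schema is schemas[0]:
                merged["fields"].append(field)
            else:  # field only present in a newer partition: make it optional
                merged["fields"].append({"name": field["name"],
                                         "type": ["null", field["type"]],
                                         "default": None})
    return merged

old = {"type": "record", "name": "Event",
       "fields": [{"name": "id", "type": "long"}]}
new = {"type": "record", "name": "Event",
       "fields": [{"name": "id", "type": "long"},
                  {"name": "source", "type": "string"}]}
print(json.dumps(union_record_fields(old, new)["fields"]))
```

The resulting schema JSON would then be handed to Spark as `spark.read.format("avro").option("avroSchema", schema_json)`.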
0
votes
0 answers

Spark SQL error when reading data from an Avro table

When I try reading data from an Avro table using spark-sql, I get this error. Caused by: java.lang.NullPointerException at…
Srinivas
  • 2,010
  • 7
  • 26
  • 51
0
votes
0 answers

Why am I running out of memory when adding bulk documents to Elasticsearch using bulk helpers?

I'm converting .avro files to JSON format, then parsing specific data items to be indexed on my Elasticsearch cluster. Each chunk contains roughly 1.8 gigabytes of data and there are about 500 chunks. It doesn't take long to run out of memory, but…
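A common cause of bulk-indexing OOMs is building the full list of action dicts in memory before calling the helper. Generators keep only one batch alive at a time; a minimal sketch (pure Python, with the Elasticsearch client calls omitted and index name hypothetical):

```python
def actions(records, index):
    """Lazily yield one bulk action per record instead of building a list."""
    for rec in records:
        yield {"_index": index, "_source": rec}

def chunked(iterable, size):
    """Group an iterator into lists of at most `size` items."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

batches = list(chunked(actions(({"n": i} for i in range(5)), "docs"), 2))
print([len(b) for b in batches])  # [2, 2, 1]
```

With elasticsearch-py you would pass the `actions(...)` generator directly to `helpers.streaming_bulk` (or `helpers.bulk` with a `chunk_size`) rather than materializing a list first.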
Remixt
  • 597
  • 6
  • 28
0
votes
1 answer

Convert Array[Byte] to JSON format using Spark Scala

I'm reading an .avro file where the data of a particular column is in binary format. I'm currently converting the binary format to string format with the help of a UDF for readability, and then finally I will need to convert it into JSON format…
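If the bytes are UTF-8 JSON text, the conversion is a decode followed by a parse. A minimal sketch in Python for illustration (the question is Scala, where the same two steps would live inside the UDF; the payload here is invented):

```python
import json

def bytes_to_json(raw: bytes) -> dict:
    """Decode a UTF-8 byte payload and parse it as a JSON object."""
    return json.loads(raw.decode("utf-8"))

payload = b'{"id": 7, "name": "sensor-a"}'
print(bytes_to_json(payload)["name"])  # sensor-a
```

In Spark, once the column is a JSON string, `from_json` with an explicit schema is usually preferable to a hand-rolled UDF.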
Anil Kumar
  • 525
  • 6
  • 27
0
votes
0 answers

Spark is unable to read the Avro file format

When I query an .avro file in Apache Drill, I get the Body column values correctly, as shown in the snapshot below. But if I do the same in Spark SQL, the Body column values come back in binary format. Is there a way I can read the data correctly…
Anil Kumar
  • 525
  • 6
  • 27
0
votes
1 answer

How to assign constant values to the nested objects in pyspark?

I have a requirement where I need to mask the data for some of the fields in a given schema. I've researched a lot and couldn't find the answer that is needed. This is the schema where I need some changes on the fields (answer_type, response0,…
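The general idea is to walk the nested structure and replace the targeted fields with a constant wherever they occur. A hedged sketch on plain Python dicts (in PySpark the equivalent is rebuilding the struct with `withColumn`, `struct`, and `lit`; the record below is invented around the field names mentioned in the question):

```python
def mask_fields(obj, targets, constant="***"):
    """Recursively replace the value of any key in `targets` with `constant`."""
    if isinstance(obj, dict):
        return {k: (constant if k in targets else mask_fields(v, targets, constant))
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [mask_fields(v, targets, constant) for v in obj]
    return obj

record = {"answer_type": "text",
          "responses": [{"response0": "secret", "score": 3}]}
print(mask_fields(record, {"answer_type", "response0"}))
# {'answer_type': '***', 'responses': [{'response0': '***', 'score': 3}]}
```

The same recursion over a DataFrame schema's `StructType`/`ArrayType` is how you would generate the masked column expressions in PySpark.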
0
votes
0 answers

Read data from an Avro file and write to an Impala table

The project I am working on receives data in the form of Avro files. I am writing generic code in Scala (using Scala IDE) to read all Avro files present in a folder and create a table for each of the Avro files. I am reading the Avro file data as a…
0
votes
1 answer

Trouble reading avro files in Jupyter notebook using pyspark

I am trying to read an Avro file in a Jupyter notebook using PySpark. When I read the file I am getting an error. I have downloaded spark-avro_2.11:4.0.0.jar; I am not sure where in my code I should be inserting the Avro package. Any suggestions…
Conz
  • 3
  • 2
0
votes
0 answers

Get object size from File

I have an avro file outputted from a spark job with some objects in it: Objavro.schema�{"type":"record","name":"topLevelRecord","fields": [{"name":"Name","type":["String","null"]},{"name":"Age","type": ["int","null"]}]} Is there a way to get…
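The snippet quoted above is the Avro object container header; after the header and sync marker, each data block starts with two zigzag-varint longs: the object count and the serialized size in bytes. So per-block counts and sizes can be read without decoding any records. A hedged sketch of the varint decoder:

```python
def read_zigzag_long(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one Avro zigzag varint starting at `pos`; return (value, next_pos).

    Avro longs are zigzag-encoded, then written little-endian in
    7-bit groups with the high bit as a continuation flag.
    """
    shift = result = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1), pos

# A block header claiming 3 objects in 100 bytes:
# 3 -> zigzag 6 -> b'\x06'; 100 -> zigzag 200 -> b'\xc8\x01'
header = b"\x06\xc8\x01"
count, pos = read_zigzag_long(header)
size, _ = read_zigzag_long(header, pos)
print(count, size)  # 3 100
```

In practice a library such as fastavro or avro-python3 exposes the same block metadata without hand-parsing, but the layout above is what is actually on disk.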
ThatComputerGuy
  • 323
  • 3
  • 6
  • 11
0
votes
2 answers

Does the size of part files play a role in Spark SQL performance?

I am trying to query HDFS, which has a lot of part files (Avro). Recently we made a change to reduce parallelism, and thus the size of the part files has increased; each of these part files is in the range of 750 MB to 2 GB (we use Spark…
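For splittable formats like Avro, large part files do not force one task per file: Spark cuts input files into splits of at most `spark.sql.files.maxPartitionBytes` (default 128 MB). A rough back-of-the-envelope sketch, hedged in that it ignores `spark.sql.files.openCostInBytes` and bin-packing of small files:

```python
def estimated_tasks(file_sizes_bytes, max_partition_bytes=128 * 1024 * 1024):
    """Rough input-task count: each file yields ceil(size / maxPartitionBytes) splits."""
    return sum(-(-size // max_partition_bytes) for size in file_sizes_bytes)

GB, MB = 1024 ** 3, 1024 ** 2
# One 2 GB part file and one 750 MB part file at the 128 MB default:
print(estimated_tasks([2 * GB, 750 * MB]))  # 16 + 6 = 22
```

So the larger part files mainly shift the work into more splits per file; the bigger performance concern is usually whether each split still lands near its HDFS blocks.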