Questions tagged [spark-avro]

A library for reading and writing Avro data from Spark SQL.

The GitHub page is here.

227 questions
0
votes
1 answer

How to map one column with other columns in an avro file?

I'm using Spark 2.1.1 and Scala 2.11.8. This question is an extension of one of my earlier questions: How to identify null fields in a csv file? The change is that rather than reading the data from a CSV file, I'm now reading the data from an Avro file.…
PixieDev
0
votes
2 answers

In Spark, How to convert multiple dataframes into an avro?

I have a Spark job that processes some data into several individual dataframes. I store these dataframes in a list, i.e. dataframes[]. Eventually, I'd like to combine these dataframes into a hierarchical format and write the output in Avro. The avro…
James
0
votes
1 answer

How to write an avro file from csv file with Spark?

I get a NullPointerException when I try to write an Avro file from a DataFrame created from CSV files: public static void main(String[] args) { SparkSession spark = SparkSession .builder() .appName("SparkCsvToAvro") …
Quentin Geff
0
votes
1 answer

How to add databricks avro jar to hdinsight

I'm currently trying to run a Spark Scala job on our HDInsight cluster with the external library spark-avro, without success. Could someone help me out with this? The goal is to find the necessary steps to be able to read Avro files residing on…
0
votes
1 answer

Avro Kafka conversion issues between scala and Python

Our project has both Scala and Python code, and we need to send Avro-encoded messages to Kafka and consume them from it. I am sending Avro-encoded messages to Kafka using Python and Scala. I have a producer in Scala code which sends Avro-encoded messages using Twitter…
Abhishek
0
votes
1 answer

Data Conversion for a field using AVRO

I am new to AVRO. We have started using an AVRO schema to read data. Now we have a use case where I need to truncate the data while reading. Suppose my Avro schema is like this { "name": "table", "namepsace": "csd", "type": "record", …
0
votes
0 answers

How to convert GenericRecord to a json string corresponding to the schema given in Avro

I have a requirement to store data in JSON format in AWS S3. We are currently hitting an endpoint which gives List[GenericRecord], and that needs to be stored in JSON format. Can anyone share sample code for achieving this? I am…
0
votes
2 answers

Converting data into Parquet in Spark

I have some legacy data in S3 which I want to convert to Parquet format with Spark 2 and the Java API. I have the desired Avro schema (.avsc files) and the Java classes generated from them by the Avro compiler, and I want to store the data using those…
Swaranga Sarma
0
votes
1 answer

Spark AVRO S3 read not working for partitioned data

When I read a specific file it works: val filePath= "s3n://bucket_name/f1/f2/avro/dt=2016-10-19/hr=19/000000" val df = spark.read.avro(filePath) But if I point to a folder to read date partitioned data it fails: val…
JNish
0
votes
1 answer

Avro Schema Generation in HDFS

I have a scenario where I have a set of Avro files in HDFS, and I need to generate Avro schema files for those Avro data files. I tried researching using Spark…
Govind
0
votes
1 answer

How to write an avro file with Spark?

I have an Array[Byte] that represents an Avro schema. I'm trying to write it to HDFS as an Avro file with Spark. This is the code: val values = messages.map(row => (null,AvroUtils.decode(row._2,topic))) .saveAsHadoopFile( outputPath, …
Beniamino Del Pizzo
0
votes
1 answer

Trying to understand the Spark UI jobs tab

I am working on a Spark program in which I have to load Avro data and process it. I am trying to understand how the job IDs are created for a Spark application. I use the line of code below to load the avro…
srujana
0
votes
1 answer

Convert Xml to Avro from Kafka to hdfs via spark streaming or flume

I want to convert XML files to Avro. The data will be in XML format and will hit the Kafka topic first. Then, I can either use Flume or Spark Streaming to ingest and convert from XML to Avro and land the files in HDFS. I have a Cloudera…
Defcon
0
votes
0 answers

find out what file is responsible for exception

I'm opening a bunch of files (around 50) on HDFS like this: val PATH = path_to_files val FILE_PATH = "PATH+nt_uuid_2016-03-01.*1*.avro" val df = sqlContext.read.avro(FILE_PATH) I then do a bunch of operations with df and at some point I…
elelias
-1
votes
1 answer

'java.lang.OutOfMemoryError: Java heap space' error in spark application while trying to read the avro file and performing Actions

The Avro file is around 44 MB. Below is the error from the YARN logs: 20/03/30 06:55:04 INFO spark.ExecutorAllocationManager: Existing executor 18 has been removed (new total is 0) 20/03/30 06:55:04 INFO cluster.YarnClusterScheduler: Cancelling stage…
Vikas