I'm using Spark 2.1.1 and Scala 2.11.8
This question is an extension of one of my earlier questions:
How to identify null fields in a csv file?
The change is that rather than reading the data from a CSV file, I'm now reading the data from an avro file.…
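Independent of whether the rows come from CSV or avro, the null-field check itself is plain logic over each record. A minimal sketch in plain Scala (the `NullFields` object and its row representation are hypothetical, for illustration only):

```scala
// Hypothetical sketch: find which fields are null in a record,
// modelling a row as a field-name -> value map.
object NullFields {
  def nullFields(row: Map[String, Any]): Seq[String] =
    row.collect { case (name, v) if v == null => name }.toSeq.sorted

  def main(args: Array[String]): Unit = {
    // "name" and "city" are null, so both are reported.
    println(nullFields(Map("id" -> 1, "name" -> null, "city" -> null)))
  }
}
```

In Spark itself the same idea is usually expressed with `isNull` column predicates; the sketch only shows the per-record logic.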
I have a Spark job that processes some data into several individual dataframes. I store these dataframes in a list, i.e. dataframes[]. Eventually, I'd like to combine these dataframes into a hierarchical format and write the output in avro. The avro…
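The target hierarchical shape can be sketched with nested case classes, which Spark's encoders map to nested struct columns and which spark-avro in turn writes as nested Avro records. The names here (`Order`, `Line`) are invented for illustration:

```scala
// Hypothetical sketch of a hierarchical record: one parent row
// holding a sequence of child rows.
case class Line(sku: String, qty: Int)
case class Order(id: Long, lines: Seq[Line])

object HierarchySketch {
  def main(args: Array[String]): Unit = {
    val order = Order(1L, Seq(Line("a", 2), Line("b", 1)))
    // Total quantity across the nested children.
    println(order.lines.map(_.qty).sum)
  }
}
```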
I am faced with a NullPointerException when I try to write an avro file from a DataFrame created from CSV files:
public static void main(String[] args) {
    SparkSession spark = SparkSession
        .builder()
        .appName("SparkCsvToAvro")
…
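One commonly reported cause of this NullPointerException is a CSV column that contains nulls while the corresponding Avro field is declared non-nullable; Avro's writer throws when asked to encode a null into a non-union field. Declaring such fields as a union with `"null"` avoids it. A hypothetical schema fragment (field names invented):

```json
{
  "type": "record",
  "name": "Row",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "city", "type": ["null", "string"], "default": null}
  ]
}
```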
I'm currently trying to run a Spark Scala job on our HDInsight cluster with the external library spark-avro, without success. Could someone help me out with this? The goal is to find the necessary steps to be able to read avro files residing on…
Our project has both Scala and Python code and we need to send/consume avro-encoded messages to Kafka.
I am sending avro-encoded messages to Kafka using Python and Scala. I have a producer in the Scala code which sends avro-encoded messages using Twitter…
I am new to AVRO. We have started using AVRO schema to read data.
Now we have a use case where I need to truncate the data while reading.
Suppose my avro schema is like this:
{
"name": "table",
"namespace": "csd",
"type": "record",
…
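The schema above is truncated, but if "truncate the data" means dropping whole fields at read time, the standard Avro mechanism is schema resolution: supply a reader schema that omits the unwanted fields and the decoder skips them. A hypothetical reader schema reusing the question's record name and namespace (the `id` field is invented):

```json
{
  "name": "table",
  "namespace": "csd",
  "type": "record",
  "fields": [
    {"name": "id", "type": "long"}
  ]
}
```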
I have a requirement where I need to store data in JSON format in AWS S3. We are currently hitting an endpoint which returns a List[GenericRecord], and that needs to be stored in JSON format. Can anyone share sample code for achieving this? I am…
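For real GenericRecords the usual route is Avro's own JSON encoder (`EncoderFactory.get().jsonEncoder(schema, out)` with a `GenericDatumWriter`), which respects the schema. As a dependency-free illustration of the target shape only, here is a hypothetical sketch that renders a flat record as one JSON line (names invented, not a substitute for the Avro encoder):

```scala
// Hypothetical sketch: render a flat (field -> value) record as a JSON line.
// Real code should use Avro's JsonEncoder, which handles escaping and types.
object RecordToJson {
  def toJsonLine(record: Seq[(String, Any)]): String =
    record.map {
      case (k, v: String) => "\"" + k + "\":\"" + v + "\""
      case (k, v)         => "\"" + k + "\":" + v
    }.mkString("{", ",", "}")

  def main(args: Array[String]): Unit = {
    println(toJsonLine(Seq("id" -> 1, "name" -> "a"))) // {"id":1,"name":"a"}
  }
}
```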
I have some legacy data in S3 which I want to convert to parquet format using Spark 2 using the Java API.
I have the desired Avro schema (.avsc files) and their generated Java classes using the Avro compiler and I want to store the data using those…
When I read a specific file it works:
val filePath= "s3n://bucket_name/f1/f2/avro/dt=2016-10-19/hr=19/000000"
val df = spark.read.avro(filePath)
But if I point it at a folder to read date-partitioned data, it fails:
val…
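The failing layout follows Spark's `name=value` partition convention: each `dt=…/hr=…` directory level encodes a partition column, and older spark-avro versions were commonly reported not to do partition discovery on the parent folder (a frequent workaround is globbing, e.g. `/dt=*/hr=*`). A small sketch of how those paths are composed (the `PartitionPath` helper is hypothetical):

```scala
// Hypothetical sketch of the name=value partition directory layout
// used by the dt=/hr= paths above.
object PartitionPath {
  def partitionDir(base: String, dt: String, hr: Int): String =
    s"$base/dt=$dt/hr=$hr"

  def main(args: Array[String]): Unit = {
    println(partitionDir("s3n://bucket_name/f1/f2/avro", "2016-10-19", 19))
  }
}
```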
I have a scenario where I have a set of avro files in HDFS, and I need to generate Avro schema files for those avro data files. I tried researching using Spark…
I have an Array[Byte] that represents an avro schema. I'm trying to write it to HDFS as an avro file with Spark. This is the code:
val values = messages.map(row => (null, AvroUtils.decode(row._2, topic)))
  .saveAsHadoopFile(
    outputPath,
…
I am working on a Spark program in which I have to load avro data and process it. I am trying to understand how job ids are created for a Spark application. I use the line of code below to load the avro…
I want to convert xml files to avro. The data will be in xml format and will hit the kafka topic first. Then I can either use flume or spark-streaming to ingest, convert from xml to avro, and land the files in hdfs. I have a cloudera…
I'm opening a bunch of files (around 50) at HDFS like this:
val PATH = path_to_files
val FILE_PATH = PATH + "nt_uuid_2016-03-01.*1*.avro"
val df = sqlContext.read.avro(FILE_PATH)
I then do a bunch of operations with df and at some point I…
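The `*1*` pattern here matches any file name with a `1` somewhere between the date prefix and the `.avro` suffix. Hadoop's path globbing follows the same basic `*` semantics as the JDK's glob matcher, so the pattern can be sanity-checked locally (the `GlobCheck` object is a hypothetical helper):

```scala
import java.nio.file.{FileSystems, Paths}

// Hypothetical sanity check of the glob used above, via the JDK glob matcher.
object GlobCheck {
  private val matcher =
    FileSystems.getDefault.getPathMatcher("glob:nt_uuid_2016-03-01.*1*.avro")

  def matches(fileName: String): Boolean =
    matcher.matches(Paths.get(fileName))

  def main(args: Array[String]): Unit = {
    println(matches("nt_uuid_2016-03-01.0010.avro")) // true: suffix contains a '1'
    println(matches("nt_uuid_2016-03-01.0002.avro")) // false: no '1' in the suffix
  }
}
```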
The avro size is around 44MB.
Below is the error from the yarn logs:
20/03/30 06:55:04 INFO spark.ExecutorAllocationManager: Existing executor 18 has been removed (new total is 0)
20/03/30 06:55:04 INFO cluster.YarnClusterScheduler: Cancelling stage…