I'm using the AvroKeyInputFormat to read avro files:
val records = sc.newAPIHadoopFile[AvroKey[T], NullWritable, AvroKeyInputFormat[T]](path)
.map(_._1.datum())
Because I need to reflect over the schema in my job, I get the Avro schema like…
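One quick way to inspect the writer schema embedded in the files, outside the Spark job itself, is to open a single part file with the fastavro library; this is only a sketch, assuming fastavro is installed and sample.avro is a local copy of one of the files:
# Sketch: read the writer schema stored in an Avro file header.
# "sample.avro" is a placeholder for a local copy of one part file.
from fastavro import reader

with open("sample.avro", "rb") as fo:
    avro_reader = reader(fo)
    writer_schema = avro_reader.writer_schema          # schema as a Python dict
    print([field["name"] for field in writer_schema["fields"]])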
I have a Spark job that failed at the COPY portion of the write. I have all the output already processed in S3, but am having trouble figuring out how to manually load it.
COPY table
FROM…
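For reference, the staged Avro output can be loaded manually with a plain COPY issued from a small Python script; a minimal sketch, assuming psycopg2 and placeholder connection details, table name, S3 prefix, and IAM role:
# Sketch: manually run the Redshift COPY for Avro output already staged in S3.
# Every identifier below (host, table, bucket, role) is a placeholder.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.example.redshift.amazonaws.com",
    port=5439, dbname="mydb", user="myuser", password="mypassword",
)
copy_sql = """
    COPY my_table
    FROM 's3://my-bucket/my-job-output/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
    FORMAT AS AVRO 'auto';
"""
with conn, conn.cursor() as cur:      # commits on success, rolls back on error
    cur.execute(copy_sql)
conn.close()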
Hi, I have developed an application where I have to store TBs of data initially, and then apply a 20 GB monthly increment (inserts/updates/deletes in the form of XML) on top of this 5 TB of data.
And finally, on a request basis I…
We have an Avro dataset partitioned like this:
table
--a=01
--a=02
We want to load the data from a single partition while keeping the partition column a.
I found this Stack Overflow question and applied the suggested snippet:
DataFrame df =…
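The commonly suggested approach is the basePath option, which keeps partition discovery rooted at the table directory; a rough PySpark sketch with placeholder paths:
# Sketch: load only partition a=01 while keeping the partition column `a`.
# Paths are placeholders; on older Spark versions use the external
# com.databricks.spark.avro source instead of the built-in "avro" format.
df = (
    spark.read.format("avro")
    .option("basePath", "hdfs:///data/table")   # table root, so `a` is discovered
    .load("hdfs:///data/table/a=01")            # only the one partition
)
df.printSchema()                                # schema includes column `a`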
I am using Spark Shell v_1.6.1.5.
I have the following Spark Scala Dataframe:
val data = sqlContext.read.avro("/my/location/*.avro")
data.printSchema
root
|-- id: long (nullable = true)
|-- stuff: map (nullable = true)
| |-- key: string
| …
I'm trying to find the source of a bug on Spark 2.0.0. I have a map that holds table names as keys and DataFrames as values; I loop through it and at the end use spark-avro (3.0.0-preview2) to write everything to S3 directories. It runs…
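A minimal PySpark rendering of the loop as described, with placeholder names, in case it helps narrow the problem down:
# Sketch of the described setup: a map of table name -> DataFrame, each written
# to its own S3 directory with spark-avro. Names and paths are placeholders.
tables = {"orders": orders_df, "customers": customers_df}   # hypothetical DataFrames

for name, df in tables.items():
    (df.write
       .format("com.databricks.spark.avro")    # spark-avro 3.0.0-preview2 source
       .mode("overwrite")
       .save("s3a://my-bucket/output/" + name))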
I am using Gobblin to periodically extract relational data from Oracle, convert it to Avro, and publish it to HDFS.
My HDFS directory structure looks like this:
-tables
|
-t1
|
-2016080712345
|
-f1.avro
|
-2016070714345
|
…
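In case it helps, all snapshots of a single table laid out like this can be read with a path glob over the timestamp directories; only a sketch, with placeholder paths:
# Sketch: read every snapshot's Avro files for table t1 from the layout above.
# The path is a placeholder; the external spark-avro package may be required.
df = spark.read.format("avro").load("hdfs:///tables/t1/*/*.avro")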
Is there a PySpark function that could convert the below _schema variable to an Avro schema?
df_schema = spark.read.format('parquet').load(input_directory)
_schema = df_schema.schema
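There is no public PySpark helper for this, but the JVM-side SchemaConverters from the spark-avro package can be reached through py4j; a sketch that assumes spark-avro is on the classpath and leans on internal APIs that may change between versions:
# Sketch: convert the DataFrame's StructType into an Avro schema via the JVM
# SchemaConverters (spark-avro must be on the classpath). py4j cannot use Scala
# default arguments, so all four parameters are passed explicitly.
jvm_struct = df_schema._jdf.schema()              # JVM StructType behind _schema (internal API)
avro_type = spark._jvm.org.apache.spark.sql.avro.SchemaConverters.toAvroType(
    jvm_struct,        # Catalyst type to convert
    False,             # nullable
    "topLevelRecord",  # record name
    "",                # namespace
)
avro_schema_json = avro_type.toString()           # Avro schema as a JSON string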
After migrating to Spark 3.2.0, I had to upgrade the external spark-avro package to spark-avro_2.12:3.2.0.
After this migration I was unable to read any Avro file that contains spaces in its column names.
The error occurs on the read method…
Via Concord, we can automatically spawn pyspark-enabled Dataproc clusters.
In these pyspark notebooks, the Spark version is 2.4.8.
But by default Spark does not ship the .avro datasource extension. Without the Avro extension, we cannot read .avro…
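One way to pull the extension in is spark.jars.packages, set when the session is created (it only takes effect before the driver JVM starts, so an already-running notebook session may need to be stopped first); a sketch assuming a Scala 2.11 build of Spark 2.4.8 and access to Maven Central:
# Sketch: start a session with the external spark-avro package so .avro files can be read.
# Assumes Spark 2.4.8 built against Scala 2.11; adjust the artifact suffix if not.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("avro-demo")
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.11:2.4.8")
    .getOrCreate()
)

df = spark.read.format("avro").load("gs://my-bucket/path/*.avro")   # placeholder path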
I have streamed data in Avro format into Kafka and manage the schema of the data via the Confluent Schema Registry.
I'd like to pull the data using pyspark and parse the Avro bytes using the schema from the Schema Registry, but it keeps raising…
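The usual stumbling block here is Confluent's wire format: every Kafka value carries a 5-byte prefix (a magic byte plus the schema id) that from_avro does not expect, so the prefix has to be stripped and the schema fetched from the registry separately; a sketch with placeholder topic, brokers, and registry URL:
# Sketch: fetch the schema from the registry, strip the 5-byte Confluent prefix,
# then decode with from_avro. All endpoints and the topic name are placeholders;
# from_avro requires the spark-avro package on the classpath.
import requests
from pyspark.sql.avro.functions import from_avro
from pyspark.sql.functions import expr

schema_json = requests.get(
    "http://schema-registry:8081/subjects/my-topic-value/versions/latest"
).json()["schema"]

raw = (
    spark.read.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "my-topic")
    .load()
)

decoded = raw.select(
    from_avro(expr("substring(value, 6, length(value) - 5)"), schema_json).alias("rec")
)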
There is a problem when trying to deserialize data from an .avro file. My process consists of these steps:
reading from Kafka
df = (
    spark.read.format("kafka")
    .option("kafka.security.protocol", "PLAINTEXT")
    …
We are sending Avro data encoded with azure.schemaregistry.encoder.avroencoder to Event Hub from a standalone Python job, and we can deserialize it with the same decoder in another standalone Python consumer. The schema registry is also supplied…
I am really struggling with this one. I've spent a lot of time searching for an answer in the Spark manual and Stack Overflow posts, and I really need help.
I've installed Apache Spark on my Mac to build and debug PySpark code locally. However, in my PySpark code…
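For local debugging, a minimal local-mode session is usually enough; a small sketch, independent of whatever the specific error turns out to be:
# Sketch: a minimal local SparkSession for building and debugging PySpark code on a laptop.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")          # run driver and executors in one local JVM
    .appName("local-debug")
    .getOrCreate()
)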