
I am using Spark 2.2 to write to Redshift on an AWS cluster and it is failing with the error below. I am using CDH 5.10 and Scala 2.11.8. Any ideas on how to fix this? Is it missing the snappy dependency?
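
For context, a minimal sketch of the write path involved (this assumes the com.databricks spark-redshift connector, which stages the DataFrame to S3 as Avro before issuing the COPY; the URL, table, and bucket names below are placeholders, not the actual job):

    import org.apache.spark.sql.SparkSession

    // Sketch only -- assumes the spark-redshift connector; all names, paths and credentials are placeholders.
    val spark = SparkSession.builder().appName("RedshiftWrite").getOrCreate()
    val df = spark.read.parquet("s3a://my-bucket/input/")            // placeholder input

    df.write
      .format("com.databricks.spark.redshift")
      .option("url", "jdbc:redshift://host:5439/db?user=USER&password=PASS")
      .option("dbtable", "my_table")                                 // placeholder target table
      .option("tempdir", "s3a://my-bucket/tmp/")                     // staging dir: the connector writes Avro files here
      .option("forward_spark_s3_credentials", "true")
      .mode("error")
      .save()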

 WARN  TaskSetManager:66 - Lost task 0.0 in stage 3.0  org.apache.spark.SparkException: Task failed while writing rows
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:191)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:190)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:108)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I
        at org.xerial.snappy.SnappyNative.maxCompressedLength(Native Method)
        at org.xerial.snappy.Snappy.maxCompressedLength(Snappy.java:316)
        at org.apache.avro.file.SnappyCodec.compress(SnappyCodec.java:43)
        at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:361)
        at org.apache.avro.file.DataFileWriter.writeBlock(DataFileWriter.java:394)
        at org.apache.avro.file.DataFileWriter.sync(DataFileWriter.java:413)
        at org.apache.avro.file.DataFileWriter.flush(DataFileWriter.java:422)
        at org.apache.avro.file.DataFileWriter.close(DataFileWriter.java:445)
        at org.apache.avro.mapreduce.AvroKeyRecordWriter.close(AvroKeyRecordWriter.java:83)
        at com.databricks.spark.avro.AvroOutputWriter.close(AvroOutputWriter.scala:84)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.releaseResources(FileFormatWriter.scala:337)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:330)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
Steven Park
  • Looks like a dependency issue indeed. – maasg Oct 01 '17 at 08:32
  • This happens when I try to save as Avro or Parquet, not with CSV or text files. I tried adding the snappy dependency, but I get the same error. – Steven Park Oct 02 '17 at 02:23
  • Maybe try sbt dependencyTree to check whether other versions of snappy are being loaded. I've seen builds where snappy is excluded from the other libraries, leaving only the one provided by Spark. https://github.com/jrudolph/sbt-dependency-graph – maasg Oct 02 '17 at 05:02
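
Following up on the dependencyTree suggestion, a minimal sketch of what checking and pinning the snappy dependency could look like in an sbt build (the plugin and library versions shown are illustrative assumptions, not verified against this CDH 5.10 setup):

    // project/plugins.sbt -- the sbt-dependency-graph plugin provides the dependencyTree task
    // on older sbt versions (plugin version is illustrative)
    addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")

    // build.sbt -- keep Spark "provided" and force a single snappy-java on the application classpath
    // (library versions are illustrative assumptions, not verified against CDH 5.10)
    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-sql"   % "2.2.0" % "provided",
      "com.databricks"    %% "spark-avro"  % "4.0.0",
      "org.xerial.snappy"  % "snappy-java" % "1.1.2.6"
    )
    dependencyOverrides += "org.xerial.snappy" % "snappy-java" % "1.1.2.6"

Running sbt dependencyTree then shows which libraries pull competing snappy versions into the build; a quick runtime check is evaluating classOf[org.xerial.snappy.Snappy].getProtectionDomain.getCodeSource.getLocation in spark-shell, which prints the snappy-java jar the class is actually loaded from.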

0 Answers