We are trying to write to a DSE Graph (Cassandra) from EMR and keep getting these errors. My JAR is a shaded JAR with the BYOS dependencies. Any help would be appreciated.

java.lang.UnsatisfiedLinkError: org.apache.cassandra.utils.NativeLibraryLinux.getpid()J
    at org.apache.cassandra.utils.NativeLibraryLinux.getpid(Native Method)
    at org.apache.cassandra.utils.NativeLibraryLinux.callGetpid(NativeLibraryLinux.java:124)
    at org.apache.cassandra.utils.NativeLibrary.getProcessID(NativeLibrary.java:429)
    at org.apache.cassandra.utils.UUIDGen.hash(UUIDGen.java:386)
    at org.apache.cassandra.utils.UUIDGen.makeNode(UUIDGen.java:367)
    at org.apache.cassandra.utils.UUIDGen.makeClockSeqAndNode(UUIDGen.java:300)
    at org.apache.cassandra.utils.UUIDGen.<clinit>(UUIDGen.java:41)
    at com.datastax.bdp.graph.spark.sql.vertex.SimpleVertexIdAssigner$.simpleEdgeId(SimpleVertexIdAssigner.scala:19)
    at com.datastax.bdp.graph.spark.graphframe.DseGraphFrame$$anonfun$3.apply(DseGraphFrame.scala:417)
    at com.datastax.bdp.graph.spark.graphframe.DseGraphFrame$$anonfun$3.apply(DseGraphFrame.scala:416)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
    at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:131)
    at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:220)
    at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:298)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

19/04/26 12:55:49 WARN TaskSetManager: Lost task 0.0 in stage 5.0 (TID 18, ip-10-69-16-79.vpc.internal, executor 1): java.lang.NoClassDefFoundError: Could not initialize class org.apache.cassandra.utils.UUIDGen
    at com.datastax.bdp.graph.spark.sql.vertex.SimpleVertexIdAssigner$.simpleEdgeId(SimpleVertexIdAssigner.scala:19)
    at com.datastax.bdp.graph.spark.graphframe.DseGraphFrame$$anonfun$3.apply(DseGraphFrame.scala:417)
    at com.datastax.bdp.graph.spark.graphframe.DseGraphFrame$$anonfun$3.apply(DseGraphFrame.scala:416)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
howie
mat77

2 Answers

Such errors usually happen when the temporary directory is mounted with the noexec attribute, which prevents loading of the native library used by the Java driver. The usual workaround is to point Java at another location for temporary files with the -Djava.io.tmpdir=... flag; that location must not be mounted with the noexec flag.
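To see which temporary directory a JVM is actually using, and to get a rough idea of whether files in it can be marked executable, something like the following sketch may help. The class name TmpDirCheck and its methods are hypothetical helpers, not part of DSE, Spark, or EMR, and the probe is only a heuristic: it assumes the platform's access check reflects the noexec mount option, which may not hold everywhere.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper; not part of DSE, Spark, or EMR.
final class TmpDirCheck {

    // Returns the directory the JVM uses for temporary files.
    static Path tmpDir() {
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    // Heuristic: creates a probe file in dir, marks it executable, and asks
    // the OS whether it is considered executable. Assumption: on a noexec
    // mount this is expected to report false, but that is platform-dependent.
    static boolean looksExecutable(Path dir) {
        try {
            Path probe = Files.createTempFile(dir, "exec-probe", ".tmp");
            try {
                probe.toFile().setExecutable(true);
                return Files.isExecutable(probe);
            } finally {
                Files.deleteIfExists(probe);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = tmpDir();
        System.out.println("java.io.tmpdir = " + dir);
        System.out.println("probe file looks executable: " + looksExecutable(dir));
    }
}
```

The check is only meaningful when it runs in the same JVM that fails, which here would be the Spark executor rather than the machine the job is submitted from.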

P.S. Unfortunately, I don't know much about EMR.

Alex Ott

It turned out to be a JNA issue. I added the JNA dependency to the shaded JAR and it worked.

<dependency>
        <groupId>net.java.dev.jna</groupId>
        <artifactId>jna</artifactId>
        <version>4.2.2</version>
</dependency>
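
A quick way to confirm that JNA really ended up in the shaded JAR is to check whether its classes can be found at runtime. The following is a minimal sketch: JnaClasspathCheck and isClassAvailable are hypothetical names introduced for illustration, while com.sun.jna.Native is the main entry class of the JNA library.

```java
// Hypothetical helper for checking the runtime classpath; not part of DSE or Spark.
final class JnaClasspathCheck {

    // Returns true if the named class can be located by this class's loader.
    // The class is not initialized, so no native code is loaded by the check.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, JnaClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // com.sun.jna.Native is provided by the net.java.dev.jna:jna artifact.
        System.out.println("JNA on classpath: " + isClassAvailable("com.sun.jna.Native"));
    }
}
```

Running this with the shaded JAR on the classpath should report whether JNA was bundled. It only shows that the class is visible to the loader used here; it does not prove that the executors see the same classpath.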
mat77