
I am trying to submit a very simple application. It creates two RDDs from one large input file (about 500 GB), subtracts the header (the first line), zips them with indexes, maps them to key-value pairs with a small modification, and then saves them as a text file.
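I have not posted the code, but it is essentially the following minimal Scala sketch (the paths and the key-value transformation are placeholders, not my actual logic):

    import org.apache.spark.{SparkConf, SparkContext}

    object ZipWithIndexApp {
      def main(args: Array[String]): Unit = {
        // master/memory settings are assumed to come from spark-submit
        val sc = new SparkContext(new SparkConf().setAppName("zip-with-index"))

        val lines  = sc.textFile("/path/to/large-input.txt") // ~500 GB input (placeholder path)
        val header = lines.first()                           // first line = header
        val data   = lines.filter(_ != header)               // subtract the header

        // zipWithIndex runs an extra job, and the following stages shuffle
        // data through Spark's local scratch directories on disk
        val indexed = data.zipWithIndex()

        // map to key-value with a small modification (placeholder logic)
        val keyValue = indexed.map { case (line, idx) => s"$idx\t${line.trim}" }

        keyValue.saveAsTextFile("/path/to/output")           // placeholder path
        sc.stop()
      }
    }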

I was able to follow the progress of the jobs on the Spark web UI. The last two jobs failed with the error below. Please tell me what causes it and how to solve it.

Job aborted due to stage failure: Task 4897 in stage 2.0 failed 1 times, most recent failure: Lost task 4897.0 in stage 2.0 (TID 4914, localhost): java.io.IOException: Aucun espace disponible sur le périphérique
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:58)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
    at net.jpountz.lz4.LZ4BlockOutputStream.finish(LZ4BlockOutputStream.java:243)
    at net.jpountz.lz4.LZ4BlockOutputStream.close(LZ4BlockOutputStream.java:175)
    at org.apache.spark.serializer.DummySerializerInstance$1.close(DummySerializerInstance.java:65)
    at org.apache.spark.storage.DiskBlockObjectWriter$$anonfun$close$2.apply$mcV$sp(DiskBlockObjectWriter.scala:108)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1296)
    at org.apache.spark.storage.DiskBlockObjectWriter.close(DiskBlockObjectWriter.scala:107)
    at org.apache.spark.storage.DiskBlockObjectWriter.commitAndClose(DiskBlockObjectWriter.scala:132)
    at org.apache.spark.shuffle.sort.ShuffleExternalSorter.writeSortedFile(ShuffleExternalSorter.java:188)
    at org.apache.spark.shuffle.sort.ShuffleExternalSorter.closeAndGetSpills(ShuffleExternalSorter.java:410)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.closeAndWriteOutput(UnsafeShuffleWriter.java:204)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:

PS: the trace contains a French sentence:

"Aucun espace disponible sur le périphérique" means "no space available on the device".

I think that is the reason, but I don't understand which device it refers to or how to fix it.

asma
  • Can you check the Spark scratch space / worker directory disk size? I guess that is what is running out of space. – Rohith Yeravothula Sep 01 '16 at 14:07
  • How can I check it? – asma Sep 01 '16 at 14:18
  • @Rohith Yeravothula – asma Sep 01 '16 at 14:27
  • Are you using a cluster manager like Mesos or YARN, or plain Spark standalone? – Rohith Yeravothula Sep 02 '16 at 07:12
  • In the case of a standalone master, it needs to be set in spark-defaults. If it is not set, Spark will use the local directory (where the Spark folder is located) as the worker directory. In that case, please check that particular disk's free space. – Rohith Yeravothula Sep 02 '16 at 07:19
  • Yes, I am using standalone. I am sorry for asking this, but I am new to Spark and Linux, so how can I check the size of this directory, and how do I configure it in spark-defaults? – asma Sep 02 '16 at 08:52
  • And do you mean that the worker uses this space for processing or for saving outputs? Please forgive me, I'm still learning. – asma Sep 02 '16 at 08:57
  • Firstly, you don't have to apologize. Use the command "df -h" to see all disks and look for the disk containing the Spark folder. Check whether that disk is running low on space while the application is running. – Rohith Yeravothula Sep 02 '16 at 10:41
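A minimal sketch of what the comments suggest for a standalone setup: check the disk holding Spark's scratch space with df -h, and point spark.local.dir at a disk with enough free space in conf/spark-defaults.conf (the path below is hypothetical):

    # check free space on all mounted disks while the job runs
    df -h

    # conf/spark-defaults.conf -- redirect shuffle/spill scratch space
    # to a larger disk (hypothetical example path)
    spark.local.dir    /mnt/bigdisk/spark-tmp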
