Spark error "Output directory file already exists"

Question

I ran a simple sample (Spark, Windows 7) and got an unexpected FileAlreadyExistsException. I cannot find the folder or file it mentions on my computer.

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/PluralsightData/ReadMeWordCountViaApp already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1191)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1168)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1168)

package main

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._

object WordCounter {
    def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("Word Counter")
        val sc = new SparkContext(conf)
        //val textFile = sc.textFile("file:///Spark/README.md")
        val textFile = sc.textFile("file:///README.md")
        val tokenizedFileData = textFile.flatMap(line=>line.split(" "))
        val countPrep = tokenizedFileData.map(word=>(word, 1))
        val counts = countPrep.reduceByKey((accumValue, newValue)=>accumValue + newValue)
        val sortedCounts = counts.sortBy(kvPair=>kvPair._2, false)
        sortedCounts.saveAsTextFile("file:///PluralsightData/ReadMeWordCountViaApp")
    }
}

The source of the sample can be found at https://github.com/constructor-igor/TechSugar/tree/master/ScalaSamples/WordCounterSample

Well... the message says exactly what is wrong: the output directory already exists, so your `saveAsTextFile` call will not work. Most big-data frameworks avoid any chance of overwriting existing data, so they do not allow output into an existing directory. Just pick some other directory for your output. — sarveshseri, Feb 06 '17 at 13:50
How can I find the directory where `saveAsTextFile` stores the result, and open it? — constructor, Feb 06 '17 at 16:13
What about using an **absolute** path like `"file:///C:/temp/WordCount"`? Or look at http://stackoverflow.com/questions/38669206/spark-2-0-relative-path-in-absolute-uri-spark-warehouse for some possible glitches across Spark versions. — Samson Scharfrichter, Feb 06 '17 at 22:28

score 1 · Accepted Answer · answered Feb 07 '17 at 11:06

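According to the comments:

Spark refuses to overwrite existing data, so `saveAsTextFile` fails when the output directory already exists.
An absolute path to the output directory, including the drive letter, makes the results easy to find on the local disk.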

sortedCounts.saveAsTextFile("file:///C:/temp/ReadMeWordCountViaApp")
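
If the results of an earlier run are disposable, one option is to clear the old output directory before saving. The sketch below is only an illustration and assumes a local `file:` URI. `OutputDirHelper` and `prepareOutputDir` are hypothetical names, not part of Spark or Hadoop. It uses only the JDK to resolve the URI to an absolute local path (which also shows where the results will land) and to delete whatever an earlier run left there.

```scala
import java.io.File
import java.net.URI
import java.nio.file.Paths

// Hypothetical helper, not part of Spark: for local file: URIs only.
object OutputDirHelper {

  // Delete a file or a directory tree, children first.
  private def deleteRecursively(f: File): Unit = {
    val children = f.listFiles() // null unless f is a readable directory
    if (children != null) children.foreach(deleteRecursively)
    if (!f.delete() && f.exists()) sys.error(s"Could not delete $f")
  }

  // Resolve the URI to an absolute local path and remove any previous output.
  def prepareOutputDir(fileUri: String): File = {
    val target = Paths.get(new URI(fileUri)).toAbsolutePath.toFile
    if (target.exists()) deleteRecursively(target)
    target
  }

  def main(args: Array[String]): Unit = {
    // Assumed default: the directory used in the answer above.
    val uri = args.headOption.getOrElse("file:///C:/temp/ReadMeWordCountViaApp")
    println("Output directory resolves to: " + prepareOutputDir(uri))
  }
}
```

In the word-count program, `OutputDirHelper.prepareOutputDir(...)` would be called with the same URI just before `sortedCounts.saveAsTextFile(...)`. Because it deletes data without asking, writing to a fresh directory for each run remains the safer choice when old results matter.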

constructor

1,412
1
17
34

Spark error "Output directory file already exists

1 Answers1