I ran a simple Spark sample on Windows 7 and got an unexpected `FileAlreadyExistsException`. I cannot find the output folder or file anywhere on my computer.

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/PluralsightData/ReadMeWordCountViaApp already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1191)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1168)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1168)

package main

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._

object WordCounter {
    def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("Word Counter")
        val sc = new SparkContext(conf)
        //val textFile = sc.textFile("file:///Spark/README.md")
        val textFile = sc.textFile("file:///README.md")
        val tokenizedFileData = textFile.flatMap(line=>line.split(" "))
        val countPrep = tokenizedFileData.map(word=>(word, 1))
        val counts = countPrep.reduceByKey((accumValue, newValue)=>accumValue + newValue)
        val sortedCounts = counts.sortBy(kvPair=>kvPair._2, false)
        sortedCounts.saveAsTextFile("file:///PluralsightData/ReadMeWordCountViaApp")
    }
}

The source code of the sample is available at https://github.com/constructor-igor/TechSugar/tree/master/ScalaSamples/WordCounterSample

constructor
  • Well... the message says it plainly: the output directory already exists, so your `saveAsTextFile` call will not work. Most big-data frameworks avoid any chance of overwriting existing data, so they do not allow output into existing directories. Just pick some other directory for your output. – sarveshseri Feb 06 '17 at 13:50
  • How can I find the directory where `saveAsTextFile` stores the result, and open it? – constructor Feb 06 '17 at 16:13
  • What about using an **absolute** path like `"file:///C:/temp/WordCount"`? Or look at http://stackoverflow.com/questions/38669206/spark-2-0-relative-path-in-absolute-uri-spark-warehouse for some possible glitches across Spark versions. – Samson Scharfrichter Feb 06 '17 at 22:28
  • Yes, it solved my issue. Thank you. – constructor Feb 07 '17 at 09:51

1 Answer

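According to the comments:

  1. Spark refuses to overwrite existing data: `saveAsTextFile` fails with `FileAlreadyExistsException` if the output directory already exists, so choose a new directory or remove the old one first.

  2. An absolute path for the output (including the drive letter) makes the result easy to find on the local disk: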

    sortedCounts.saveAsTextFile("file:///C:/temp/ReadMeWordCountViaApp")
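
Both points can be checked before the job runs. The following is only a minimal sketch using plain JDK file APIs: `OutputDirHelper`, `localDir`, `deleteRecursively` and `clearOutput` are hypothetical names, not part of Spark or of the original sample. It assumes a local `file:` URI and assumes that deleting the result of an earlier run is really what you want. It prints the absolute location the URI resolves to (a path without a drive letter is presumably resolved against the current drive on Windows) and removes an existing output directory.

```scala
import java.io.{File, IOException}
import java.net.URI

// Hypothetical helper, not part of Spark or of the original sample.
object OutputDirHelper {

  // Turn a "file:" URI into the absolute local file the JVM resolves it to.
  def localDir(uri: String): File =
    new File(new URI(uri).getPath).getAbsoluteFile

  // Recursively delete a file or directory; does nothing if it is absent.
  def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) Option(f.listFiles).foreach(_.foreach(deleteRecursively))
    if (f.exists && !f.delete()) throw new IOException(s"Could not delete $f")
  }

  // Print where the output will land and clear any previous run's result.
  def clearOutput(uri: String): File = {
    val dir = localDir(uri)
    println(s"Output directory resolves to: $dir")
    deleteRecursively(dir)
    dir
  }
}
```

Calling `OutputDirHelper.clearOutput("file:///C:/temp/ReadMeWordCountViaApp")` just before `saveAsTextFile` would show where the files end up and let the job be re-run with the same output path. If the old results matter, picking a fresh directory per run is the safer choice.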

constructor