Writing to a file in Apache Spark

Question

I am writing Scala code that needs to write to a file in HDFS. FileWriter.write works on the local filesystem, but the same code does not work on HDFS. From what I have found, the options for writing in Apache Spark are RDD.saveAsTextFile and DataFrame.write.format.

My question is: what if I just want to write an int or string to a file in Apache Spark?

Follow-up: I need to write a header, then the contents of a DataFrame, and then append a string, all to the same output file. Does sc.parallelize(Seq(<String>)) help?

Ronak Patel · Accepted Answer · 2016-08-26T19:29:02.810

20

Create an RDD from your data (int or string) using Seq; see parallelized collections in the Spark documentation for details:

sc.parallelize(Seq(5))             // for writing an int (5)
sc.parallelize(Seq("Test String")) // for writing a string

import org.apache.spark.{SparkConf, SparkContext}

// One SparkContext is enough for both writes
val conf = new SparkConf().setAppName("Writing Int and String to File").setMaster("local")
val sc = new SparkContext(conf)

// Writing an int: saveAsTextFile creates a directory containing part files
val intRdd = sc.parallelize(Seq(5))
intRdd.saveAsTextFile("out/int/test")

// Writing a string
val stringRdd = sc.parallelize(Seq("Test String"))
stringRdd.saveAsTextFile("out/string/test")
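
For a single small value, the RDD route above produces a directory of part files. If one plain file is wanted instead, a possible alternative is Hadoop's FileSystem API, which works against HDFS where java.io.FileWriter only sees the local disk. The following is a minimal sketch, not part of the original answer: the object name WriteSingleValue, the helper writeString and the path out/single/value.txt are made up for illustration. It uses a bare Configuration, which resolves to the local filesystem when no cluster settings are on the classpath; inside a Spark job you would more likely pass sc.hadoopConfiguration.

```scala
import java.nio.charset.StandardCharsets
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object WriteSingleValue {
  // Hypothetical helper: writes `value` as the whole contents of `target`,
  // overwriting the file if it already exists.
  def writeString(fs: FileSystem, target: Path, value: String): Unit = {
    val out = fs.create(target, true) // second argument: overwrite
    try out.write(value.getBytes(StandardCharsets.UTF_8))
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a default Configuration; with cluster config files on the
    // classpath (or sc.hadoopConfiguration) this would point at HDFS instead.
    val fs = FileSystem.get(new Configuration())
    writeString(fs, new Path("out/single/value.txt"), 5.toString)
  }
}
```

This runs on the driver only, so it suits small values such as a counter or a status string rather than distributed data.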

edited Aug 26 '16 at 19:29

answered Aug 26 '16 at 19:16

Ronak Patel


Thanks. That did work. I am editing my question with a follow-up if you can help. – kruparulz14 Aug 26 '16 at 19:42

Once you accept an answer, it's better to post a new question. – Ravindra babu Aug 27 '16 at 14:04

score 6 · Answer 2 · edited May 23 '17 at 11:46

Follow-up example (tested as shown below):

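import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("Total Countries having Icon").setMaster("local")
val sc = new SparkContext(conf)

val headerRDD = sc.parallelize(Seq("HEADER"))

// Replace the BODY part with the rows of your DataFrame
val bodyRDD = sc.parallelize(Seq("BODY"))

val footerRDD = sc.parallelize(Seq("FOOTER"))

// Combine all RDDs into the final one; ++ is a union in the order written
val finalRDD = headerRDD ++ bodyRDD ++ footerRDD

// finalRDD.foreach(line => println(line))

// Write everything to a single part file under the "test" directory.
// coalesce(1) without a shuffle is used here because shuffle = true
// may not keep the header/body/footer order on a cluster.
finalRDD.coalesce(1).saveAsTextFile("test")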

Output (contents of the single part file in the test directory):

HEADER
BODY
FOOTER
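
To replace the BODY placeholder with real DataFrame contents, one option is to turn each Row into a line of text before the union. The sketch below is an illustration under assumptions rather than the answerer's tested code: it assumes Spark 2.x or later (SparkSession), and the object name HeaderBodyFooter, the function writeReport, the sample columns country and count, and the output directory out/report are all hypothetical. Joining fields with a comma does no quoting or escaping, so it only suits values that contain no commas.

```scala
import org.apache.spark.sql.SparkSession

object HeaderBodyFooter {
  // Hypothetical helper: writes header, DataFrame rows and footer into one
  // part file under outputDir. The directory must not exist yet.
  def writeReport(spark: SparkSession, outputDir: String): Unit = {
    import spark.implicits._
    val sc = spark.sparkContext

    // Stand-in for the real DataFrame (made-up sample data)
    val df = Seq(("IN", 1), ("US", 2)).toDF("country", "count")

    // Header built from the column names, body from the rows
    val headerRDD = sc.parallelize(Seq(df.columns.mkString(",")))
    val bodyRDD = df.rdd.map(row => row.mkString(","))
    val footerRDD = sc.parallelize(Seq("FOOTER"))

    // Union in order, then merge into one partition without a shuffle
    (headerRDD ++ bodyRDD ++ footerRDD)
      .coalesce(1)
      .saveAsTextFile(outputDir)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Header body footer")
      .master("local[1]")
      .getOrCreate()
    try writeReport(spark, "out/report")
    finally spark.stop()
  }
}
```

Funnelling everything through one partition is fine for small reports; for large DataFrames it removes parallelism from the write, so DataFrame.write with its own header option may be a better fit when no footer is needed.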

