
I want to write the first 5 lines to an HDFS file through Spark code:

sc.textFile("hdfs://localhost:8020/user/hadoop/data-master/retail_db/products/part-00000")
  .map(rec => (rec.split(",")(4).toDouble, rec))
  .sortByKey(false)
  .map(_._2)

Here we could use the saveAsTextFile API, but that is an action, while we need to limit the rows through transformations.
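One possible approach, sketched below under the assumption that `sc` is an existing SparkContext and using an illustrative output path: take the first 5 rows on the driver and parallelize them back into an RDD before saving. Note that `take` is itself an action, so the 5 rows pass through the driver:

```scala
// Assumes `sc` is an existing SparkContext; the output path is illustrative.
val sorted = sc.textFile("hdfs://localhost:8020/user/hadoop/data-master/retail_db/products/part-00000")
  .map(rec => (rec.split(",")(4).toDouble, rec))
  .sortByKey(false)
  .map(_._2)

// take(5) is an action: it brings the first 5 rows to the driver.
// parallelize turns them back into an RDD so saveAsTextFile can write them to HDFS.
sc.parallelize(sorted.take(5), 1)
  .saveAsTextFile("hdfs://localhost:8020/user/hadoop/top5_products")
```

This is fine for a handful of rows, but collecting to the driver does not scale to large limits.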

J_V
  • Possible duplicate of [Is there a way to take the first 1000 rows of a Spark Dataframe?](https://stackoverflow.com/questions/34206508/is-there-a-way-to-take-the-first-1000-rows-of-a-spark-dataframe) – Rick Moritz Jun 27 '17 at 12:40

1 Answer


You can use the `limit` function to get the first n rows:

def limit(n: Int): Dataset[T]

Returns a new Dataset by taking the first n rows. The difference between this function and head is that head is an action and returns an array (by triggering query execution) while limit returns a new Dataset.

yourDF.limit(5)  // takes the first 5 rows

If you want to take the first 5 rows as an array, then you can use the `take` function:

yourDF.take(5)
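To connect this back to writing the result to HDFS — a sketch, assuming an existing SparkSession named `spark` and a hypothetical output path — `limit` is a transformation that returns a new Dataset, and the subsequent `write` is what triggers execution:

```scala
// Assumes `spark` is an existing SparkSession; both paths are illustrative.
val df = spark.read.text("hdfs://localhost:8020/user/hadoop/data-master/retail_db/products/part-00000")

// limit(5) is a transformation (returns a new Dataset);
// the write below is the action that runs the query and saves to HDFS.
df.limit(5)
  .write
  .text("hdfs://localhost:8020/user/hadoop/top5_output")
```

Any sorting or parsing would be applied as transformations before the `limit` call, keeping the whole pipeline lazy until the write.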

Hope this helps!

koiralo