
I want to write the first 5 lines to an HDFS file through Spark code:

sc.textFile("hdfs://localhost:8020/user/hadoop/data-master/retail_db/products/part-00000")
  .map(rec => (rec.split(",")(4).toDouble, rec))
  .sortByKey(false)
  .map(_._2)

Here we could use the saveAsTextFile API, but that is an action, while we need to limit the rows through transformations.
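One possible approach, sketched below under the assumption that `sc` is an existing SparkContext and using an illustrative output path: take the first 5 rows on the driver and parallelize them back into an RDD before saving. Note that `take` is itself an action, so the 5 rows pass through the driver:

```scala
// Assumes `sc` is an existing SparkContext; the output path is illustrative.
val sorted = sc.textFile("hdfs://localhost:8020/user/hadoop/data-master/retail_db/products/part-00000")
  .map(rec => (rec.split(",")(4).toDouble, rec))
  .sortByKey(false)
  .map(_._2)

// take(5) is an action: it brings the first 5 rows to the driver.
// parallelize turns them back into an RDD so saveAsTextFile can write them to HDFS.
sc.parallelize(sorted.take(5), 1)
  .saveAsTextFile("hdfs://localhost:8020/user/hadoop/top5_products")
```

This is fine for a handful of rows, but collecting to the driver does not scale to large limits.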

J_V
  • Possible duplicate of [Is there a way to take the first 1000 rows of a Spark Dataframe?](https://stackoverflow.com/questions/34206508/is-there-a-way-to-take-the-first-1000-rows-of-a-spark-dataframe) – Rick Moritz Jun 27 '17 at 12:40

1 Answer


You can use the `limit` function to get the first n rows:

def limit(n: Int): Dataset[T]

Returns a new Dataset by taking the first n rows. The difference between this function and head is that head is an action and returns an array (by triggering query execution) while limit returns a new Dataset.

yourDF.limit(5)  // takes the first 5 rows

If you want to take the first 5 rows as an array, then you can use the `take` function:

yourDF.take(5)
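To connect this back to writing the result to HDFS — a sketch, assuming an existing SparkSession named `spark` and a hypothetical output path — `limit` is a transformation that returns a new Dataset, and the subsequent `write` is what triggers execution:

```scala
// Assumes `spark` is an existing SparkSession; both paths are illustrative.
val df = spark.read.text("hdfs://localhost:8020/user/hadoop/data-master/retail_db/products/part-00000")

// limit(5) is a transformation (returns a new Dataset);
// the write below is the action that runs the query and saves to HDFS.
df.limit(5)
  .write
  .text("hdfs://localhost:8020/user/hadoop/top5_output")
```

Any sorting or parsing would be applied as transformations before the `limit` call, keeping the whole pipeline lazy until the write.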

Hope this helps!

koiralo