Omit input data of map function in Scala

Question

I am reading the Spark source code and am confused by the following code:

/**
 * Return a new RDD containing the distinct elements in this RDD.
 */
def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] =
  map(x => (x, null)).reduceByKey((x, y) => x, numPartitions).map(_._1)

What is the input data for the map(x => (x, null)) call? Why and when can the input be omitted?

UPDATE:

Here is the link to the source code.

Hi @Daenyth Thanks for the reminder, I've added the link to the source code. — Shen Li, Jun 09 '15 at 16:57

DNA · Accepted Answer · 2015-06-09T17:54:30.050

distinct and map are both methods on the RDD class (source), so distinct is just calling another method on the same RDD.

The map function is a higher-order function, i.e. it accepts a function as one of its parameters (f: T => U):

/**
 * Return a new RDD by applying a function to all elements of this RDD.
 */
def map[U: ClassTag](f: T => U): RDD[U] = withScope {
  val cleanF = sc.clean(f)
  new MapPartitionsRDD[U, T](this, (context, pid, iter) => iter.map(cleanF))
}

In the case of distinct, the parameter f to map is the anonymous function x => (x, null).
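To see why nothing appears in front of map, here is a minimal sketch using a hypothetical MiniRDD class (not Spark's RDD; the class name, the List-backed storage and the method names paired/pairedExplicit are assumptions made for illustration). Inside a class, an unqualified method call is resolved against this, so map(...) in distinct is shorthand for this.map(...), and the "input data" is simply the elements of the current instance.

```scala
// Hypothetical stand-in for an RDD, backed by a plain List (not Spark code).
final class MiniRDD[T](val elems: List[T]) {
  // Higher-order method: applies f to every element of this instance.
  def map[U](f: T => U): MiniRDD[U] = new MiniRDD(elems.map(f))

  // Unqualified call: `map(...)` here means `this.map(...)`.
  def paired: MiniRDD[(T, Null)] = map(x => (x, null))

  // The same call with the receiver written out explicitly.
  def pairedExplicit: MiniRDD[(T, Null)] = this.map(x => (x, null))
}

object MiniRDDDemo {
  def main(args: Array[String]): Unit = {
    val rdd = new MiniRDD(List(1, 2, 2))
    // Both forms operate on the elements of `rdd` and agree with each other.
    println(rdd.paired.elems == rdd.pairedExplicit.elems)
  }
}
```

Outside the class there is no implicit receiver, so the call would have to be written as rdd.map(...).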

Here's a simple example of using an anonymous function (lambda), in the Scala REPL (using the similar map function on a Scala list, not a Spark RDD):

scala> List(1,2,3).map(x => x + 1)
res0: List[Int] = List(2, 3, 4)
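The three calls in distinct can be imitated on a plain Scala List as a rough sketch. This is not how Spark executes it: reduceByKey does not exist on List, so groupBy followed by reduce is used as a stand-in, and the names DistinctSketch and distinctSketch are made up for this example.

```scala
object DistinctSketch {
  // Mimics the shape of map -> reduceByKey -> map on a local collection.
  def distinctSketch[T](xs: List[T]): Set[T] = {
    // Step 1: pair every element with a dummy value, as in map(x => (x, null)).
    val paired = xs.map(x => (x, null))
    // Step 2: stand-in for reduceByKey((x, y) => x): keep one pair per key.
    val reduced = paired.groupBy(_._1).map { case (_, group) =>
      group.reduce((x, y) => x)
    }
    // Step 3: drop the dummy value again, as in map(_._1).
    reduced.map(_._1).toSet
  }

  def main(args: Array[String]): Unit =
    println(distinctSketch(List(1, 2, 2, 3, 3, 3)))
}
```

The result is returned as a Set because this local version makes no promise about element order.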

score 1 · Answer 2 · answered Jun 09 '15 at 17:00

The map in map(x => (x, null)) is the map method defined on the RDD class itself, so it is invoked on the current RDD (this).

I don't understand your question about omitting the input. You can't call a function in Scala that expects an argument without giving it the argument.

Daenyth
