I am studying the Spark source code and am confused by the following code:
/**
 * Return a new RDD containing the distinct elements in this RDD.
 */
def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] =
  map(x => (x, null)).reduceByKey((x, y) => x, numPartitions).map(_._1)
What is the input data for the map(x => (x, null)) call? Why, and when, can the input be omitted?
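To make the question concrete, here is a minimal sketch of the same pattern outside Spark. `MiniRDD` and `Demo` are hypothetical names of my own, not Spark's API, and `groupBy` is only a local stand-in for `reduceByKey`. In this sketch the bare `map` call appears to be resolved against the enclosing instance, as if it were written `this.map(...)`; I would like to confirm whether the same thing is happening in Spark's `distinct`.

```scala
// Hypothetical stand-in for RDD[T]; this is NOT Spark's actual class.
class MiniRDD[T](val elems: Seq[T]) {
  def map[U](f: T => U): MiniRDD[U] = new MiniRDD(elems.map(f))

  // Same shape as the Spark method: `map` is called with no explicit receiver.
  def distinct(): MiniRDD[T] =
    new MiniRDD(
      map(x => (x, null)).elems // bare call, resolved on the current instance
        .groupBy(_._1)          // local stand-in for reduceByKey((x, y) => x)
        .keys
        .toSeq
    )
}

object Demo {
  def main(args: Array[String]): Unit = {
    val rdd = new MiniRDD(Seq(1, 2, 2, 3, 3, 3))
    // Prints the de-duplicated elements in ascending order.
    println(rdd.distinct().elems.sorted)
  }
}
```

If that reading is right, the "input" of `map` is simply the elements of the object on which `distinct` was called.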
UPDATE:
Here is the link to the source code.