
I want to get two RDDs from Cassandra, then join them. I want to skip the rows with an empty value.

def extractPair(rdd: RDD[CassandraRow]) = {
  rdd.map((row: CassandraRow) => {
    val name = row.getString("name")
    if (name == "")
      None  // join fails: the elements are no longer all (key, value) pairs
    else
      (name, row.getUUID("object"))
  })
}

val rdd1 = extractPair(cassRdd1)
val rdd2 = extractPair(cassRdd2)
val joinRdd = rdd1.join(rdd2)  // does not compile because of the None branch

Using flatMap fixes this, but I want to know how to fix it using map:

def extractPair(rdd: RDD[CassandraRow]) = {
  rdd.flatMap((row: CassandraRow) => {
    val name = row.getString("name")
    if (name == "")
      Seq()  // empty name: emit nothing
    else
      Seq((name, row.getUUID("object")))  // valid row: emit one pair
  })
}
xidianw3

1 Answer

This isn't possible with just a map, since map always produces exactly one output element per input element. You would need to follow it with a filter. Even then, you would be best off wrapping each valid result in a Some, but the surviving results would still be wrapped in Some, requiring a second map to unwrap them. So, realistically, your best option is something like this:

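import org.apache.spark.rdd.RDD
import com.datastax.spark.connector._  // provides CassandraRow

def extractPair(rdd: RDD[CassandraRow]) = {
  rdd.flatMap((row: CassandraRow) => {
    val name = row.getString("name")
    if (name == "") None                       // empty name: emit nothing
    else Some((name, row.getUUID("object")))   // valid row: emit one pair
  })
}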

Option is implicitly convertible to a flattenable (iterable) type, and it conveys your method's intent better.
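The conversion referred to here is Option.option2Iterable, an implicit def in the standard library's Option companion object. A minimal sketch of what it does, independent of Spark: Some becomes a one-element collection and None an empty one, which is why a function returning Option can be handed to flatMap. (On Scala 2.13 and later, Option is itself an IterableOnce, so collection flatMap accepts it even without the conversion.) The object name OptionToIterable is just for illustration.

```scala
object OptionToIterable {
  // Relies on the implicit Option.option2Iterable conversion from the standard library:
  // the expected type Iterable[Int] triggers it for both values below.
  val one: Iterable[Int] = Some(1) // becomes a one-element collection
  val zero: Iterable[Int] = None   // becomes an empty collection
}
```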

Justin Pihony
  • This doesn't work: value join is not a member of org.apache.spark.rdd.RDD[Some[(Any, java.util.UUID)]] – xidianw3 Jun 24 '15 at 05:43
  • Are you using `flatMap`? It should strip away the `Some` – Justin Pihony Jun 24 '15 at 05:45
  • The flatMap code I gave works, but I want to know how to do it with map, because I think map is more efficient than flatMap. – xidianw3 Jun 24 '15 at 05:57
  • And what makes you think that map is more efficient? I addressed your map question in the first section of my answer...did you even read that? Or just try to copy the code? – Justin Pihony Jun 24 '15 at 06:00
  • Yes, I have read your advice, but I do not understand. map produces one output per input, while flatMap produces many outputs per input, so I think map is more efficient. – xidianw3 Jun 24 '15 at 07:52
  • No. As you know, flatMap is one-to-zero-or-more. You cannot do what you want with map alone, which is why flatMap is the best option. – Justin Pihony Jun 24 '15 at 15:47
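To make the one-to-one versus one-to-zero-or-more distinction concrete, here is a minimal sketch on plain Scala Lists standing in for RDDs (RDD's map, filter and flatMap have the same element-wise meaning). Row is a hypothetical stand-in for CassandraRow holding only the two columns used in the question. viaMap shows that map keeps every row, viaMapFilterMap is the map, filter, map route described in the answer, and viaFlatMap reaches the same result in one step.

```scala
import java.util.UUID

object MapVsFlatMap {
  // Hypothetical stand-in for CassandraRow: only the two columns used in the question.
  final case class Row(name: String, obj: UUID)

  // map is one-to-one: the empty row is still there, as a None.
  def viaMap(rows: List[Row]): List[Option[(String, UUID)]] =
    rows.map(r => if (r.name == "") None else Some((r.name, r.obj)))

  // map, then filter out the Nones, then a second map to unwrap the Somes.
  def viaMapFilterMap(rows: List[Row]): List[(String, UUID)] =
    viaMap(rows).filter(_.isDefined).map(_.get)

  // flatMap is one-to-zero-or-more: None contributes nothing, Some contributes one pair.
  def viaFlatMap(rows: List[Row]): List[(String, UUID)] =
    rows.flatMap(r => if (r.name == "") None else Some((r.name, r.obj)))
}
```

On an RDD, either of the last two shapes yields an RDD of (key, value) pairs, so join becomes available; flatMap simply gets there with one transformation instead of three and without the explicit .get.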