I have an RDD like this:

val custFile = sc.textFile("custInfo.txt").map(line => line.split('|'))

val custPrd = custFile.map(a => (a(0), ((a(1)), (a(2), a(3), a(4), a(5), a(6), a(7), a(8)))))

val custGrp = custPrd.groupByKey

custGrp.saveAsTextFile("custinfo2")

That produces this:

(1104,CompactBuffer((S_SAVG,(1,1,1,1,1,1,1)), (CN_SAVG,(4,4,1,1,4,1,1))))

How can I use something like this:

custPrdGrp.map{case (k, vals) => {val valsString = vals.mkString(", "); s"{$k:, {$valsString}}" }}

to format a (k, (v, w)) pair? I tried this but got an error:

val custPrdRep = custPrdGrp.map({case (k, (v, w)) => {val valsString = v.mkString(", "); val valsPrvcy = w.mkString(", "); s"'${k}'| [$valsString]" }})
<console>:27: error: constructor cannot be instantiated to expected type;
 found   : (T1, T2)
 required: Iterable[(String, (String, String, String, String, String, String, String))]
       val custPrdRep = custPrdGrp.map({case (k, (v, w)) => {val valsString = v.mkString(", "); val valsPrvcy = w.mkString(", "); s"'${k}'| [$valsString]" }})
                                                 ^ 


<console>:27: error: not found: value v
           val custPrdRep = custPrdGrp.map({case (k, (v, w)) => {val valsString = v.mkString(", "); val valsPrvcy = w.mkString(", "); s"'${k}'| [$valsString]" }})
                                                                                  ^
    <console>:27: error: not found: value w
           val custPrdRep = custPrdGrp.map({case (k, (v, w)) => {val valsString = v.mkString(", "); val valsPrvcy = w.mkString(", "); s"'${k}'| [$valsString]" }})

I'd like the output to look like this:

('1104'|{'S_SAVG': {a: '1', b: '1', c: '1', d: '1', e: '1', f: '1', g: '1'}, 'CN_SAVG': {a: '4', b: '4', c: '1', d: '1', e: '4', f: '1', g: '1'}})
lightweight

1 Answer

Well, there are quite a lot of details here, but something like this should work:

val keys = List("a", "b", "c", "d", "e", "f", "g")

custGrp.map{case (k, vals) => {
    val valsString = vals map {
        case (val1, val2) => {
            val pairs = keys
                // Create someLetter: 'someNumber' pairs
                .zip(val2.productIterator.map{case (x: String)  => x}.toSeq)
                .map{case (name, value) => s"$name: '$value'"}
                // Join into a single string
                .mkString(", ")
            // Add "key"
            s"'$val1': {$pairs}"
        }
    }
    // Combine above
    val valsComb = valsString.mkString(", ")
    // Create final string
    s"('$k'|{$valsComb})"
}}

You could simplify things by creating a correct data structure in the first place, for example by using Maps instead of tuples:

 Map("S_SAVG" -> Map("a" -> "1", "b" -> "1", ...), ...)
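
To illustrate the Map-based idea, here is a minimal sketch on plain Scala collections (no Spark needed to try it). The object name `MapFormatSketch`, the helpers `formatFields` and `formatRecord`, and the three-field sample data are all made up for this example. Keys are sorted before rendering because iteration order of a `Map` is not guaranteed.

```scala
object MapFormatSketch {
  // Render one product's fields as {a: '1', b: '1', ...}.
  // Keys are sorted because Map iteration order is not guaranteed.
  def formatFields(fields: Map[String, String]): String =
    fields.toSeq
      .sortBy(_._1)
      .map { case (name, value) => s"$name: '$value'" }
      .mkString("{", ", ", "}")

  // Render a whole grouped record as ('key'|{'PRODUCT': {...}, ...}).
  def formatRecord(key: String,
                   products: Iterable[(String, Map[String, String])]): String = {
    val body = products
      .map { case (product, fields) => s"'$product': ${formatFields(fields)}" }
      .mkString(", ")
    s"('$key'|{$body})"
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample, shortened to three fields per product.
    val products = Seq(
      "S_SAVG"  -> Map("a" -> "1", "b" -> "1", "c" -> "1"),
      "CN_SAVG" -> Map("a" -> "4", "b" -> "4", "c" -> "1")
    )
    println(formatRecord("1104", products))
  }
}
```

On the RDD, something like `custGrp.map { case (k, vals) => MapFormatSketch.formatRecord(k, vals) }` should then work, assuming `custPrd` was built with a Map in place of the inner tuple.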
zero323
  • Can you help me understand a few things? I'm new to Scala and Spark and want to understand more of this. For instance, what does `map{case (k, vals) =>` exactly do? What is it mapping to? Also, what is `.zip(val2.productIterator.map{case (x: String) => x}.toSeq)` doing? And `.mkString`? Sorry if the questions are basic, but I'm really trying to understand this. – lightweight Aug 19 '15 at 16:29
  • Oh wow. 1. _What does `map{case (k, vals) =>` exactly do?_ It is a pattern-matching anonymous function. See: http://stackoverflow.com/a/30879186/1560062 (I won't mind an upvote there :D) 2. _What is `.zip(val2.productIterator ...` doing?_ Well, to understand that you have to dive deeper into the [Product Classes](http://stackoverflow.com/q/1301907/1560062). Here it is used as _a very naive way_ to convert a tuple to a collection. 3. _And `.mkString`?_ It simply calls `toString` on each element of the input collection and concatenates the results with the provided separator. – zero323 Aug 19 '15 at 16:47
  • Also, can you help me understand this part and where it goes? `Map("S_SAVG" -> Map("a" -> "1", "b" -> "1", ...), ...)` – lightweight Aug 19 '15 at 18:58
  • Well, you could map like this: `val custPrd = custFile.map(a => (a(0), ((a(1)), Map("a" -> a(2), "b" -> a(3), ..., "g" -> a(8)))))` and adjust the downstream code. Then you can omit `productIterator`, and the names are already in place. – zero323 Aug 19 '15 at 19:06
  • Also, I tried your answer above but for some reason got the same array results back. And what's the best place to learn this stuff by going through examples? I find very basic tutorials but can't find more complex ones. – lightweight Aug 19 '15 at 19:10
  • @zero323 I'm trying to get the second Map format but am unable to get it to work. I made some edits to the question showing what I've tried. I think I'm on the right track? – lightweight Aug 20 '15 at 14:45
  • If your requirements change, please ask another question and don't rewrite the current one. It renders existing answers useless. Also, try to isolate the problem. If you want to convert a map to a string, all the other stuff is irrelevant. Finally, __please provide example data__: something that can be simply copied and pasted to test and has a well-defined type. – zero323 Aug 20 '15 at 14:55
  • Posted a related question here with sample data: http://stackoverflow.com/questions/32123418/formatting-a-nested-map-in-a-map-in-spark-rdd – lightweight Aug 20 '15 at 16:21
  • OK, thanks for accepting the answer. I hope you don't mind that I've rolled back your question to the version that matches it. – zero323 Aug 20 '15 at 16:27
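
The three pieces discussed in the comments above (a pattern-matching anonymous function, `productIterator`, and `mkString`) can be tried in isolation on plain Scala collections. This is only an illustrative sketch; the object name `PiecesSketch` and its helper names are invented for the example, and it uses `_.toString` instead of the answer's `case (x: String) => x` so that it accepts tuples of any element type.

```scala
object PiecesSketch {
  // 1. Pattern-matching anonymous function: `case (k, v)` destructures
  //    each pair handed to map, so k and v can be used by name.
  def renderPairs(pairs: List[(String, Int)]): List[String] =
    pairs.map { case (k, v) => s"$k=$v" }

  // 2. productIterator walks the fields of a tuple (or any Product)
  //    as an Iterator[Any]; here every field is turned into a String.
  def tupleFields(p: Product): List[String] =
    p.productIterator.map(_.toString).toList

  // zip pairs names with values by position; 3. mkString then joins the
  // rendered elements with the given separator.
  def named(names: List[String], p: Product): String =
    names.zip(tupleFields(p))
      .map { case (name, value) => s"$name: '$value'" }
      .mkString(", ")

  def main(args: Array[String]): Unit = {
    println(renderPairs(List(("a", 1), ("b", 2))))
    println(tupleFields(("1", "4", "1")))
    println(named(List("a", "b", "c"), ("1", "4", "1")))
  }
}
```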