
Here is my RDD:

scala> grouped_final_resultMap.first
res20: (String, Iterable[Float]) = (2014-02-01,CompactBuffer(239.96, 129.99, 49.98, 100.0, 399.98))

What I want to do is sum up all the items of the Iterable[Float] in the second component (_2) of each element of that RDD.

Can anyone tell me how I can do it?

Thank you very much.


Update:

Here is the repl session:

scala> final_result.take(20).foreach(println)
2828,2013-08-10,129.99
43399,2014-04-20,100.0
43399,2014-04-20,129.99
43399,2014-04-20,49.98
8989,2013-09-19,119.97

...

scala> val final_resultMap1 = final_result.map(x=>(x.split(",")(1), x.split(",")(2).toFloat))
scala> grouped_final_resultMap1.first
res34: (String, Iterable[Float]) = (2014-02-01,CompactBuffer(239.96, 129.99, ...

val sumed = final_resultMap1.map{case (str, nums) => (str, nums.sum)}

It gives the following error:

<console>:41: error: value sum is not a member of Float
   val sumed = final_resultMap1.map{case (str, nums) => (str, nums.sum)}

Thank you.

Andrey Tyukin
Choix

1 Answer

Since CompactBuffer extends Iterable, and Iterable has a method sum, it should be as simple as:

grouped_final_resultMap.map{ 
  case (str, nums) => (str, nums.sum) 
}
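As a minimal sketch of the same pattern, the snippet below uses a plain Scala `Seq` as a stand-in for the RDD, so it runs without Spark on the classpath (that substitution is an assumption made for illustration; the object name `SumGrouped` is hypothetical). The sample values are copied from the question.

```scala
// Plain Scala collections stand in for the RDD here (assumption: no Spark needed).
object SumGrouped {
  // Same shape as one element of grouped_final_resultMap: (date, amounts)
  val grouped: Seq[(String, Iterable[Float])] = Seq(
    ("2014-02-01", Iterable(239.96f, 129.99f, 49.98f, 100.0f, 399.98f))
  )

  // Iterable[Float] has .sum because an implicit Numeric[Float] is available
  val summed: Seq[(String, Float)] =
    grouped.map { case (date, nums) => (date, nums.sum) }

  def main(args: Array[String]): Unit = summed.foreach(println)
}
```

The same `map { case (str, nums) => (str, nums.sum) }` only type-checks when the second component really is a collection of numbers; on a `(String, Float)` pair there is nothing to sum, so the compiler rejects it.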

However, make sure that you didn't miss the opportunity to perform this associative reduction operation in some previous step.


From your EDIT it is apparent that you don't want to do anything at all with the grouped_final_resultMap1, and instead you want something like

final_resultMap1.reduceByKey((_: Float) + (_: Float))

or just

final_resultMap1.reduceByKey(_ + _) 

for short.
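To illustrate what `reduceByKey(_ + _)` does with `(String, Float)` pairs, here is a rough sketch on plain Scala collections. `reduceByKeyLocal` is a hypothetical helper written for this example, not part of Spark; it only mimics the per-key reduction locally. The input lines are the ones shown in the question.

```scala
object ReduceByKeySketch {
  // Sample lines copied from the question's final_result output
  val lines: Seq[String] = Seq(
    "2828,2013-08-10,129.99",
    "43399,2014-04-20,100.0",
    "43399,2014-04-20,129.99",
    "43399,2014-04-20,49.98",
    "8989,2013-09-19,119.97"
  )

  // Split each line once and keep (date, amount), like final_resultMap1
  val pairs: Seq[(String, Float)] = lines.map { line =>
    val fields = line.split(",")
    (fields(1), fields(2).toFloat)
  }

  // Hypothetical local stand-in for RDD.reduceByKey:
  // combine all values that share a key using the function f
  def reduceByKeyLocal(kvs: Seq[(String, Float)])(f: (Float, Float) => Float): Map[String, Float] =
    kvs.foldLeft(Map.empty[String, Float]) { case (acc, (k, v)) =>
      acc.updated(k, acc.get(k).map(f(_, v)).getOrElse(v))
    }

  val totals: Map[String, Float] = reduceByKeyLocal(pairs)(_ + _)

  def main(args: Array[String]): Unit = totals.foreach(println)
}
```

Because the values are combined pairwise per key, no intermediate `Iterable[Float]` per date is ever built, which is why the grouping step can be skipped entirely.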

Andrey Tyukin
  • Thank you for the quick reply: scala> val sumed = grouped_final_resultMap1.map{case (str, nums) => (str, nums.sum)} <console>:43: error: could not find implicit value for parameter num: Numeric[String] val sumed = grouped_final_resultMap1.map{case (str, nums) => (str, nums.sum)} ??? – Choix Feb 11 '18 at 14:53
  • Is there any difference between "grouped_final_resultMap" and "grouped_final_resultMap1"? Did you swap the order of strings and compactBuffers or something like that? – Andrey Tyukin Feb 11 '18 at 14:56
  • Same. I also just successfully converted the numbers with toFloat so that they are floats, and ran the map again: scala> val sumed = final_resultMap1.map{case (str, nums) => (str, nums.sum)} <console>:41: error: value sum is not a member of Float val sumed = final_resultMap1.map{case (str, nums) => (str, nums.sum)} – Choix Feb 11 '18 at 15:02
  • can you start a chat room maybe? – Choix Feb 11 '18 at 15:03
  • And `final_resultMap1` is again an `RDD[(String, Iterable[Float])]`? (it will give me a link to a chatroom if we post a few more comments here) – Andrey Tyukin Feb 11 '18 at 15:05
  • yes: scala> final_resultMap1 res33: org.apache.spark.rdd.RDD[(String, Float)] = MapPartitionsRDD[20] at map at <console>:39 – Choix Feb 11 '18 at 15:06
  • I don't see how to start a new chatroom, I guess my account doesn't have the privilege yet – Choix Feb 11 '18 at 15:07
  • Well, an `RDD[(String, Float)]` is obviously not the same as an `RDD[(String, Iterable[Float])]`, so how does this fit together with your question, and in particular with the title of the question? Where did the `CompactBuffer`s go? – Andrey Tyukin Feb 11 '18 at 15:08
  • I think it would be better and more convenient to post the code in a chatroom, can you do so, I do not have enough points to do it – Choix Feb 11 '18 at 15:11
  • Here it is: https://chat.stackoverflow.com/rooms/164922/reduce-compact-buffers-choix-tyukin – Andrey Tyukin Feb 11 '18 at 15:13
  • alright, I think I should start from beginning and apply your idea when needed, I will post update here, thank you. – Choix Feb 11 '18 at 15:13
  • It turns out I do not have enough points to even chat. Anyway, thank you, and like I said, I will post my update here. – Choix Feb 11 '18 at 15:15
  • Yeah, all those apparent limitations have a reason. It's a Q&A site, the idea is that you try to make your question sufficiently high-quality so that it is useful for everyone else encountering a similar problem. Try to formulate your problem more precisely, then update the question. And please markup code-blocks in the question. – Andrey Tyukin Feb 11 '18 at 15:19
  • start again: scala> val final_resultMap1 = final_result.map(x=>(x.split(",")(1), x.split(",")(2).toFloat)) scala> grouped_final_resultMap1.first res34: (String, Iterable[Float]) = (2014-02-01,CompactBuffer(239.96, 129.99, ... val sumed = final_resultMap1.map{case (str, nums) => (str, nums.sum)} <console>:41: error: value sum is not a member of Float val sumed = final_resultMap1.map{case (str, nums) => (str, nums.sum)} – Choix Feb 11 '18 at 15:20
  • Again: `grouped_final_resultMap1` seems to have nothing in common with `final_resultMap1`, it has an entirely different type, and it obviously cannot work this way. Please don't dump walls of code in the comment section, and update your question instead (there should be a small gray 'edit' button right under the question). – Andrey Tyukin Feb 11 '18 at 15:25
  • Thank you, the initial question was updated. as you can see, the data is (String, Iterable[Float]) and CompactBuffer with presumably Float – Choix Feb 11 '18 at 15:33
  • In your update, you don't show where "grouped_final_resultMap1" comes from, you simply extract the first value from it, and print it out. It doesn't seem to have anything in common with the `final_resultMap1`, on which you are calling the `map` function provided in my answer. I updated my answer with a `reduceByKey` operation which is applicable to `final_resultMap1`. – Andrey Tyukin Feb 11 '18 at 15:37
  • Thank you very much, that's the answer! – Choix Feb 11 '18 at 15:38
  • @Choix I've updated the formatting of your question a little bit, you might want to click on the "edited – Andrey Tyukin Feb 11 '18 at 15:44