2

I have working with graphframes and now I am using aggregate Message. The vertex schema is:

 |-- id: long (nullable = false)
 |-- company: string (nullable = true)
 |-- money: integer (nullable = false)
 |-- memoryLearned: map (nullable = true)
 |    |-- key: string
 |    |-- value: integer (valueContainsNull = false)

If I try it:

  ...
 def createMessage(memory: org.apache.spark.sql.Column): org.apache.spark.sql.Column = {
    memory + 10
  }

...

val msgToSrc: org.apache.spark.sql.Column = this.createMessage(AM.dst("id"))

val aggregates = gx
        .aggregateMessages
        .sendToSrc(msgToSrc)
        .agg(sum(AM.msg).as("aggMess"))
aggregates.show()

It works!, but I need to get keys and values from memoryLearned, so I think it works:

...
     def createMessage(memory: org.apache.spark.sql.Column): org.apache.spark.sql.Column = {
        for((k,v) <- memory)
           ...
      }


...

val msgToSrc: org.apache.spark.sql.Column = this.createMessage(AM.dst("memoryLearned"))

val aggregates = gx
        .aggregateMessages
        .sendToSrc(msgToSrc)
        .agg(myUDFA(AM.msg).as("aggMess"))
aggregates.show()

I got this error: "value filter is not a member of org.apache.spark.sql.Column"

I tried to search how to cast or get MapType, but i only find functions like explode using the dataframe, but I have not a df, I only have one column...

If I put this: memory.getItem("aKeyFromMap") instead of for(..., I get the correct value from Map...

Also I have tried to create "aux" DataFrame into createMessage (one row and one column) for using df functions, but when I use .withColumn("newColumn",memory), it fails..

I am blocked.. any ideas?

A lot of thanks!! Regards

SCouto
  • 7,808
  • 5
  • 32
  • 49

1 Answers1

1

If you want to iterate over MapType Column, and you don't know the key up front, you have to use UDF or other operation on the external type (like map):

import org.apache.spark.sql.functions.udf

def createMessage = udf( (memory: Map[String, Integer]) => {
  for( (k,v) <- memory )
  ...
} )

You get:

I got this error: "value filter is not a member of org.apache.spark.sql.Column"

because for comprehensions are syntactic sugar for map / flatMap / filter.

philantrovert
  • 9,904
  • 3
  • 37
  • 61
Alper t. Turker
  • 34,230
  • 9
  • 83
  • 115
  • 1
    Thanks!!!, it works: def createMessage = udf((memory: Map[String, Integer]) => { var suma = 0 if (memory != null) { for ((k, v) <- memory) suma = suma + v } suma }) – David González Jan 31 '18 at 12:27
  • Great :) Would you consider [voting up](https://stackoverflow.com/help/why-vote)? Thanks in advance :) – Alper t. Turker Jan 31 '18 at 13:39