Spark version: 1.5.2 with YARN 2.7.1.2.3.0.0-2557
I'm running into a problem while exploring data in spark-shell: I'm trying to create a very wide DataFrame with 3000 columns. The code is below:
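import org.apache.spark.sql.functions.udf

// Look up dataItemId in the map and parse the value as a Double;
// return NaN when the key is missing.
val valueFunctionUDF = udf((valMap: Map[String, String], dataItemId: String) =>
  valMap.get(dataItemId) match {
    case Some(v) => v.toDouble
    case None    => Double.NaN
  })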
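To make the intended behaviour of the UDF body explicit, here is a minimal plain-Scala sketch of the same lookup, with no Spark involved. The name lookupAsDouble and the sample map are hypothetical and exist only for illustration.

```scala
// Plain-Scala version of the lookup done inside valueFunctionUDF.
// lookupAsDouble is a hypothetical name used only for this sketch.
def lookupAsDouble(valMap: Map[String, String], dataItemId: String): Double =
  valMap.get(dataItemId) match {
    case Some(v) => v.toDouble  // throws NumberFormatException if v is not numeric
    case None    => Double.NaN  // key is absent from the map
  }

// Made-up sample data: one key present, one key missing.
val sample = Map("item1" -> "3.5")
println(lookupAsDouble(sample, "item1"))       // prints 3.5
println(lookupAsDouble(sample, "item2").isNaN) // prints true
```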
s1 is the main DataFrame, with the following schema:
|-- combKey: string (nullable = true)
|-- valMaps: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
After I run this code:
dataItemIdVals.foreach { w =>
  s1 = s1.withColumn(w, valueFunctionUDF($"valMaps", $"combKey"))
}
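For comparison, the same columns could be built with a single select instead of one withColumn call per column. This is only a sketch that I have not verified against the hang. It assumes dataItemIdVals is a Seq[String] (or Array[String]) of column names, it passes the same arguments to the UDF as the loop above, and it is meant to be pasted into the same spark-shell session where s1 and valueFunctionUDF are already defined. The names newCols and wide are made up for illustration.

```scala
import org.apache.spark.sql.functions.col

// One aliased UDF column per name, using the same arguments as the loop.
val newCols = dataItemIdVals.map { w =>
  valueFunctionUDF(col("valMaps"), col("combKey")).as(w)
}

// Keep all existing columns and append the new ones in a single projection.
val wide = s1.select((col("*") +: newCols): _*)
```

The difference from the loop is that this builds one projection over s1 rather than 3000 nested ones.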
my terminal just hangs after the above command, with the following INFO messages printed:
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on 172.22.49.20:41494 in memory (size: 7.6 KB, free: 5.2 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:43026 in memory (size: 7.6 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:44890 in memory (size: 7.6 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:52020 in memory (size: 7.6 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:33272 in memory (size: 7.6 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:48481 in memory (size: 7.6 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:44026 in memory (size: 7.6 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:34539 in memory (size: 7.6 KB, free: 5.0 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:43734 in memory (size: 7.6 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:42769 in memory (size: 7.6 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:60603 in memory (size: 7.6 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:59102 in memory (size: 7.6 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:47578 in memory (size: 7.6 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:43149 in memory (size: 7.6 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:52488 in memory (size: 7.6 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:52298 in memory (size: 7.6 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO ContextCleaner: Cleaned accumulator 9
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 172.22.49.20:41494 in memory (size: 7.3 KB, free: 5.2 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:33272 in memory (size: 7.3 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:59102 in memory (size: 7.3 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:44026 in memory (size: 7.3 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:42769 in memory (size: 7.3 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:43149 in memory (size: 7.3 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:43026 in memory (size: 7.3 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:52298 in memory (size: 7.3 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:42890 in memory (size: 7.3 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:47578 in memory (size: 7.3 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:60603 in memory (size: 7.3 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:43734 in memory (size: 7.3 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:48481 in memory (size: 7.3 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:52020 in memory (size: 7.3 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:52488 in memory (size: 7.3 KB, free: 5.1 GB)
16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:34539 in memory (size: 7.3 KB, free: 5.0 GB)
16/07/11 12:20:54 INFO ContextCleaner: Cleaned accumulator 8
16/07/11 12:20:54 INFO ContextCleaner: Cleaned shuffle 0
16/07/11 12:20:54 INFO ContextCleaner: Cleaned accumulator 7
16/07/11 12:20:54 INFO ContextCleaner: Cleaned accumulator 6
16/07/11 12:20:54 INFO ContextCleaner: Cleaned accumulator 5
16/07/11 12:20:54 INFO ContextCleaner: Cleaned accumulator 4
Nothing is happening in the Spark UI, so I guess Spark is computing some metadata for the new DataFrame (number of columns, etc.)? Has anyone seen this kind of issue before? Is there any way to work around it?