I have a hadoopFiles RDD generated from sc.newAPIHadoopFile:
scala> hadoopFiles
res1: org.apache.spark.rdd.RDD[(org.apache.hadoop.io.LongWritable, org.apache.hadoop.io.Text)] = UnionRDD[64] at union at <console>:24
I intend to iterate through all the lines in hadoopFiles, mapping and filtering each one. Inside the map, an if check is applied, and it triggers a compile error:
scala> val rowRDD = hadoopFiles.map(line =>
     |   line._2.toString.split("\\^") map {
     |     field => {
     |       var pair = field.split("=", 2)
     |       if (pair.length == 2)
     |         (pair(0) -> pair(1))
     |     }
     |   } toMap
     | ).map(kvs => Row(kvs("uuid"), kvs("ip"), kvs("plt").trim))
<console>:33: error: Cannot prove that Any <:< (T, U).
} toMap
^
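For context, here is a minimal check of what I suspect is going on (the inferredClass helper is my own, for illustration, not part of the code above): an if with no else gets Unit as its implicit missing branch, so the whole expression widens to Any, and the surrounding map yields Array[Any] rather than Array[(String, String)]:

```scala
import scala.reflect.ClassTag

// Helper (my own, for illustration): report the erasure of the
// statically inferred type of an expression.
def inferredClass[T: ClassTag](x: T): Class[_] = implicitly[ClassTag[T]].runtimeClass

// With both branches present, the inferred type is (String, String).
val both = inferredClass(if (true) ("k" -> "v") else ("k" -> "v"))
// Without an else, the missing branch is Unit; the least upper bound of
// (String, String) and Unit is Any, whose erasure is java.lang.Object.
val only = inferredClass(if (true) ("k" -> "v"))

println(both) // class scala.Tuple2
println(only) // class java.lang.Object
```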
However, if I remove the if (pair.length == 2) check, it works fine:
scala> val rowRDD = hadoopFiles.map(line =>
     |   line._2.toString.split("\\^") map {
     |     field => {
     |       var pair = field.split("=", 2)
     |       (pair(0) -> pair(1))
     |     }
     |   } toMap
     | ).map(kvs => Row(kvs("uuid"), kvs("ip"), kvs("plt").trim))
warning: there was one feature warning; re-run with -feature for details
rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.catalyst.expressions.Row] = MappedRDD[66] at map at <console>:33
Could anyone tell me the reason for this behavior, and show me the correct way to apply the if check? Thanks a lot!
P.S. We could use this simplified example to test:
"1=a^2=b^3".split("\\^") map {
  field => {
    var pair = field.split("=", 2)
    if (pair.length == 2)
      pair(0) -> pair(1)
  }
} toMap
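For anyone trying variants: one way I found to keep the element type at (String, String) is to wrap each well-formed pair in an Option and use flatMap, so fields without a "=" are simply dropped instead of widening the type (a sketch of my own against the simplified example, not a confirmed answer):

```scala
// Sketch: return Some(pair) for well-formed fields and None otherwise;
// flatMap discards the Nones, so the element type stays (String, String)
// and toMap compiles.
val parsed: Map[String, String] =
  "1=a^2=b^3".split("\\^").flatMap { field =>
    val pair = field.split("=", 2)
    if (pair.length == 2) Some(pair(0) -> pair(1)) else None
  }.toMap

println(parsed) // Map(1 -> a, 2 -> b)
```

On the real RDD the same shape would presumably apply inside the first map, with .flatMap { ... }.toMap replacing the map { ... } toMap block.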