I've been searching Stack Overflow for several days and haven't found an answer to the following question. I'm new to Scala, so this may be a very basic question. Any help would be much appreciated.
The error I'm getting comes from the last bit of code, the dfData.filter call that builds dfBlankRecords.
I'm trying to filter a DataFrame down to the records that are missing data in one or more of the specified fields (FIELD_ADD_1 through FIELD_ADD_4).
I'm using Scala IDE Build 4.7.0 in Eclipse.
My pom.xml uses spark-core_2.11, version 2.0.0.
Thank you.
Jesse
val source_path = args(0)
val source_file = args(1)
val vFile = sc.textFile(source_path + "/" + source_file)
val vSchema = StructType(
  StructField("FIELD_1", LongType, false) ::
  StructField("FIELD_2", LongType, false) ::
  StructField("FIELD_3", StringType, true) ::
  StructField("FIELD_4", StringType, false) ::
  StructField("FIELD_ADD_1", StringType, false) ::
  StructField("FIELD_ADD_2", StringType, false) ::
  StructField("FIELD_ADD_3", StringType, false) ::
  StructField("FIELD_ADD_4", StringType, false) ::
  StructField("FIELD_5", StringType, false) ::
  StructField("FIELD_6", StringType, false) ::
  StructField("FIELD_7", StringType, false) ::
  StructField("FIELD_8", StringType, false) ::
  Nil)
// Split each line on the ASCII record separator (character code 30);
// the -1 limit keeps trailing empty fields.
val vRow = vFile.map(x => x.split("\u001E", -1)).map(x => Row(
  x(1).toLong,
  x(2).toLong,
  x(3).toString.trim(),
  x(4).toString.trim(),
  x(5).toString.trim(),
  x(6).toString.trim(),
  x(7).toString.trim(),
  x(8).toString.trim(),
  x(9).toString.trim(),
  x(10).toString.trim(),
  x(11).toString.trim(),
  x(12).toString.trim()
))
val dfData = sqlContext.createDataFrame(vRow.distinct(),vSchema)
val dfBlankRecords = dfData.filter(x => (
  x.trim(col("FIELD_ADD_1")) == "" ||
  x.trim(col("FIELD_ADD_2")) == "" ||
  x.trim(col("FIELD_ADD_3")) == "" ||
  x.trim(col("FIELD_ADD_4")) == ""
))
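
For comparison, here is a minimal sketch of one way the filter might be written; it is not necessarily the only or best fix. Inside the lambda, x is a Row, and Row has no trim method, whereas trim and col from org.apache.spark.sql.functions build Column expressions that are compared with === rather than ==. The sketch assumes spark-sql_2.11 2.0.0 is on the classpath alongside spark-core. The names BlankAddressFilter, addressFields, and blankRecords, and the sample rows, are hypothetical and only there for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, trim}

// Hypothetical wrapper object, used only to keep the sketch self-contained.
object BlankAddressFilter {
  // Column names taken from the schema in the question.
  val addressFields = Seq("FIELD_ADD_1", "FIELD_ADD_2", "FIELD_ADD_3", "FIELD_ADD_4")

  // Keep rows where at least one address column is empty after trimming.
  def blankRecords(df: DataFrame): DataFrame = {
    val anyBlank = addressFields
      .map(name => trim(col(name)) === "") // Column comparison, not Scala ==
      .reduce(_ || _)                      // OR the four conditions together
    df.filter(anyBlank)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough to try the filter out.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("blank-filter-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows: the first and third have a blank address field.
    val sample = Seq(
      ("a", " ", "c", "d"),
      ("a", "b", "c", "d"),
      ("", "b", "c", "d")
    ).toDF(addressFields: _*)

    blankRecords(sample).show()
    spark.stop()
  }
}
```

With the DataFrame from the question, the call would be blankRecords(dfData). The schema above declares these columns non-nullable; if nulls could appear, an isNull check on each column would probably be needed as well.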