
I'm new to Spark with Scala. I would really appreciate it if someone could help me here. I have a dataframe called df.

df.printSchema()
root
 |-- tab: string (nullable = true)
 |-- cust: string (nullable = true)
 |-- date: string (nullable = true)
 |-- uniqIds: string (nullable = true)

df.show()
+-------+----+----------+--------------------+
|    tab|cust|      date|             uniqIds|
+-------+----+----------+--------------------+
|t_users| abc|2018050918|[123, 1234, 22123]  |
|t_users| def|2018050918|[1sdf23, 12f34]     |
+-------+----+----------+--------------------+

Now I want to loop through each record and do some processing: kick off another function/process based on the first 3 columns. If that process succeeds, I want to store all the values from the uniqIds column in a dataframe. Once I have all the uniqIds for the successful processes, I will write them to a file.

var uq = Seq.empty[String].toDF("unique_id")
df.foreach { row =>
  val uniqIds: Array[String] = row(3).toString.replace("[", "").replace("]", "").replace(" ", "").split(",")
  uniqIds.foreach { e =>
    val df2 = Seq(e).toDF("unique_id")
    uq.union(df2)
  }
}

But when I try doing that, I get an error message:

ERROR Executor:91 - Exception in task 1.0 in stage 11.0 (TID 23) java.lang.NullPointerException

Does anyone have the same problem? How can I overcome it? Thanks in advance.

    see https://stackoverflow.com/a/47358940/5344058 – Tzach Zohar Jul 23 '18 at 19:59
  • Thanks @TzachZohar. I had to use a `collect` before the foreach loop to make sure my data is evaluated on the driver before I go inside the loop. Since my dataframe is small and I only have to process one record at a time rather than the records as a batch, I resorted to using a collect. `var uq = Seq.empty[String].toDF("unique_id")` `df.collect().foreach { row =>` `val uniqIds: Array[String] = row(3).toString.replace("[", "").replace("]", "").replace(" ","").split(",")` `uniqIds.foreach { e =>` `val df2 = Seq(e).toDF("unique_id")` `uq = uq.union(df2)` `}` `}` – Ram Jul 24 '18 at 15:28
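
Written out as a block, the collect-based approach described in the comment above looks roughly like this. This is a minimal sketch: processRecord and the output path are placeholders, not part of the original question, and a SparkSession named spark is assumed to be in scope.

import spark.implicits._   // assumes a SparkSession named `spark` is in scope

// Placeholder for the external function/process driven by the first 3 columns.
def processRecord(tab: String, cust: String, date: String): Boolean = true  // stub: always "succeeds" here

// Collect the (small) dataframe to the driver, run the process per record,
// and keep the uniqIds only for the records whose process succeeded.
val successIds: Seq[String] = df.collect().toSeq.flatMap { row =>
  if (processRecord(row.getString(0), row.getString(1), row.getString(2)))
    row.getString(3).replace("[", "").replace("]", "").replace(" ", "").split(",").toSeq
  else
    Seq.empty[String]
}

// Write the collected ids out as a single string column (path is a placeholder).
successIds.toDF("unique_id").write.text("/path/to/unique_ids")

Because everything after collect() runs on the driver, no DataFrame operations are attempted inside executor-side closures, which is what triggered the NullPointerException in the original foreach.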

0 Answers