
I'm new to Spark with Scala. I would really appreciate it if someone could help me here. I have a dataframe called df.

df.printSchema()
root
 |-- tab: string (nullable = true)
 |-- cust: string (nullable = true)
 |-- date: string (nullable = true)
 |-- uniqIds: string (nullable = true)

df.show()
+-------+----+----------+--------------------+
|    tab|cust|      date|             uniqIds|
+-------+----+----------+--------------------+
|t_users| abc|2018050918|[123, 1234, 22123]  |
|t_users| def|2018050918|[1sdf23, 12f34]     |
+-------+----+----------+--------------------+

Now I want to loop through each record and do some processing: kick off another function/process based on the first 3 columns. If that process succeeds, I want to store all the values from the uniqIds column in a dataframe. Once I have all the uniqIds for the successful processes, I will write them to a file.

var uq = Seq.empty[String].toDF("unique_id")
df.foreach { row =>
  val uniqIds: Array[String] = row(3).toString.replace("[", "").replace("]", "").replace(" ", "").split(",")
  uniqIds.foreach { e =>
    val df2 = Seq(e).toDF("unique_id")
    uq.union(df2)
  }
}

But when I try doing that, I get an error message:

ERROR Executor:91 - Exception in task 1.0 in stage 11.0 (TID 23) java.lang.NullPointerException

Does anyone have the same problem? How can I overcome it? Thanks in advance.

    see https://stackoverflow.com/a/47358940/5344058 – Tzach Zohar Jul 23 '18 at 19:59
  • Thanks @TzachZohar. I had to use a `collect` before the foreach loop to make sure my data is evaluated on the driver before I go inside the loop. Since my dataframe is small and I only have to process one record at a time rather than the records as a batch, I resorted to using a collect. `var uq = Seq.empty[String].toDF("unique_id")` `df.collect().foreach { row =>` `val uniqIds: Array[String] = row(3).toString.replace("[", "").replace("]", "").replace(" ","").split(",")` `uniqIds.foreach { e =>` `val df2 = Seq(e).toDF("unique_id")` `uq = uq.union(df2)` `}` `}` – Ram Jul 24 '18 at 15:28
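
Written out as a block, the collect-based approach described in the comment above looks roughly like this. This is a minimal sketch: processRecord and the output path are placeholders, not part of the original question, and a SparkSession named spark is assumed to be in scope.

import spark.implicits._   // assumes a SparkSession named `spark` is in scope

// Placeholder for the external function/process driven by the first 3 columns.
def processRecord(tab: String, cust: String, date: String): Boolean = true  // stub: always "succeeds" here

// Collect the (small) dataframe to the driver, run the process per record,
// and keep the uniqIds only for the records whose process succeeded.
val successIds: Seq[String] = df.collect().toSeq.flatMap { row =>
  if (processRecord(row.getString(0), row.getString(1), row.getString(2)))
    row.getString(3).replace("[", "").replace("]", "").replace(" ", "").split(",").toSeq
  else
    Seq.empty[String]
}

// Write the collected ids out as a single string column (path is a placeholder).
successIds.toDF("unique_id").write.text("/path/to/unique_ids")

Because everything after collect() runs on the driver, no DataFrame operations are attempted inside executor-side closures, which is what triggered the NullPointerException in the original foreach.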

0 Answers