I am facing a problem in PySpark Dataframe loaded from a CSV file , where my numeric column do have empty values Like below
+-------------+------------+-----------+-----------+
| Player_Name|Test_Matches|ODI_Matches|T20_Matches|
+-------------+------------+-----------+-----------+
| Aaron, V R| 9| 9| |
| Abid Ali, S| 29| 5| |
|Adhikari, H R| 21| | |
| Agarkar, A B| 26| 191| 4|
+-------------+------------+-----------+-----------+
Casted those columns to integer and all those empty become null
df_data_csv_casted = df_data_csv.select(df_data_csv['Country'],df_data_csv['Player_Name'], df_data_csv['Test_Matches'].cast(IntegerType()).alias("Test_Matches"), df_data_csv['ODI_Matches'].cast(IntegerType()).alias("ODI_Matches"), df_data_csv['T20_Matches'].cast(IntegerType()).alias("T20_Matches"))
+-------------+------------+-----------+-----------+
| Player_Name|Test_Matches|ODI_Matches|T20_Matches|
+-------------+------------+-----------+-----------+
| Aaron, V R| 9| 9| null|
| Abid Ali, S| 29| 5| null|
|Adhikari, H R| 21| null| null|
| Agarkar, A B| 26| 191| 4|
+-------------+------------+-----------+-----------+
Then I am taking a total , but if one of them is null , result is also coming as null. How to solve it ?
df_data_csv_withTotalCol=df_data_csv_casted.withColumn('Total_Matches',(df_data_csv_casted['Test_Matches']+df_data_csv_casted['ODI_Matches']+df_data_csv_casted['T20_Matches']))
+-------------+------------+-----------+-----------+-------------+
|Player_Name |Test_Matches|ODI_Matches|T20_Matches|Total_Matches|
+-------------+------------+-----------+-----------+-------------+
| Aaron, V R | 9| 9| null| null|
|Abid Ali, S | 29| 5| null| null|
|Adhikari, H R| 21| null| null| null|
|Agarkar, A B | 26| 191| 4| 221|
+-------------+------------+-----------+-----------+-------------+