A sample data:
+--------------------+-----+--------+----+----------+--------+-----------+
| id|click| hour| C1|banner_pos| site_id|site_domain|
+--------------------+-----+--------+----+----------+--------+-----------+
| NULL | 0|14102100|1005| 0|1fbe01fe| f3845767|
|10000169349117863715| 0|14102100|1005| 0|1fbe01fe| f3845767|
|10000371904215119486| 0|NULL |1005| 0|1fbe01fe| f3845767|
|10000640724480838376| 0|14102100|1005| 0|1fbe01fe| f3845767|
|10000679056417042096| 0|14102100|1005| 1|fe8cc448| 9166c161|
+--------------------+-----+--------+----+----------+--------+-----------+
Expected result:
+--------------------+-----+--------+----+----------+--------+-----------+
| id|click| hour| C1|banner_pos| site_id|site_domain|
+--------------------+-----+--------+----+----------+--------+-----------+
| NULL | 0|14102100|1005| 0|1fbe01fe| f3845767|
|10000371904215119486| 0|NULL |1005| 0|1fbe01fe| f3845767|
+--------------------+-----+--------+----+----------+--------+-----------+
I want to filter all columns in a Dataframe with the condition that at least one column contains "NULL" string. I have many columns, around 30. I have seen that with a few columns conditions can be added like this:
df2 = df1.filter($"Status" === 2 || $"Status" === 3)
But since I have many columns with the same conditions, is there a way to treat the columns
as a collection and filter them all together?
I have tried the following but it doesn't seem to work:
df2.filter(
lit(
df2.columns.map(col(_).contains("NULL")).contains(lit(true))
)
).show()
It seems df2.columns.map(col(_).contains("NULL")).contains(lit(true))
always returns false
.
Why is it like this? Could you explain why this doesn't work?
The following codes work for the above purpose.
df.filter(
lit(true).isin(df.columns.map(col(_).contains("NULL")): _*)
).show()