
I have a DataFrame with a column of IDs. I'd like to filter it down to a specific set of IDs, so I used `.filter()`.

I'm running into this error:

java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.HashSet$HashTrieSet

My code is pretty simple.

val setofID = Set("112", "113", "114", "121", "118", "120")

val my_dfFiltered = my_df.filter($"id".isin(setofID)).persist
Cauder
    See this answer: https://stackoverflow.com/a/32560177/2639647. `.isin()` takes a variable list of params, not a single iterable. `.isin(setofID:_*)` might work. – Travis Hegner Aug 22 '19 at 18:04

1 Answer


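`Set` does not work with `isin`. The method is declared as `isin(list: Any*)`, so passing the `Set` as a single argument makes Spark try to turn the whole `Set` into one literal, which fails with the error above. Convert it to a `Seq` and expand it as varargs with `: _*`: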

val setofID = Set("112", "113", "114", "121", "118", "120").toSeq

val my_dfFiltered = my_df.filter($"id".isin(setofID:_*)).persist
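To see why `: _*` matters without starting Spark, here is a minimal pure-Scala sketch. `countArgs` is a hypothetical stand-in for any varargs method such as `isin(list: Any*)`; it just reports how many separate arguments it received.

```scala
object VarargsDemo {
  // Hypothetical stand-in for a varargs method like Column.isin(list: Any*).
  // Returns how many separate arguments actually arrived.
  def countArgs(values: Any*): Int = values.length

  def main(args: Array[String]): Unit = {
    val setofID = Set("112", "113", "114", "121", "118", "120")

    // Passing the Set directly: the whole Set arrives as ONE argument,
    // which is what isin then tries (and fails) to convert into a literal.
    println(countArgs(setofID)) // 1

    // Expanding a Seq with `: _*`: each element becomes its own argument.
    // (In Scala 2, `: _*` needs a Seq, hence the .toSeq.)
    println(countArgs(setofID.toSeq: _*)) // 6
  }
}
```

The same reasoning is why the answer converts the `Set` with `.toSeq` before expanding it.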

Alternatively, use `isInCollection` (available since Spark 2.4), which accepts an `Iterable`, so it works directly with a `Set`:

val my_dfFiltered = my_df.filter($"id".isInCollection(setofID)).persist
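A self-contained sketch comparing both approaches, assuming Spark 2.4+ on the classpath and a local `SparkSession`. The four-row `my_df` is hypothetical sample data standing in for the question's DataFrame, and `viaIsin` / `viaIsInCollection` are illustrative helper names, not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FilterByIdSet {
  // Approach 1: expand the Set into varargs for isin(list: Any*).
  def viaIsin(df: DataFrame, ids: Set[String]): DataFrame =
    df.filter(df("id").isin(ids.toSeq: _*))

  // Approach 2: isInCollection takes an Iterable, so the Set is passed as-is (Spark 2.4+).
  def viaIsInCollection(df: DataFrame, ids: Set[String]): DataFrame =
    df.filter(df("id").isInCollection(ids))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("isin-demo").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with a single string column "id".
    val my_df = Seq("111", "112", "113", "200").toDF("id")
    val setofID = Set("112", "113", "114", "121", "118", "120")

    // Collect the surviving IDs from each approach, sorted for a stable comparison.
    val a = viaIsin(my_df, setofID).as[String].collect().sorted
    val b = viaIsInCollection(my_df, setofID).as[String].collect().sorted

    println(a.mkString(","))   // 112,113
    println(a.sameElements(b)) // true

    spark.stop()
  }
}
```

Both filters produce the same rows; `isInCollection` mainly saves the `.toSeq: _*` step when the IDs already live in a `Set`.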
Raphael Roth