In a Glue script (running in a Zeppelin notebook that forwards to a Glue dev endpoint), I've created a DynamicFrame from a Glue table. I would like to filter it so that the field "name" is not in a static list of values, e.g. ("a", "b", "c").

Filtering on non-equality works just fine like this:

def unknownNameFilter(rec: DynamicRecord): Boolean = { 
   rec.getField("name").exists(_ != "a")
}

I have tried several things, such as:

!rec.getField("name").exists(_ isin ("a","b","c"))

but they give errors (value isin is not a member of Any). On the web I can only find PySpark examples, or examples that first convert the DynamicFrame to a DataFrame, which I want to avoid if possible.

Help much appreciated, thanks.

Anske

1 Answer

Okay, I found the answer and am posting it for anyone else looking for this. It is done with:

!(knownevents.contains(eventname))

Used in a filter function, it looks like this:

def unknownEventFilter(rec: DynamicRecord): Boolean = {
  // Static list of event names that should be filtered out
  val knownevents = List("evt_a", "evt_b")

  rec.getField("name") match {
    // Keep the record only if its name is not in the list
    case Some(eventname: String) => !(knownevents.contains(eventname))
    // Missing or non-string name: fail loudly
    case _ => throw new IllegalArgumentException("Unable to extract field name")
  }
}

val dfUnknownEvents = df.filter(unknownEventFilter)
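
For anyone who wants to try the matching logic without Glue on the classpath, here is a minimal sketch in plain Scala. It assumes that getField hands back an Option[Any], which is what the "value isin is not a member of Any" error suggests. The object NotInListSketch and the helper isUnknownName are hypothetical names used only for illustration, and unlike the function above this variant drops records with a missing or non-string name instead of throwing.

```scala
object NotInListSketch {
  // Static values to exclude; a Set gives constant-time membership checks.
  val knownNames: Set[String] = Set("a", "b", "c")

  // Hypothetical helper: takes the Option[Any] that rec.getField("name") would yield.
  def isUnknownName(field: Option[Any]): Boolean = field match {
    case Some(name: String) => !knownNames.contains(name) // keep names outside the list
    case _                  => false // assumption: drop missing or non-string names
  }

  // A plain list stands in for the "name" fields of the records.
  val fields: List[Option[Any]] =
    List(Some("a"), Some("x"), None, Some(42), Some("c"), Some("y"))

  // Only the entries whose name is a string outside knownNames survive.
  val unknown: List[Option[Any]] = fields.filter(isUnknownName)
}
```

In the Glue script this would presumably be wired in as df.filter(rec => NotInListSketch.isUnknownName(rec.getField("name"))). Whether to throw or to drop on a missing name depends on how strict the job should be.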
Anske