
I'm having trouble manipulating a WrappedArray column. I want to remove (filter out) elements from a WrappedArray column in a Spark Dataset. The WrappedArray contains objects (structs). For example, my dataset contains the following column:

ColA
-----
WrappedArray([id:111, type:A],[id:222,type:B])
WrappedArray([id:333, type:A],[id:444,type:C])
WrappedArray([id:555, type:B],[id:666,type:C])

I want to remove every element of the WrappedArray whose type == A. The desired output is:

ColA
-----
WrappedArray([id:222,type:B])
WrappedArray([id:444,type:C])
WrappedArray([id:555, type:B],[id:666,type:C])

I was thinking about using a UDF with withColumn, and I can see that the WrappedArray API has a filter function, but I can't get the syntax right.
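A minimal sketch of the per-row logic such a UDF would apply, written in plain Java so it runs without Spark on the classpath. Assumptions: each struct element is modeled by a hypothetical `Item` record with `id` and `type` fields, and `removeType` is a hypothetical helper name. In an actual Spark UDF the elements would arrive as `Row` objects (read with something like `row.getAs("type")`), and the UDF would need an array-of-struct return type matching the input.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FilterByType {
    // Hypothetical stand-in for one struct element of ColA: [id:..., type:...]
    record Item(int id, String type) {}

    // Per-row logic a UDF would run: keep every element whose type is not the excluded one.
    static List<Item> removeType(List<Item> items, String excluded) {
        return items.stream()
                .filter(item -> !excluded.equals(item.type()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> row1 = List.of(new Item(111, "A"), new Item(222, "B"));
        System.out.println(removeType(row1, "A")); // prints [Item[id=222, type=B]]
    }
}
```

Because this works on one array at a time, element order is kept, and a row in which every element matches ends up with an empty array rather than disappearing.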

I'm working in Java, but any language is okay. Any help or suggestions would be appreciated!

zero323
Alex
  • Did you check this? https://stackoverflow.com/questions/48195507/how-to-get-data-out-of-wrapped-array-in-apache-spark-scala – Vinod Chandak Mar 29 '18 at 05:57
  • Just read it. If I understand correctly, that solution won't work if the element's position is not fixed? – Alex Mar 29 '18 at 06:09
  • Yes, that question is a bit different. For this I would probably be able to give you a Scala solution using an `UDF`, but not so sure on how helpful it would be. Could you add the `UDF` you have tried to the question? – Shaido Mar 29 '18 at 06:14

1 Answer


Solved by using explode. The basic idea is to explode the array down to the element level and then filter out the rows where ColA.type = A.
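The answer does not say how the exploded rows are put back into arrays. A common pattern (an assumption here) is `explode` → `filter` → `groupBy` on a row key → `collect_list`. The plain-Java sketch below simulates those steps on the example data so the logic can be checked without Spark. `Item`, `Exploded`, and `explodeFilterRegroup` are hypothetical names, and the list index stands in for the unique row key a real Dataset would need (for example, one added with `monotonically_increasing_id`).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExplodeFilterRegroup {
    // Hypothetical stand-in for one struct element of ColA: [id:..., type:...]
    record Item(int id, String type) {}

    // One "exploded" row: the source row's key paired with a single array element.
    record Exploded(int rowId, Item item) {}

    static Map<Integer, List<Item>> explodeFilterRegroup(List<List<Item>> colA, String excluded) {
        // Step 1 (explode): emit one row per array element, tagged with its source row index.
        List<Exploded> exploded = new ArrayList<>();
        for (int rowId = 0; rowId < colA.size(); rowId++) {
            for (Item item : colA.get(rowId)) {
                exploded.add(new Exploded(rowId, item));
            }
        }

        // Step 2 (filter): drop element rows whose type matches the excluded value.
        // Step 3 (groupBy + collect_list): rebuild one list per source row, in input order.
        Map<Integer, List<Item>> regrouped = new LinkedHashMap<>();
        for (Exploded e : exploded) {
            if (!excluded.equals(e.item().type())) {
                regrouped.computeIfAbsent(e.rowId(), k -> new ArrayList<>()).add(e.item());
            }
        }
        return regrouped;
    }

    public static void main(String[] args) {
        List<List<Item>> colA = List.of(
                List.of(new Item(111, "A"), new Item(222, "B")),
                List.of(new Item(333, "A"), new Item(444, "C")),
                List.of(new Item(555, "B"), new Item(666, "C")));
        explodeFilterRegroup(colA, "A").forEach((row, items) -> System.out.println(row + " -> " + items));
    }
}
```

One consequence worth checking: a row whose elements are all type A produces no element rows after the filter, so it vanishes from the regrouped result instead of becoming an empty array (and `explode` itself drops rows whose array is null or empty). Also, unlike this simulation, Spark's `collect_list` does not guarantee element order after a shuffle.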

Alex