
I have a simple function that takes some XML in a field, parses the values, and returns a list:

<data>
   <datas a="1" b="2" c="3"/>
   <datas a="2" b="3" c="2"/>
</data>

The function turns this into the nested list [[1,2,3],[2,3,2]].
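The parser itself is not shown in the question. Here is a minimal sketch of what myparser might look like, using Python's standard xml.etree.ElementTree. The attribute names a, b, c and the conversion to int are assumptions read off the example above. ElementTree only accepts well-formed XML, so the <datas> elements in the sample are self-closed.

```python
import xml.etree.ElementTree as ET


def myparser(xml_text):
    """Turn <datas a=".." b=".." c=".."/> elements into [[a, b, c], ...].

    Hypothetical reconstruction: the original parser is not shown, and the
    attribute names and int conversion are assumptions.
    """
    if xml_text is None:
        # Spark passes None for null cells; keep the result null as well.
        return None
    root = ET.fromstring(xml_text)
    # iter() walks every descendant <datas> element in document order.
    return [[int(el.get(k)) for k in ("a", "b", "c")] for el in root.iter("datas")]


sample = """<data>
   <datas a="1" b="2" c="3"/>
   <datas a="2" b="3" c="2"/>
</data>"""

print(myparser(sample))  # prints [[1, 2, 3], [2, 3, 2]]
```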

I've made this a UDF, and I'm calling it on my DataFrame like this:

from pyspark.sql.functions import udf

myudf = udf(myparser)
df2 = df1.withColumn("newDataColumn", myudf(df1["xmldatafield"]))

This works, except that newDataColumn has type STRING instead of ARRAY, so I can't use any of the SQL array functions to access or work with individual elements.

I've confirmed in Python that the function returns a list.

Any idea what I'm doing wrong, or how I can make this an array-typed column?

Ian Murphy

1 Answer


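A friend of mine just pointed out the solution: pass the return data type to udf. Duh. Without it, udf uses its default return type, StringType(), so Spark stores a string form of the returned list. Declaring the type, for example udf(myparser, ArrayType(ArrayType(IntegerType()))), makes newDataColumn an array column.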

Ian Murphy
  • Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. – Community May 10 '22 at 13:27