
Does anyone know how to retrieve the first value (index 0) from the probability column, which indicates the probability of that prediction being correct?

After running dataframe.schema (or dataframe.printSchema()) I got the following result for the probability column:

StructField('probability', VectorUDT(), True)

Below I am attaching a screenshot of part of the dataframe.

I tried to expand the column probability with col("probability.*") but it gave me an error:

Can only star expand struct data types. Attribute: `ArrayBuffer(probability)`.

I also tried accessing a field directly, for example "probability.vectorType", but I got the following error:

[INVALID_EXTRACT_BASE_FIELD_TYPE] Cannot extract a value from "probability". Need a complex type [STRUCT, ARRAY, MAP] but got "STRUCT<type: TINYINT, size: INT, indices: ARRAY<INT>, values: ARRAY<DOUBLE>>".
Susy84
  • Does this answer your question? [How to access element of a VectorUDT column in a Spark DataFrame?](https://stackoverflow.com/questions/39555864/how-to-access-element-of-a-vectorudt-column-in-a-spark-dataframe) – Ronak Jain Mar 20 '23 at 06:23
  • @Ronak Jain, thanks for your guidance. The answer marked as the "best one" did not help me much, but the answer from @Nidhi / n1tk solved the problem very cleanly: `prob_df1=lr_pred.withColumn("probability",lr_pred["probability"].cast("String"))` followed by `prob_df =prob_df1.withColumn('probabilityre',split(regexp_replace("probability", "^\[|\]", ""), ",")[1].cast(DoubleType()))` – Susy84 Mar 21 '23 at 10:21
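
For readers following the workaround in the comment above, here is a minimal plain-Python sketch of what the `regexp_replace` and `split` steps do to a single value. It assumes the vector, once cast to a string, looks like `[0.9,0.1]`. Both the sample string and the helper name `parse_vector_string` are hypothetical and only illustrate the idea. It is not Spark code.

```python
import re


def parse_vector_string(vector_str):
    """Strip the surrounding brackets and split on commas.

    Mirrors regexp_replace(col, "^\\[|\\]", "") followed by split(col, ",")
    from the comment, applied to one string instead of a column.
    """
    stripped = re.sub(r"^\[|\]", "", vector_str)
    return [float(part) for part in stripped.split(",")]


# Assumed string form of a two-class probability vector (illustrative value).
probs = parse_vector_string("[0.9,0.1]")

first = probs[0]   # the value at index 0, which the question asks about
second = probs[1]  # the value at index 1, which the comment's [1] selects
```

Note that the snippet in the comment indexes with `[1]`, which selects the second element. Taking the first value, as the question asks, would presumably mean indexing with `[0]`. On Spark 3.0 or later, `pyspark.ml.functions.vector_to_array` may be a more direct route than parsing strings, for example `vector_to_array("probability")[0]`.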
