
Let's say I have the following dataframe:

    from pyspark.sql.types import ArrayType, IntegerType

    my_x = [([1,100]), ([2]), ([3,2])]
    my_df = spark.createDataFrame(my_x, ArrayType(IntegerType()))

Now, I want to extract the first element (int) from each array-row. So the final dataframe would have 1,2,3 (one per row). Is there a way of doing this without using a UDF? I tried doing something like

    my_df.withColumn("casted", my_df.value.getItem(IntegerType()))

to no avail.

Thanks!

information_interchange
    Possible duplicate of [How to extract an element from a array in pyspark](https://stackoverflow.com/questions/45254928/how-to-extract-an-element-from-a-array-in-pyspark) – pault Aug 21 '19 at 17:04

3 Answers


Select the element at position 0:

my_df.show()
+--------+
|   value|
+--------+
|[1, 100]|
|     [2]|
|  [3, 2]|
+--------+

my_df.withColumn('casted', my_df['value'][0]).show()
+--------+------+
|   value|casted|
+--------+------+
|[1, 100]|     1|
|     [2]|     2|
|  [3, 2]|     3|
+--------+------+
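For readers without a Spark session handy, here is a plain-Python sketch of the indexing semantics that bracket access uses: 0-based indices, and an out-of-range index yields null (`None`) rather than raising, at least in Spark's default (non-ANSI) mode. The `get_item` helper below is hypothetical, written only to mirror that behavior:

```python
def get_item(arr, index):
    """Hypothetical helper mirroring Column.getItem's 0-based lookup:
    missing or out-of-range entries come back as None, not an error."""
    if arr is None or index < 0 or index >= len(arr):
        return None
    return arr[index]

rows = [[1, 100], [2], [3, 2]]
print([get_item(r, 0) for r in rows])  # [1, 2, 3]
print([get_item(r, 1) for r in rows])  # [100, None, 2]
```

The null-on-missing behavior is what makes this safe on ragged arrays like `[2]` above.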
SMaZ

A different approach from the above:

    from pyspark.sql.functions import col
    from pyspark.sql.types import ArrayType, IntegerType

    my_x = [([1,100]), ([2]), ([3,2])]
    my_df = spark.createDataFrame(my_x, ArrayType(IntegerType()))

    my_df = my_df.withColumn("firstVal", col("value").getItem(0))

This should return a dataframe consisting of two columns:

    +--------+--------+
    |   value|firstVal|
    +--------+--------+
    |[1, 100]|       1|
    |     [2]|       2|
    |  [3, 2]|       3|
    +--------+--------+
shadow_dev

You can also use the `element_at` function (available in Spark 2.4+). Note that it uses 1-based indexing, so the first element is at index 1:

from pyspark.sql.types import ArrayType, IntegerType
from pyspark.sql import functions as F
x = [([1,100]), ([2]), ([3,2])]
df = spark.createDataFrame(x, ArrayType(IntegerType()))
df = df.withColumn('extract', F.element_at(F.col('value'), 1))
df.show()

+--------+-------+
|   value|extract|
+--------+-------+
|[1, 100]|      1|
|     [2]|      2|
|  [3, 2]|      3|
+--------+-------+

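`element_at` differs from `getItem` in two ways: indices are 1-based, and negative indices count from the end of the array. Below is a plain-Python sketch of those rules; the helper is hypothetical, the null-on-out-of-range behavior assumes Spark's default (non-ANSI) mode, and index 0 (which Spark actually rejects with an error) is simplified here to return `None`:

```python
def element_at(arr, index):
    """Hypothetical helper mirroring F.element_at's 1-based lookup.
    Negative indices count from the end; out-of-range gives None.
    (Real Spark raises on index 0; we return None for simplicity.)"""
    if arr is None or index == 0 or abs(index) > len(arr):
        return None
    return arr[index - 1] if index > 0 else arr[index]

rows = [[1, 100], [2], [3, 2]]
print([element_at(r, 1) for r in rows])   # [1, 2, 3]
print([element_at(r, -1) for r in rows])  # [100, 2, 2]
```

The negative-index form is handy when you want the *last* element of arrays of varying length, which `getItem` cannot express directly.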
niuer