I have a Spark DataFrame in a sparklyr session, and I'm trying to extract elements from an array column.
df <- copy_to(sc, data.frame(A = c(1, 2), B = c(3, 4))) ## BUILD DATAFRAME
dfnew <- df %>% mutate(C = Array(A, B)) %>% select(C) ## CREATE ARRAY COL
> dfnew ## VIEW DATAFRAME
# Source: spark<?> [?? x 1]
  C
  <list>
1 <dbl [2]>
2 <dbl [2]>
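To check what the arrays actually contain, rather than the <dbl [2]> placeholders, the result is small enough to pull into R; tidyr::unnest() here is just one convenient way to flatten the list column:

dfnew %>% collect() %>% tidyr::unnest(C) ## INSPECT VALUES LOCALLY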
dfnew %>% sdf_schema() ## VERIFY COLUMN TYPE IS ARRAY
$C
$C$name
[1] "C"

$C$type
[1] "ArrayType(DoubleType,true)"
I can extract an element with "mutate"...
dfnew %>% mutate(myfirst_element = C[[1]])
# Source: spark<?> [?? x 2]
  C         myfirst_element
  <list>              <dbl>
1 <dbl [2]>               3
2 <dbl [2]>               4
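Note that C holds (1, 3) in row 1 and (2, 4) in row 2, yet C[[1]] came back as 3 and 4: the [[ ]] index apparently maps onto Spark's zero-based array subscript rather than R's one-based convention. To see exactly what the translation layer generates (the rendered SQL will depend on your sparklyr/dbplyr versions):

dfnew %>% mutate(myfirst_element = C[[1]]) %>% show_query() ## INSPECT GENERATED SQL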
But I want to extract an element on the fly with "select". However, all attempts just return the full column:
> dfnew %>% select("C"[1])
# Source: spark<?> [?? x 1]
  C
  <list>
1 <dbl [2]>
2 <dbl [2]>
> dfnew %>% select("C"[[1]])
# Source: spark<?> [?? x 1]
  C
  <list>
1 <dbl [2]>
2 <dbl [2]>
> dfnew %>% select("C"[[1]][1])
# Source: spark<?> [?? x 1]
  C
  <list>
1 <dbl [2]>
2 <dbl [2]>
> dfnew %>% select("C"[[1]][[1]])
# Source: spark<?> [?? x 1]
  C
  <list>
1 <dbl [2]>
2 <dbl [2]>
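As far as I can tell, these fail because the arguments never reach Spark at all: "C"[1], "C"[[1]] and friends are ordinary R subsetting on the character vector "C", each evaluates locally to the string "C", and so every call collapses to plain select(C). A hedged one-step alternative, assuming the same [[ ]] translation that worked with "mutate" above, is "transmute", which computes the new column and drops everything else:

dfnew %>% transmute(myfirst_element = C[[1]]) ## EXTRACT AND SELECT IN ONE STEP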
I've also tried using "sdf_select" (from the sparklyr.nested package), without success:
> dfnew %>% sdf_select("C"[[1]][1])
# Source: spark<?> [?? x 1]
  C
  <list>
1 <dbl [2]>
2 <dbl [2]>
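If nothing in the dplyr layer cooperates, one fallback is hand-written Spark SQL through sparklyr's DBI interface. A hedged sketch; the view name "dfnew_view" is made up for illustration, and note the zero-based subscript on the Spark side:

library(DBI)
sdf_register(dfnew, "dfnew_view") ## REGISTER AS A TEMP VIEW
dbGetQuery(sc, "SELECT C[0] AS myfirst_element FROM dfnew_view")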
In PySpark you can access the elements explicitly, e.g. col("C")[1]; in Scala you can use getItem or element_at; and in SparkR you can also use element_at. But does anyone know a solution in a sparklyr setting? Thanks in advance for any help.
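Update: given the pass-through behaviour above, element_at (a Spark >= 2.4 builtin, one-based unlike the zero-based [] subscript) looks like it should also work inside "mutate" or "transmute", though I haven't verified it:

dfnew %>% transmute(myfirst_element = element_at(C, 1L)) ## ONE-BASED INDEXING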