Sparklyr: separate rows on 2 columns

Question

I am using sparklyr for a project. I have a Spark DataFrame with lists in some of the columns, and I'd like to separate them into multiple rows, i.e. have one value in each row, exactly like separate_rows does in tidyr.

So basically my DataFrame looks like this:

 | x     |   y
1| [a,b] | [c,d]

And I'd like to end up with something like this:

 | x     | y
1| a     | c 
2| b     | d

As suggested in this post, explode is a good start, but it only handles one column at a time; if I use it twice, I end up with 4 rows here instead of the 2 I want. In this very simple example I could filter down to the rows I want, but things get messier when the lists have more than two elements.

One approach I thought of:

1) Merge the columns x and y into a single column containing [[a,c], [b,d]]
2) Use explode to get [a,c] and then [b,d] as separate rows
3) Explode again, but into columns (rather than into rows).

Only I don't know how to do 1) and 3).
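
The three steps above can be sketched in Spark SQL from sparklyr. This is a minimal sketch, not a confirmed answer. It assumes Spark 2.4 or later (where `arrays_zip` is available) and a local Spark installation; the view name `toy_lists` and the aliases `x_val` and `y_val` are hypothetical. `arrays_zip` covers step 1, and `inline` covers steps 2 and 3 together by turning each struct of the zipped array into a row with one column per field.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark (2.4 or later) installation is available.
sc <- spark_connect(master = "local")

# Toy data mirroring the example: one row, two array columns.
toy <- sdf_sql(sc, "SELECT array('a', 'b') AS x, array('c', 'd') AS y")
sdf_register(toy, "toy_lists")  # hypothetical temporary view name

# arrays_zip pairs the i-th elements of x and y into structs (step 1);
# inline emits one row per struct and one column per field (steps 2 and 3).
result <- sdf_sql(sc, "
  SELECT t.x_val AS x, t.y_val AS y
  FROM toy_lists
  LATERAL VIEW inline(arrays_zip(x, y)) t AS x_val, y_val
")

out <- collect(result)  # bring the separated rows back into R
spark_disconnect(sc)
```

If the arrays in a row differ in length, `arrays_zip` pads the shorter one with nulls, so each input row produces as many output rows as its longest array.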

Thank you for the help!

Here is a reproducible example obtained with collect and dput:

structure(list(
  ref_amount = list(list(967.66, 1592.56), list(967.66, 1592.56)),
  ref_theta = list(list(5.26977034898459, 5.16119062369122),
                   list(5.26977034898459, 5.16119062369122))),
  .Names = c("ref_amount", "ref_theta"),
  row.names = c(NA, -2L),
  class = c("tbl_df", "tbl", "data.frame"))

Are there always two elements in each "array"? Are these arrays or structs? (That's impossible to say based on the collected result.) - Alper t. Turker, Jul 26 '18 at 16:52
No, not always 2, which is what makes it complicated. The columns are actually the results of `summarise` + `collect_list`; when I use `glimpse` on my Spark DataFrame it says ``. - Vincent, Jul 26 '18 at 17:07
Why would you collect_list and explode just after that? Could you provide more context? There might be a better solution here. - Alper t. Turker, Jul 29 '18 at 12:39
