I'm trying to learn to use functional programming constructs like reduce, and I'm trying to grok how to use it to union multiple DataFrames together. I was able to accomplish it with a simple for loop. You can see the commented-out expr, which was my attempt; the problem I'm running into is that reduce is a Python function, so I'm interleaving Python and Spark code in the same function, which doesn't make the interpreter happy.

Here is my code:
    df1 = sqlContext.createDataFrame(
        [
            ('1', '2', '3'),
        ],
        ['a', 'b', 'c']
    )
    df2 = sqlContext.createDataFrame(
        [
            ('4', '5', '6'),
        ],
        ['a', 'b', 'c']
    )
    df3 = sqlContext.createDataFrame(
        [
            ('7', '8', '9'),
        ],
        ['a', 'b', 'c']
    )

    l = [df2, df3]

    # expr = reduce(lambda acc, b: acc.unionAll(b), l, '')
    for df in l:
        df1 = df1.unionAll(df)

    df1.select('*').show()
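To show the shape of the reduce pattern I'm after without needing a Spark session, here is a minimal stand-in sketch: plain lists play the role of DataFrames and list concatenation plays the role of unionAll. The point is that the initial accumulator has to be something the combining function can work with (a string like '' has no way to "union" with the next element, just as a str has no unionAll method):

    from functools import reduce

    # Stand-ins for the DataFrames above: lists of row tuples.
    df1 = [('1', '2', '3')]
    df2 = [('4', '5', '6')]
    df3 = [('7', '8', '9')]

    # Start the fold from df1 itself, not from '' — the accumulator
    # must be the same kind of thing as the elements being folded in.
    # The Spark equivalent would presumably be:
    #   reduce(lambda acc, b: acc.unionAll(b), [df2, df3], df1)
    combined = reduce(lambda acc, b: acc + b, [df2, df3], df1)

    print(combined)
    # [('1', '2', '3'), ('4', '5', '6'), ('7', '8', '9')]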