from pyspark.sql import *
from IPython.core.display import display, HTML
from pyspark.sql.functions import first
from functools import reduce



display(HTML("<style>.container { width:100% !important; }</style>"))

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.some.config.option") \
    .getOrCreate()


for i in range(2,10):
    globals()['folders{}'.format(i)] = ["./result/20200"+str(i)+"/data1/*.csv"]
    print(globals()['folders{}'.format(i)])
    globals()['df{}'.format(i)]=spark.read.option("header", "false").csv(globals()['folders{}'.format(i)])
    globals()['df{}'.format(i)].createOrReplaceTempView("iris")
    globals()['concat{}'.format(i)]=globals()['df{}'.format(i)].groupBy().pivot("_c0").agg(first('_c7'))
    globals()['concat{}'.format(i)].show()
    

uni2_9=unionAll(concat2, concat3, concat4,concat5,concat6,concat7,concat8,concat9)
uni2_9.show()

I want to combine the dataframes sequentially into one table.

I used this a while ago, but now I get an error on this line:

---> 30 uni2_9=unionAll([concat2, concat3, concat4,concat5,concat6,concat7,concat8,concat9])
NameError: name 'unionAll' is not defined

How do I use Spark's unionAll correctly?

powpow
  • I solved it this way before, but now it no longer works. – powpow Aug 23 '21 at 09:28
  • The code in the question does not define a `unionAll` function, which is why you are getting `NameError: name 'unionAll' is not defined`. Can you try adding the function? – heretolearn Aug 23 '21 at 09:32

1 Answer


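The error message says exactly what is wrong:

NameError: name 'unionAll' is not defined

It means that you are calling a function that you never defined or imported. In PySpark, `unionAll` is a method of `DataFrame`, not a standalone function, so a bare `unionAll(...)` call only works if you define such a helper yourself.

See the documentation for how to use it: https://spark.apache.org/docs/2.4.7/api/python/pyspark.sql.html#pyspark.sql.DataFrame.unionAll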
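One way to get a standalone function back is to fold the `DataFrame` method over the list of frames with `reduce`. The `from functools import reduce` line in the question suggests a helper like this was once defined and later lost, but that is only a guess. The name `union_all` below is hypothetical and not part of PySpark. This is a minimal sketch that assumes every argument has a `.union(other)` method, as `pyspark.sql.DataFrame` does:

```python
from functools import reduce


def union_all(*dfs):
    """Union any number of DataFrames, left to right.

    Hypothetical helper (not part of PySpark). It only relies on each
    argument having a .union(other) method, as pyspark.sql.DataFrame does.
    """
    if not dfs:
        raise ValueError("union_all() needs at least one DataFrame")
    # reduce folds the sequence pairwise: df1.union(df2).union(df3) ...
    return reduce(lambda left, right: left.union(right), dfs)


# Usage with the frames from the question (assumes concat2..concat9 exist):
# uni2_9 = union_all(concat2, concat3, concat4, concat5,
#                    concat6, concat7, concat8, concat9)
# uni2_9.show()
```

If the frames are held in a list, unpack it with `union_all(*frames)`. Note that `DataFrame.union` matches columns by position, not by name, so this assumes every pivoted frame has the same columns in the same order. If that is not guaranteed for your data, `DataFrame.unionByName` may be a better fit.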

Steven