
I'm reading a large amount of data (2.3 TB) into a Spark DataFrame: CSV files prepared for a prediction model.

Once it is loaded, we register it as a temporary view:

dSales = (spark.read
    .option("delimiter", ",")
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/mnt/" + sourceMountName + "/"))
dSales.createOrReplaceTempView("dSales")

After that we produce several other tables with joins and write them all to the database, roughly as in the sketch below. These tables are used in Power BI.
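Something along these lines (a sketch only: the dProducts view, the join keys, and the JDBC settings such as jdbcUrl are placeholders, not the actual job):

# Build a derived table by joining the sales view against a
# hypothetical second temp view, dProducts
dProductSales = spark.sql("""
    SELECT p.ProductCategory, SUM(s.Amount) AS TotalSales
    FROM dSales s
    JOIN dProducts p ON s.ProductId = p.ProductId
    GROUP BY p.ProductCategory
""")

# Write the result to the database over JDBC (placeholder connection)
dProductSales.write \
    .format("jdbc") \
    .option("url", jdbcUrl) \
    .option("dbtable", "dbo.SalesByProduct") \
    .mode("overwrite") \
    .save()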

My question is: how can I get that big sales DataFrame and the temp view out of memory once everything is processed?

  • Possible duplicate of [Remove Temporary Tables from Apache SQL Spark](https://stackoverflow.com/questions/32376066/remove-temporary-tables-from-apache-sql-spark) – pault Apr 18 '19 at 17:00
  • You're looking for [`pyspark.sql.Catalog.dropTempView()`](http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Catalog.dropTempView): `spark.catalog.dropTempView("dSales")` – pault Apr 18 '19 at 17:00
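
For reference, the cleanup those comments point to would look like this (note that `unpersist()` only has an effect if the DataFrame was explicitly cached; otherwise Spark frees executor memory on its own as jobs finish):

# Remove the view from the session catalog
spark.catalog.dropTempView("dSales")

# Release any cached copies of the DataFrame (no-op if it was never cached)
dSales.unpersist()

# Drop the driver-side Python reference as well
del dSales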
