
I'm reading a large amount of data (2.3 TB) into a Spark DataFrame: CSV files prepared for a prediction model.

Once it is loaded, we register it as a temporary view:

dSales = (spark.read
    .option("delimiter", ",")
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/mnt/" + sourceMountName + "/"))
dSales.createOrReplaceTempView("dSales")

After that we produce several other tables with joins and write them all to the database, roughly as in the sketch below. These tables are used in Power BI.
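Something along these lines (a sketch only: the dProducts view, the join keys, and the JDBC settings such as jdbcUrl are placeholders, not the actual job):

# Build a derived table by joining the sales view against a
# hypothetical second temp view, dProducts
dProductSales = spark.sql("""
    SELECT p.ProductCategory, SUM(s.Amount) AS TotalSales
    FROM dSales s
    JOIN dProducts p ON s.ProductId = p.ProductId
    GROUP BY p.ProductCategory
""")

# Write the result to the database over JDBC (placeholder connection)
dProductSales.write \
    .format("jdbc") \
    .option("url", jdbcUrl) \
    .option("dbtable", "dbo.SalesByProduct") \
    .mode("overwrite") \
    .save()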

My question is: how can I get that big sales DataFrame and the temp view out of memory once everything is processed?

  • Possible duplicate of [Remove Temporary Tables from Apache SQL Spark](https://stackoverflow.com/questions/32376066/remove-temporary-tables-from-apache-sql-spark) – pault Apr 18 '19 at 17:00
  • You're looking for [`pyspark.sql.Catalog.dropTempView()`](http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Catalog.dropTempView): `spark.catalog.dropTempView("dSales")` – pault Apr 18 '19 at 17:00
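
For reference, the cleanup those comments point to would look like this (note that `unpersist()` only has an effect if the DataFrame was explicitly cached; otherwise Spark frees executor memory on its own as jobs finish):

# Remove the view from the session catalog
spark.catalog.dropTempView("dSales")

# Release any cached copies of the DataFrame (no-op if it was never cached)
dSales.unpersist()

# Drop the driver-side Python reference as well
del dSales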
