I'm reading a large amount of data (2.3 TB of CSV files, prepared for a prediction model) into a Spark DataFrame.
Once it is loaded, we register it as a temporary view:
dSales = (spark.read
          .option("delimiter", ",")
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/mnt/" + sourceMountName + "/"))
dSales.createOrReplaceTempView("dSales")
After that we build several other tables with joins and write them all to the database; those tables are consumed by Power BI. Each derived table is produced roughly as shown below.
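For context, this is a simplified sketch of one such step; the query, column names, and target table name are placeholders, not our real schema:

# Build one of the derived tables by aggregating the temp view,
# then persist it so Power BI can read it (names are placeholders).
dSalesByRegion = spark.sql("""
    SELECT s.region, SUM(s.amount) AS total_amount
    FROM dSales s
    GROUP BY s.region
""")
dSalesByRegion.write.mode("overwrite").saveAsTable("sales_by_region")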
My question is: how can I release that big sales DataFrame and the temp view from memory once everything has been processed?
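For reference, this is the kind of cleanup I have in mind, assuming spark.catalog.dropTempView and unpersist are the right calls; I'm not sure whether this actually frees the memory, which is what I'm asking:

# Drop the temporary view so the catalog no longer references it.
spark.catalog.dropTempView("dSales")

# Unpersist the DataFrame in case it was cached, then drop the
# Python reference so the object can be garbage collected.
dSales.unpersist()
del dSales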