I have code that reads user data from a PostgreSQL database and saves it to a Delta Lake table:

import pyspark
from delta import *
import time


start_time = time.time()

builder = (pyspark.sql.SparkSession.builder.appName("MyApp") 
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") 
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .config("spark.local.dir","/media/myuser/app/spark/tmp")
    .config("spark.jars", "/home/myuser/_code/jars/postgresql-42.5.4.jar")
    .config("spark.driver.memory","16g")
    .config("spark.executor.memory","16g")
    .config("spark.driver.maxResultSize","16g")        
    .config("spark.memory.offHeap.size","16g")       
    .config("spark.memory.offHeap.enabled", True)
    .config("spark.worker.cleanup.enabled", True)
    .config("spark.cleaner.referenceTracking.cleanCheckpoints", True)
)

spark = configure_spark_with_delta_pip(builder).getOrCreate()

spark.sparkContext.setCheckpointDir("/media/myuser/app/spark/tmp/checkpoint")

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://127.0.0.1:5432/IBSng")
    .option("driver", "org.postgresql.Driver")
    .option("dbtable", "normal_users")
    .option("user", "ibs")
    .option("password", "123")
    .load()
)

# show() prints the rows itself and returns None, so it is not wrapped in print()
df.show(2)

print()
print()

print()
print()



for c in df.columns:
    df = df.withColumnRenamed(c, c.replace( "(" , "__"))
for c in df.columns:
    df = df.withColumnRenamed(c, c.replace( ")" , ""))

print(df.columns)
print()
# printSchema() prints the schema itself and returns None
df.printSchema()

df.write.format("delta").save("/media/myuser/data/delta-table_user")

print('Saved')


print("--- %s seconds ---" % (time.time() - start_time))
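As a side note on the two renaming loops in the script: they could be collapsed into a single pass. The sketch below is only an illustration; the helper names `clean_column_name` and `clean_column_names` and the sample column `max(credit)` are hypothetical and not part of the original script.

```python
def clean_column_name(name: str) -> str:
    """Replace "(" with "__" and drop ")", as the two loops in the script do."""
    return name.replace("(", "__").replace(")", "")


def clean_column_names(names):
    """Apply clean_column_name to every column name, preserving order."""
    return [clean_column_name(n) for n in names]


# With a Spark DataFrame this would be applied in one call (assumes `df` exists):
#     df = df.toDF(*clean_column_names(df.columns))

# Hypothetical sample names, only to show the transformation.
sample = ["user_id", "max(credit)"]
print(clean_column_names(sample))
```

Renaming all columns at once with `toDF` avoids building a new DataFrame for every column, which is what the repeated `withColumnRenamed` calls do.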

When I run this with Python directly (python save_user.py), it works well and saves the data to Delta Lake.

But when I run it with spark-submit, an OutOfMemoryError is raised:

:: loading settings :: url = jar:file:/home/myuser/app/spark-3.3.2-bin-hadoop3/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /home/myuser/.ivy2/cache
The jars for the packages stored in: /home/myuser/.ivy2/jars
io.delta#delta-core_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-cd632397-b58b-4e94-b8c9-a1b4e526a75e;1.0
        confs: [default]
        found io.delta#delta-core_2.12;2.2.0 in central
        found io.delta#delta-storage;2.2.0 in central
        found org.antlr#antlr4-runtime;4.8 in central
:: resolution report :: resolve 244ms :: artifacts dl 10ms
        :: modules in use:
        io.delta#delta-core_2.12;2.2.0 from central in [default]
        io.delta#delta-storage;2.2.0 from central in [default]
        org.antlr#antlr4-runtime;4.8 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-cd632397-b58b-4e94-b8c9-a1b4e526a75e
        confs: [default]
        0 artifacts copied, 3 already retrieved (0kB/7ms)
23/02/25 03:48:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/02/25 03:48:14 INFO SparkContext: Running Spark version 3.3.2
23/02/25 03:48:14 WARN SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).
23/02/25 03:48:14 INFO ResourceUtils: ==============================================================
23/02/25 03:48:14 INFO ResourceUtils: No custom resources configured for spark.driver.
23/02/25 03:48:14 INFO ResourceUtils: ==============================================================
23/02/25 03:48:14 INFO SparkContext: Submitted application: MyApp
23/02/25 03:48:14 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 16384, script: , vendor: , offHeap -> name: offHeap, amount: 16384, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/02/25 03:48:14 INFO ResourceProfile: Limiting resource is cpu
23/02/25 03:48:14 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/02/25 03:48:14 INFO SecurityManager: Changing view acls to: myuser
23/02/25 03:48:14 INFO SecurityManager: Changing modify acls to: myuser
23/02/25 03:48:14 INFO SecurityManager: Changing view acls groups to: 
23/02/25 03:48:14 INFO SecurityManager: Changing modify acls groups to: 
23/02/25 03:48:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(myuser); groups with view permissions: Set(); users  with modify permissions: Set(myuser); groups with modify permissions: Set()
23/02/25 03:48:15 INFO Utils: Successfully started service 'sparkDriver' on port 36017.
23/02/25 03:48:15 INFO SparkEnv: Registering MapOutputTracker
23/02/25 03:48:15 INFO SparkEnv: Registering BlockManagerMaster
23/02/25 03:48:15 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/02/25 03:48:15 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/02/25 03:48:15 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/02/25 03:48:15 INFO DiskBlockManager: Created local directory at /media/myuser/app/spark/tmp/blockmgr-c9caa242-7f3b-4428-a573-20285e45c9c0
23/02/25 03:48:15 INFO MemoryStore: MemoryStore started with capacity 16.4 GiB
23/02/25 03:48:15 INFO SparkEnv: Registering OutputCommitCoordinator
23/02/25 03:48:15 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/02/25 03:48:15 INFO SparkContext: Added JAR /home/myuser/_code/jars/delta-core_2.12-2.2.0.jar at spark://mysqrver:36017/jars/delta-core_2.12-2.2.0.jar with timestamp 1677314894642
23/02/25 03:48:15 INFO SparkContext: Added file file:///home/myuser/.ivy2/jars/io.delta_delta-core_2.12-2.2.0.jar at file:///home/myuser/.ivy2/jars/io.delta_delta-core_2.12-2.2.0.jar with timestamp 1677314894642
23/02/25 03:48:15 INFO Utils: Copying /home/myuser/.ivy2/jars/io.delta_delta-core_2.12-2.2.0.jar to /media/myuser/app/spark/tmp/spark-3f37cf76-ba4d-4429-907f-c7db9d4ff90a/userFiles-ba352fdc-e8d4-4683-a289-98ef2e50c0dd/io.delta_delta-core_2.12-2.2.0.jar
23/02/25 03:48:15 INFO SparkContext: Added file file:///home/myuser/.ivy2/jars/io.delta_delta-storage-2.2.0.jar at file:///home/myuser/.ivy2/jars/io.delta_delta-storage-2.2.0.jar with timestamp 1677314894642
23/02/25 03:48:15 INFO Utils: Copying /home/myuser/.ivy2/jars/io.delta_delta-storage-2.2.0.jar to /media/myuser/app/spark/tmp/spark-3f37cf76-ba4d-4429-907f-c7db9d4ff90a/userFiles-ba352fdc-e8d4-4683-a289-98ef2e50c0dd/io.delta_delta-storage-2.2.0.jar
23/02/25 03:48:15 INFO SparkContext: Added file file:///home/myuser/.ivy2/jars/org.antlr_antlr4-runtime-4.8.jar at file:///home/myuser/.ivy2/jars/org.antlr_antlr4-runtime-4.8.jar with timestamp 1677314894642
23/02/25 03:48:15 INFO Utils: Copying /home/myuser/.ivy2/jars/org.antlr_antlr4-runtime-4.8.jar to /media/myuser/app/spark/tmp/spark-3f37cf76-ba4d-4429-907f-c7db9d4ff90a/userFiles-ba352fdc-e8d4-4683-a289-98ef2e50c0dd/org.antlr_antlr4-runtime-4.8.jar
23/02/25 03:48:15 INFO Executor: Starting executor ID driver on host mysqrver
23/02/25 03:48:15 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
23/02/25 03:48:15 INFO Executor: Fetching file:///home/myuser/.ivy2/jars/org.antlr_antlr4-runtime-4.8.jar with timestamp 1677314894642
23/02/25 03:48:15 INFO Utils: /home/myuser/.ivy2/jars/org.antlr_antlr4-runtime-4.8.jar has been previously copied to /media/myuser/app/spark/tmp/spark-3f37cf76-ba4d-4429-907f-c7db9d4ff90a/userFiles-ba352fdc-e8d4-4683-a289-98ef2e50c0dd/org.antlr_antlr4-runtime-4.8.jar
23/02/25 03:48:15 INFO Executor: Fetching file:///home/myuser/.ivy2/jars/io.delta_delta-storage-2.2.0.jar with timestamp 1677314894642
23/02/25 03:48:15 INFO Utils: /home/myuser/.ivy2/jars/io.delta_delta-storage-2.2.0.jar has been previously copied to /media/myuser/app/spark/tmp/spark-3f37cf76-ba4d-4429-907f-c7db9d4ff90a/userFiles-ba352fdc-e8d4-4683-a289-98ef2e50c0dd/io.delta_delta-storage-2.2.0.jar
23/02/25 03:48:15 INFO Executor: Fetching file:///home/myuser/.ivy2/jars/io.delta_delta-core_2.12-2.2.0.jar with timestamp 1677314894642
23/02/25 03:48:15 INFO Utils: /home/myuser/.ivy2/jars/io.delta_delta-core_2.12-2.2.0.jar has been previously copied to /media/myuser/app/spark/tmp/spark-3f37cf76-ba4d-4429-907f-c7db9d4ff90a/userFiles-ba352fdc-e8d4-4683-a289-98ef2e50c0dd/io.delta_delta-core_2.12-2.2.0.jar
23/02/25 03:48:15 INFO Executor: Fetching spark://mysqrver:36017/jars/delta-core_2.12-2.2.0.jar with timestamp 1677314894642
23/02/25 03:48:15 INFO TransportClientFactory: Successfully created connection to mysqrver/127.0.0.1:36017 after 36 ms (0 ms spent in bootstraps)
23/02/25 03:48:15 INFO Utils: Fetching spark://mysqrver:36017/jars/delta-core_2.12-2.2.0.jar to /media/myuser/app/spark/tmp/spark-3f37cf76-ba4d-4429-907f-c7db9d4ff90a/userFiles-ba352fdc-e8d4-4683-a289-98ef2e50c0dd/fetchFileTemp18341125476809150351.tmp
23/02/25 03:48:15 INFO Executor: Adding file:/media/myuser/app/spark/tmp/spark-3f37cf76-ba4d-4429-907f-c7db9d4ff90a/userFiles-ba352fdc-e8d4-4683-a289-98ef2e50c0dd/delta-core_2.12-2.2.0.jar to class loader
23/02/25 03:48:15 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45547.
23/02/25 03:48:15 INFO NettyBlockTransferService: Server created on mysqrver:45547
23/02/25 03:48:15 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/02/25 03:48:15 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, mysqrver, 45547, None)
23/02/25 03:48:15 INFO BlockManagerMasterEndpoint: Registering block manager mysqrver:45547 with 16.4 GiB RAM, BlockManagerId(driver, mysqrver, 45547, None)
23/02/25 03:48:15 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, mysqrver, 45547, None)
23/02/25 03:48:15 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, mysqrver, 45547, None)
23/02/25 03:48:16 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
23/02/25 03:48:16 INFO SharedState: Warehouse path is 'file:/home/myuser/_code/spark-warehouse'.
23/02/25 03:48:21 INFO CodeGenerator: Code generated in 181.980285 ms
23/02/25 03:48:21 INFO SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:0
23/02/25 03:48:21 INFO DAGScheduler: Got job 0 (showString at NativeMethodAccessorImpl.java:0) with 1 output partitions
23/02/25 03:48:21 INFO DAGScheduler: Final stage: ResultStage 0 (showString at NativeMethodAccessorImpl.java:0)
23/02/25 03:48:21 INFO DAGScheduler: Parents of final stage: List()
23/02/25 03:48:21 INFO DAGScheduler: Missing parents: List()
23/02/25 03:48:21 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at showString at NativeMethodAccessorImpl.java:0), which has no missing parents
23/02/25 03:48:21 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 11.4 KiB, free 16.4 GiB)
23/02/25 03:48:21 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 5.8 KiB, free 16.4 GiB)
23/02/25 03:48:21 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on mysqrver:45547 (size: 5.8 KiB, free: 16.4 GiB)
23/02/25 03:48:21 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1513
23/02/25 03:48:21 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at showString at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
23/02/25 03:48:21 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
23/02/25 03:48:22 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (mysqrver, executor driver, partition 0, PROCESS_LOCAL, 4299 bytes) taskResourceAssignments Map()
23/02/25 03:48:22 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Spark Context Cleaner"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "executor-heartbeater"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "RemoteBlock-temp-file-clean-thread"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "driver-heartbeater"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "executor-kill-mark-cleanup"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "heartbeat-receiver-event-loop-thread"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "SparkUI-47"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "SparkUI-46"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "SparkUI-45"
Exception in thread "SparkUI-48" 23/02/25 03:50:53 WARN SingleThreadEventExecutor: Unexpected exception from an event executor: 
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
23/02/25 03:50:53 WARN AbstractEventExecutor: A task raised an exception. Task: ScheduledFutureTask@300fea7(uncancellable, task: io.netty.handler.timeout.IdleStateHandler$AllIdleTimeoutTask@2f60d993, deadline: 0, period: 0)
java.lang.OutOfMemoryError: Java heap space
Exception in thread "netty-rpc-env-timeout" java.lang.OutOfMemoryError: Java heap space
23/02/25 03:50:53 INFO JDBCRDD: closed connection
23/02/25 03:50:53 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.OutOfMemoryError: Java heap space
23/02/25 03:50:53 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker for task 0.0 in stage 0.0 (TID 0),5,main]
java.lang.OutOfMemoryError: Java heap space
23/02/25 03:50:53 INFO SparkContext: Invoking stop() from shutdown hook
23/02/25 03:50:53 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (mysqrver executor driver): java.lang.OutOfMemoryError: Java heap space

23/02/25 03:50:53 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
23/02/25 03:50:53 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
23/02/25 03:50:53 INFO SparkUI: Stopped Spark web UI at http://mysqrver:4040
23/02/25 03:50:53 INFO TaskSchedulerImpl: Cancelling stage 0
23/02/25 03:50:53 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage cancelled
23/02/25 03:50:53 INFO DAGScheduler: ResultStage 0 (showString at NativeMethodAccessorImpl.java:0) failed in 152.227 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (mysqrver executor driver): java.lang.OutOfMemoryError: Java heap space

Driver stacktrace:
23/02/25 03:50:53 INFO DAGScheduler: Job 0 failed: showString at NativeMethodAccessorImpl.java:0, took 152.282492 s
23/02/25 03:50:54 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/02/25 03:50:54 INFO MemoryStore: MemoryStore cleared
23/02/25 03:50:54 INFO BlockManager: BlockManager stopped
23/02/25 03:50:54 INFO BlockManagerMaster: BlockManagerMaster stopped
23/02/25 03:50:54 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
Traceback (most recent call last):
  File "/home/myuser/_code/save_user2.py", line 33, in <module>
    
  File "/home/myuser/app/spark-3.3.2-bin-hadoop3/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 606, in show
  File "/home/myuser/app/spark-3.3.2-bin-hadoop3/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/home/myuser/app/spark-3.3.2-bin-hadoop3/python/lib/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
  File "/home/myuser/app/spark-3.3.2-bin-hadoop3/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o59.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (mysqrver executor driver): java.lang.OutOfMemoryError: Java heap space

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2672)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2608)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2607)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2607)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1182)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1182)
        at scala.Option.foreach(Option.scala:407)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1182)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2860)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2238)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2259)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2278)
        at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:506)
        at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:459)
        at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:48)
        at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3868)
        at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2863)
        at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3858)
        at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
        at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3856)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3856)
        at org.apache.spark.sql.Dataset.head(Dataset.scala:2863)
        at org.apache.spark.sql.Dataset.take(Dataset.scala:3084)
        at org.apache.spark.sql.Dataset.getRows(Dataset.scala:288)
        at org.apache.spark.sql.Dataset.showString(Dataset.scala:327)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.OutOfMemoryError: Java heap space

23/02/25 03:50:54 INFO SparkContext: Successfully stopped SparkContext
23/02/25 03:50:54 INFO ShutdownHookManager: Shutdown hook called
23/02/25 03:50:54 INFO ShutdownHookManager: Deleting directory /media/myuser/app/spark/tmp/spark-3f37cf76-ba4d-4429-907f-c7db9d4ff90a
23/02/25 03:50:54 INFO ShutdownHookManager: Deleting directory /tmp/spark-d7ee40c5-c066-41be-a9bd-731dce646e10
23/02/25 03:50:54 INFO ShutdownHookManager: Deleting directory /media/myuser/app/spark/tmp/spark-3f37cf76-ba4d-4429-907f-c7db9d4ff90a/pyspark-888559cb-93da-4e25-8384-6cbee13c2103

To run the above code, I use the following spark-submit command:

spark-submit    --packages io.delta:delta-core_2.12:2.2.0   --jars /home/myuser/_code/jars/delta-core_2.12-2.2.0.jar,/home/pyuser/_code/jars/postgresql-42.5.4.jar   save_user.py
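
One thing that may be worth ruling out: the Spark configuration documentation notes that, in client mode, spark.driver.memory must not be set through SparkConf inside the application, because the driver JVM has already started by then; it should instead be passed with --driver-memory or set in the default properties file. The sketch below only assembles the command above with that flag added. The 16g value is taken from the script's builder config, the helper name `build_submit_command` is hypothetical, and whether this resolves the error is untested.

```python
import shlex

# Assumption: 16g mirrors the spark.driver.memory value set in the script.
DRIVER_MEMORY = "16g"


def build_submit_command(script, driver_memory=DRIVER_MEMORY):
    """Return the spark-submit argument list with the driver heap set up front."""
    return [
        "spark-submit",
        "--driver-memory", driver_memory,
        "--packages", "io.delta:delta-core_2.12:2.2.0",
        # Jar paths copied verbatim from the command above.
        "--jars",
        "/home/myuser/_code/jars/delta-core_2.12-2.2.0.jar,"
        "/home/pyuser/_code/jars/postgresql-42.5.4.jar",
        script,
    ]


command = build_submit_command("save_user.py")
# Print a shell-ready command line that can be pasted into a terminal.
print(shlex.join(command))
```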

Why does this error occur with spark-submit?

And why do the following messages not appear when I run the script with Python directly?

23/02/25 03:48:16 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
23/02/25 03:48:16 INFO SharedState: Warehouse path is 'file:/home/myuser/_code/spark-warehouse'.
