
I've recently been using an AWS Glue job to test running some Spark Python code. I kicked off a run yesterday and it succeeded; this morning, without any changes, I kicked it off three times and it failed every time. The logs are weird and I don't understand them:

This is copied from the error log:

kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
awk: /tmp/parse_yarn_logs.awk:6: warning: escape sequence `\[' treated as plain `['
awk: /tmp/parse_yarn_logs.awk:6: warning: escape sequence `\]' treated as plain `]'
awk: /tmp/parse_yarn_logs.awk:8: warning: escape sequence `\(' treated as plain `('
awk: /tmp/parse_yarn_logs.awk:8: warning: escape sequence `\)' treated as plain `)'
21/03/04 09:56:42 INFO client.RMProxy: Connecting to ResourceManager at ip-xxxxxx.ec2.internal/xxx.xx.xx.x:xxxx
awk: /tmp/parse_yarn_logs.awk:19: (FILENAME=- FNR=1) fatal: Unmatched ( or \(: /.*Unregistering ApplicationMaster with FAILED (diag message: Shutdown hook called before final status was reported.*$/
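As an aside, the awk fatal error above comes from the log-parsing script rather than the Spark job itself: the regex on line 19 contains a `(` with no matching `)`, and once gawk downgrades the `\(` escapes to plain parentheses (the warnings above it), the pattern becomes an unterminated group. A minimal reproduction, with a made-up log line and pattern, would be:

```shell
# A sample log line containing a literal parenthesis (made-up content)
printf 'Unregistering ApplicationMaster with FAILED (diag message: x)\n' > /tmp/sample_yarn.log

# An unmatched '(' in the pattern is a fatal error in gawk:
#   awk '/FAILED (diag/ { print "matched" }' /tmp/sample_yarn.log
#   fails with: fatal: Unmatched ( or \(

# Matching the parenthesis with a bracket expression avoids both the
# fatal error and the "escape sequence treated as plain" warnings:
awk '/FAILED [(]diag/ { print "matched" }' /tmp/sample_yarn.log   # prints "matched"
```

Since /tmp/parse_yarn_logs.awk appears to belong to the Glue environment rather than the job code, this is probably a side issue, but it explains the warnings at the top of the log.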

Looking at the full logs, I found that this bit seems to be causing the issue:

21/03/04 10:12:08 ERROR Client: Application diagnostics message: User application exited with status 1
Exception in thread "main" org.apache.spark.SparkException: Application application_xxxxxxxx_0001 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1149)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1526)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/03/04 10:12:08 INFO ShutdownHookManager: Shutdown hook called
21/03/04 10:12:08 INFO ShutdownHookManager: Deleting directory /tmp/spark-xxxxxxxxxx
21/03/04 10:12:08 INFO ShutdownHookManager: Deleting directory /tmp/spark-xxxxxxxxxx

One of the runs took 10 minutes to start up?! Normally it only takes a few seconds... It seems like Glue is not very stable... and whether the job fails or not depends on my luck...

Does anyone know what's causing the issue, and is there anything I can do to improve its performance? Thanks.

wawawa
  • Update: Mysteriously, the execution succeeded this morning without any changes... Hope someone can explain this... – wawawa Mar 05 '21 at 09:44

1 Answer


The same thing is happening to me now in an AWS Glue job, but in my case it happens when I add one new line to the code:

device = DeviceDetector('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.77.34.5 Safari/537.36 QJY/2.0 Philco_PTV24G50SN-VB_DRM HDR DID/C0132bb2240f').parse() 

When I comment out this line, the job is fine. Since this is a new Python package in our code (I just added it), I have no idea how it behaved before. Hope somebody can explain it.
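In case it helps (a guess, not confirmed from any logs here): if the device_detector package isn't installed on the Glue workers, the job would exit with status 1 as soon as that line runs. On Glue 2.0 and later, a pure-Python dependency can be supplied through the `--additional-python-modules` job argument, for example via the CLI (job name, role, and script location below are placeholders):

```shell
# --additional-python-modules is a documented Glue 2.0+ job parameter;
# the package name is assumed to match the import used in the snippet above.
aws glue update-job --job-name my-glue-job --job-update \
  '{"Role": "arn:aws:iam::123456789012:role/my-glue-role",
    "Command": {"Name": "glueetl", "ScriptLocation": "s3://my-bucket/my-script.py"},
    "DefaultArguments": {"--additional-python-modules": "device_detector"}}'
```

If the package is already installed, this won't be the cause, and the CloudWatch error logs for the failed run should show the actual Python traceback.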

feechka