
I've inherited a cluster that uses Knox and am trying to figure out why the Spark History Server is available for completed Spark jobs but the Spark UI is not available for in-progress Spark applications.

In this YARN UI (which is exposed via Knox) there are 5 completed YARN applications and 1 in-progress YARN application. All are Spark applications.

In the Tracking UI column, each of the six applications has a link.

The five links for the completed jobs all successfully bring up the Spark History Server UI for those jobs. If I run `cat ${GATEWAY_HOME}/logs/gateway-audit.log`, I can see the following records appear when I hit any of those five links:

```
20/01/27 15:50:55 ||55bef3f3-a52f-4790-97d0-bd6e5076a293|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.229|YARNUI||||access|uri|/gateway/my-cluster-name/yarn/proxy/application_1580137635209_0001|unavailable|Request method: GET
20/01/27 15:50:55 ||55bef3f3-a52f-4790-97d0-bd6e5076a293|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.229|YARNUI||||dispatch|uri|http://my-cluster-name-m:8088/proxy/application_1580137635209_0001|unavailable|Request method: GET
20/01/27 15:50:55 ||55bef3f3-a52f-4790-97d0-bd6e5076a293|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.229|YARNUI||||dispatch|uri|http://my-cluster-name-m:8088/proxy/application_1580137635209_0001|success|Response status: 302
20/01/27 15:50:55 |||audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.229|YARNUI||||access|uri|/gateway/my-cluster-name/yarn/proxy/application_1580137635209_0001|success|Response status: 302

20/01/27 15:50:55 ||f7617e15-3bf4-4a8c-9701-9785894d7884|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.234|SPARKHISTORYUI||||access|uri|/gateway/my-cluster-name/sparkhistory/history/application_1580137635209_0001/1|unavailable|Request method: GET
20/01/27 15:50:55 ||f7617e15-3bf4-4a8c-9701-9785894d7884|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.234|SPARKHISTORYUI||||dispatch|uri|http://my-cluster-name-m:18080/history/application_1580137635209_0001/1/|unavailable|Request method: GET
20/01/27 15:50:55 ||f7617e15-3bf4-4a8c-9701-9785894d7884|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.234|SPARKHISTORYUI||||dispatch|uri|http://my-cluster-name-m:18080/history/application_1580137635209_0001/1/|success|Response status: 30
```

These are followed by many more log records for Spark History UI resources. All good. Notice the 302 (redirect) responses.

However, if I hit the link for the in-progress application, I get sent to http://my-cluster-name-m:18080/history/application_1580137635209_0006/1, which is on the cluster master node, and an error page is displayed.

In the logs I see:

```
20/01/27 15:58:38 ||aec261d3-7ecc-43a7-8815-d7185ee13833|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.1.130|YARNUI||||access|uri|/gateway/my-cluster-name/yarn/proxy/application_1580137635209_0006|unavailable|Request method: GET
20/01/27 15:58:38 ||aec261d3-7ecc-43a7-8815-d7185ee13833|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.1.130|YARNUI||||dispatch|uri|http://my-cluster-name-m:8088/proxy/application_1580137635209_0006|unavailable|Request method: GET
20/01/27 15:58:38 ||aec261d3-7ecc-43a7-8815-d7185ee13833|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.1.130|YARNUI||||dispatch|uri|http://my-cluster-name-m:8088/proxy/application_1580137635209_0006|success|Response status: 200
20/01/27 15:58:38 |||audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.1.130|YARNUI||||access|uri|/gateway/my-cluster-name/yarn/proxy/application_1580137635209_0006|success|Response status: 200
```

Notice there are no 302 records there.
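To compare the completed and in-progress cases side by side without scrolling through the whole audit log, the records can be filtered down to the backend (`dispatch`) responses for one application ID. This is a minimal sketch: the pipe-separated field positions are inferred from the sample lines above and are an assumption, not a documented Knox format, and the helper names (`parse_audit_lines`, `dispatch_statuses`) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Tuple

# Pipe-separated field positions inferred from the sample gateway-audit.log
# lines in this question (an assumption, not a documented Knox format).
KIND, SERVICE, ACTION, URI, OUTCOME, MESSAGE = 3, 5, 9, 11, 12, 13

STATUS_RE = re.compile(r"Response status: (\d+)")


@dataclass
class AuditRecord:
    service: str           # Knox service role, e.g. YARNUI or SPARKHISTORYUI
    action: str            # "access" (client -> Knox) or "dispatch" (Knox -> backend)
    uri: str
    outcome: str           # "unavailable" on request records, "success" on responses
    status: Optional[int]  # HTTP status when the record describes a response


def parse_audit_lines(lines: Iterable[str]) -> Iterator[AuditRecord]:
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= MESSAGE or fields[KIND] != "audit":
            continue  # skip blank or unrecognised lines
        match = STATUS_RE.search(fields[MESSAGE])
        yield AuditRecord(
            service=fields[SERVICE],
            action=fields[ACTION],
            uri=fields[URI],
            outcome=fields[OUTCOME],
            status=int(match.group(1)) if match else None,
        )


def dispatch_statuses(lines: Iterable[str], app_id: str) -> List[Tuple[str, str, int]]:
    """(service, backend URI, HTTP status) for each backend response mentioning app_id."""
    return [
        (r.service, r.uri, r.status)
        for r in parse_audit_lines(lines)
        if r.action == "dispatch" and r.status is not None and app_id in r.uri
    ]
```

Pointed at the real file (for example via `open(os.path.expandvars("${GATEWAY_HOME}/logs/gateway-audit.log"))`), a completed application shows the YARNUI 302 followed by SPARKHISTORYUI dispatches, while the in-progress one shows only a YARNUI 200.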

Edit: Since originally posting this, I have noticed that if I click on the Tracking UI link immediately after the application starts, I am taken to the details of the YARN application.

A few seconds later, clicking on the same link takes me to the error page instead.
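One way to see what YARN itself reports as the tracking target at each moment is to poll the ResourceManager's Cluster Application REST API (`/ws/v1/cluster/apps/{appid}`), whose `trackingUI` field is documented as pointing at either `ApplicationMaster` or `History`. This is a minimal sketch, assuming the ResourceManager is reachable directly at `my-cluster-name-m:8088` (taken from the logs above) without Kerberos/SPNEGO; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Tuple


def summarize_app(payload: Dict[str, Any]) -> Tuple[str, str, str]:
    """Extract (state, trackingUI, trackingUrl) from a Cluster Application API response."""
    app = payload["app"]
    return app.get("state", ""), app.get("trackingUI", ""), app.get("trackingUrl", "")


def fetch_app(rm_base: str, app_id: str, timeout: float = 10.0) -> Dict[str, Any]:
    """GET <rm_base>/ws/v1/cluster/apps/<app_id> and decode the JSON body."""
    url = f"{rm_base.rstrip('/')}/ws/v1/cluster/apps/{app_id}"
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `summarize_app(fetch_app("http://my-cluster-name-m:8088", "application_1580137635209_0006"))` right after submission and again a few seconds later shows whether the tracking URL YARN hands to the proxy changes between the two clicks.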

I'm a bit lost at this point. Can anyone help explain why I can't view the Spark UI for in-progress applications? Any pointers as to how I can diagnose would be welcomed.

jamiet
    Something seems to be off in the logs: the URL that is failing is for `application_1580137635209_0006`, but the failure audit logs are for application `application_1580137635209_0007`. It would be better to file a [Knox JIRA](https://issues.apache.org/jira/secure/CreateIssue!default.jspa) and include relevant logs from the `gateway.log` file; logs with DEBUG enabled would be much more useful. This looks more like a rewrite issue. Do you see any errors in the `gateway.log` file? – Sandeep More Jan 28 '20 at 02:53
  • @SandeepMore You are correct, that's my bad, apologies. I grabbed the logs from the tail of gateway-audit.log which just happened to be for `application_1580137635209_0007`. I intended to edit them to make them consistent with the screenshot but I forgot to do so in the second batch of log records. I have now done so. – jamiet Jan 28 '20 at 08:05
  • Any idea how to enable DEBUG logging? I googled for it and found https://community.cloudera.com/t5/Support-Questions/How-to-enable-debug-logging-for-Knox/td-p/211479 but the answer there only refers to using Ambari, which we are not using. – jamiet Jan 28 '20 at 08:10
  • Ignore previous question, I think I've found it: `${GATEWAY_HOME}/conf/gateway-log4j.properties` – jamiet Jan 28 '20 at 08:44
    No worries, happens to me all the time. A note on logs: if you want more fine-grained logs, you can turn on wire debugging in the log4j properties. It is noisy, but it can be useful for seeing the HTTP headers that Knox sends to and receives from the backend service. – Sandeep More Jan 28 '20 at 14:47
  • @SandeepMore Thx for your help. I found the problem and posted an answer. – jamiet Jan 30 '20 at 08:10
  • Great ! glad it worked :) – Sandeep More Jan 30 '20 at 14:34

1 Answer


OK, the answer is rather embarrassing. The cause was simply that the Spark UI was not enabled. Setting `spark.ui.enabled` to `true` solved this particular problem.
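Spark's default for `spark.ui.enabled` is `true`, so on an inherited cluster it is worth checking whether something has explicitly turned it off in `spark-defaults.conf`. Values set in code via `SparkConf` or with `spark-submit --conf spark.ui.enabled=true` take precedence over that file. A minimal sketch of such a check, assuming the file lives at a path such as `/etc/spark/conf/spark-defaults.conf` (the location varies by distribution); the helper names are hypothetical and the parser deliberately ignores escapes and line continuations.

```python
import re
from typing import Dict, Iterable


def parse_spark_defaults(lines: Iterable[str]) -> Dict[str, str]:
    """Parse spark-defaults.conf-style lines into a dict.

    Simplified: handles '#'/'!' comments and 'key value' / 'key=value' pairs,
    but not the full java.util.Properties syntax (escapes, continuations).
    """
    props: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' (with optional spaces) or the first run of whitespace.
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        props[key] = value  # later lines override earlier ones
    return props


def spark_ui_enabled(props: Dict[str, str]) -> bool:
    """Effective spark.ui.enabled from the file alone; Spark defaults to true when unset."""
    return props.get("spark.ui.enabled", "true").strip().lower() == "true"
```

For example, `spark_ui_enabled(parse_spark_defaults(open("/etc/spark/conf/spark-defaults.conf")))` returning `False` would point to the same cause found here.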

jamiet