By default, the Visual Query Spark job runs in local mode. What is the suggested setting for Visual Query when running Kylo in production with larger volumes of data?
Thanks, Shashi
To change this, edit the following script:
/opt/kylo/kylo-services/bin/run-kylo-spark-shell.sh
and add the Spark options (for example, --master and --deploy-mode) right after "spark-submit".
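A minimal sketch of the options involved, assuming a Spark version where YARN modes are selected with --master yarn plus --deploy-mode (the older yarn-client and yarn-cluster master strings are deprecated since Spark 2.0). The helper name spark_mode_args is hypothetical and not part of Kylo, and the existing contents of run-kylo-spark-shell.sh are not reproduced here; the memory and executor values in the comment are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kylo): print the spark-submit options
# for each Visual Query mode discussed in this thread.
spark_mode_args() {
  case "$1" in
    local)        printf '%s\n' '--master local[*]' ;;                   # driver and executors in one JVM on the edge node
    yarn-client)  printf '%s\n' '--master yarn --deploy-mode client' ;;  # driver on the edge node, executors in YARN containers
    yarn-cluster) printf '%s\n' '--master yarn --deploy-mode cluster' ;; # driver and executors in YARN containers
    *)            printf 'unknown mode: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Sketch of the edited line in run-kylo-spark-shell.sh. The memory and
# executor values are placeholders, and <existing arguments> stands for
# whatever the script already passes after spark-submit:
#   spark-submit $(spark_mode_args yarn-cluster) \
#     --driver-memory 4g --executor-memory 4g --num-executors 4 \
#     <existing arguments>
```

In yarn-cluster mode the driver runs inside a YARN container instead of on the edge node, so neither the driver nor the executors draw memory and cores from the edge node.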
I tried running the Visual Query Spark job in yarn-cluster mode, and it seems to work fine. Here are my observations on each mode:
Local mode: It works pretty well, but it puts pressure on the edge node, because it takes memory and cores only from the edge node.
yarn-client: To get around the edge-node limitation, we configured Visual Query to run in yarn-client mode. But one day we had a disk failure, and Visual Query failed with a file-not-found exception. Surprisingly, it did not look for another replica of the file on a different node, even though HDFS replication should provide one.
yarn-cluster: This mode gave us better performance, and we no longer hit the file-not-found exception. The only challenge we faced comes from the design of Visual Query itself. When you launch the Spark shell server for Visual Query, it creates a SparkContext, and that context stays alive until you kill the Spark server application. As a result, the YARN ResourceManager UI always shows the thinkbig Spark server running, and it never releases its resources until you kill the application.
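Until Visual Query releases its SparkContext on its own, the long-running application can be found and stopped with the YARN CLI (yarn application -list and yarn application -kill). A minimal sketch, assuming the application name shown in the ResourceManager UI contains "thinkbig" (adjust the pattern to your install); extract_app_ids and kill_visual_query_apps are hypothetical helper names. Killing the application also stops the Spark server that Visual Query depends on.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for finding and killing the long-running
# Visual Query Spark server on YARN.

# Read `yarn application -list` output on stdin and print the application IDs
# of rows that mention "thinkbig" (assumed name pattern, case-insensitive).
# Header and summary lines are skipped because their first field is not an ID.
extract_app_ids() {
  awk 'tolower($0) ~ /thinkbig/ && $1 ~ /^application_/ { print $1 }'
}

# Kill every matching RUNNING application. Not invoked automatically.
kill_visual_query_apps() {
  local app_id
  for app_id in $(yarn application -list -appStates RUNNING | extract_app_ids); do
    echo "Killing ${app_id}"
    yarn application -kill "${app_id}"
  done
}

# Usage:
#   kill_visual_query_apps
```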
We should probably look at enhancing Visual Query, as it's one of the coolest features in Kylo.