I'm trying to use zeppelin-0.8.0 to connect to an AWS Glue development endpoint. When I execute a cell, the error below occurs, and there is no helpful message explaining what the problem could be. Any leads are appreciated.

172318_1906434757 is finished, status: ERROR, exception: java.lang.RuntimeException: org.apache.thrift.TApplicationException: Internal error processing createInterpreter, result: %text org.apache.thrift.TApplicationException: Internal error processing createInterpreter
        at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
        at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_createInterpreter(RemoteInterpreterService.java:209)
        at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.createInterpreter(RemoteInterpreterService.java:192)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$2.call(RemoteInterpreter.java:169)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$2.call(RemoteInterpreter.java:165)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:165)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:132)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:299)
        at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:407)
        at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
        at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:307)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

UPDATE: As the answer below suggests, it looks like 0.8.0 doesn't work with Glue yet. I had problems running 0.7.x as well: the javax.ws.rs package threw a bunch of MethodNotFoundException errors when running with Java 8 (switching to Java 7 with update-alternatives did not help either). But when running inside a JDK 7 Docker container, it worked with no problems and I was able to connect to my dev endpoint. I would highly appreciate it if anyone can clarify the root cause.
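Since the behaviour above differs between Java 7 and Java 8, it can help to confirm which JVM a given shell or container actually launches. The following is only a minimal sketch: the helper names `java_major_version` and `installed_java_major` are made up for illustration, and it assumes the usual `java -version` output format (written to stderr). It does not explain the root cause; it only reports the version in use.

```python
import re
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` style output.

    Returns None if no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Releases up to Java 8 report "1.x" (e.g. "1.7.0_80" means Java 7);
    # later releases put the major number first (e.g. "11.0.2").
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major():
    """Run `java -version` and return its major version, or None if unavailable."""
    try:
        # `java -version` prints its banner to stderr, not stdout.
        result = subprocess.run(["java", "-version"],
                                capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return java_major_version(result.stderr)
```

Calling `installed_java_major()` in the same environment that starts Zeppelin shows whether a Java 7 switch actually took effect there.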

Somasundaram Sekar

1 Answer

Could you please provide more information, such as where your Zeppelin instance is running? Is it on your desktop/laptop, or is it an AWS notebook server? Also, did you try connecting with Zeppelin 0.7.3, as mentioned in this AWS forum thread:

https://forums.aws.amazon.com/thread.jspa?threadID=285128

According to the above thread, dated Jul 2018, I think AWS Glue doesn't yet support Zeppelin 0.8. I am assuming all other configuration and environment settings are done as needed. I can help more if you can provide additional info.

UPDATE: Anyway, please refer here and to setting up zeppelin on windows for help with setting up a local development environment and a Zeppelin notebook.

Once you have set up the Zeppelin notebook, establish an SSH connection (using the AWS Glue DevEndpoint URL) so that you have access to the data catalog, crawlers, etc., and also to the S3 bucket where your data resides. Then you can create your Python scripts in the Zeppelin notebook and run them from Zeppelin.
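As a rough sketch of the SSH step, the snippet below assembles a local port-forwarding command for the dev endpoint. The `glue` user, port 9007, and address 169.254.76.1 are what I recall from the AWS Glue dev endpoint tutorial for a local Zeppelin, so treat them as assumptions and verify them against the current AWS documentation. The key path, the host name, and the function name `build_tunnel_command` are placeholders for illustration.

```python
import shlex


def build_tunnel_command(key_path, endpoint_dns, local_port=9007,
                         remote_host="169.254.76.1", remote_port=9007):
    """Return the ssh argument list for a local port forward to the endpoint.

    The default port and remote address are assumptions; check the AWS docs.
    """
    forward = f"{local_port}:{remote_host}:{remote_port}"
    # -N: run no remote command, -T: no pseudo-terminal, -L: local forward.
    return ["ssh", "-i", key_path, "-NTL", forward, f"glue@{endpoint_dns}"]


if __name__ == "__main__":
    # Placeholder values; substitute your own key file and endpoint address.
    cmd = build_tunnel_command("/path/to/private-key.pem",
                               "dev-endpoint-public-dns")
    print(" ".join(shlex.quote(part) for part in cmd))
```

The printed command can be run in a terminal and left open while Zeppelin is in use; the interpreter in Zeppelin would then be pointed at the forwarded local port.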

You can use the dev instance provided by Glue, but you may incur additional costs for it (EC2 instance charges).

Environment settings (updated in response to comments):

JAVA_HOME=E:\Java7\jre7
Path=E:\Python27;E:\Python27\Lib;E:\Python27\Scripts;
PYTHONPATH=E:\spark-2.1.0-bin-hadoop2.7\python;E:\spark-2.1.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip;E:\spark-2.1.0-bin-hadoop2.7\python\lib\pyspark.zip
SPARK_HOME=E:\spark-2.1.0-bin-hadoop2.7

Change the drive name and folders accordingly. Let me know if any help is needed.
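To double-check settings like the ones above before starting Zeppelin, a small script can confirm that each variable is set and points to something that exists. This is only a sketch: `check_spark_env` is a hypothetical helper, not part of Zeppelin or Spark, and it checks nothing beyond the presence of the paths.

```python
import os


def check_spark_env(env, path_exists=os.path.exists, sep=os.pathsep):
    """Return a list of problems found in the given environment mapping.

    `sep` is the PYTHONPATH separator (";" on Windows, ":" elsewhere).
    """
    problems = []
    for name in ("JAVA_HOME", "SPARK_HOME"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not path_exists(value):
            problems.append(f"{name} points to a missing path: {value}")
    # PYTHONPATH holds several entries; each one should exist on disk.
    entries = [e for e in env.get("PYTHONPATH", "").split(sep) if e]
    if not entries:
        problems.append("PYTHONPATH is not set")
    for entry in entries:
        if not path_exists(entry):
            problems.append(f"PYTHONPATH entry is missing: {entry}")
    return problems


if __name__ == "__main__":
    # Check the environment of the current process and report any issues.
    for problem in check_spark_env(os.environ):
        print(problem)
```

An empty result only means the paths exist; it does not confirm that the Java, Spark, and Python versions are compatible with each other.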

Yuva
  • I'm running Zeppelin from my local instance. The mentioned error occurs when trying to use 0.8.0. I'm unable to run 0.7.x versions at all, as the javax.ws.rs module conflicts between javax.ws.rs-api-2.0-m10.jar and jersey-core-1.13.jar, and I end up with a bunch of MethodNotFound exceptions – Somasundaram Sekar Nov 07 '18 at 14:02
  • Not sure what conflict errors you are seeing. I am able to set up Zeppelin 0.7.3, connect to AWS Glue over SSH, and run my Python code from my desktop Zeppelin. I have updated my answer above with some links for setting up Zeppelin on a laptop; maybe you can check whether anything in your setup is causing the conflicts. – Yuva Nov 07 '18 at 15:53
  • Your answer kind of helped, but I'm still not sure of the root cause. I suspected that the difference in Java version could be an issue; I was running with Java 8 (but changing to Java 7 using update-alternatives didn't help either). Now when I ran it in a new Docker Java 7 instance, it actually works. Thanks anyway – Somasundaram Sekar Nov 07 '18 at 20:17