
I have installed Spark with Cloudera Manager, and I configured and started the Spark service with:

  /opt/cloudera/parcels/SPARK/lib/spark/sbin/start-master.sh
  /opt/cloudera/parcels/SPARK/lib/spark/sbin/start-slaves.sh

Then I want to run WordCount to test my Spark setup. First I start spark-shell on my master node:

15/07/28 13:44:25 INFO spark.HttpServer: Starting HTTP Server
15/07/28 13:44:25 INFO server.Server: jetty-7.6.8.v20121106
15/07/28 13:44:25 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:45213
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 0.9.0
      /_/

Using Scala version 2.10.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51)
Type in expressions to have them evaluated.
Type :help for more information.
15/07/28 13:44:31 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/07/28 13:44:32 INFO Remoting: Starting remoting
15/07/28 13:44:32 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@hadoop241:45741]
15/07/28 13:44:32 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@hadoop241:45741]
15/07/28 13:44:32 INFO spark.SparkEnv: Registering BlockManagerMaster
15/07/28 13:44:32 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20150728134432-ac8c
15/07/28 13:44:32 INFO storage.MemoryStore: MemoryStore started with capacity 294.9 MB.
15/07/28 13:44:32 INFO network.ConnectionManager: Bound socket to port 56158 with id = ConnectionManagerId(hadoop241,56158)
15/07/28 13:44:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/28 13:44:32 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager hadoop241:56158 with 294.9 MB RAM
15/07/28 13:44:32 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/28 13:44:32 INFO spark.HttpServer: Starting HTTP Server
15/07/28 13:44:32 INFO server.Server: jetty-7.6.8.v20121106
15/07/28 13:44:32 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:39279
15/07/28 13:44:32 INFO broadcast.HttpBroadcast: Broadcast server started at http://192.168.2.241:39279
15/07/28 13:44:32 INFO spark.SparkEnv: Registering MapOutputTracker
15/07/28 13:44:32 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-06dad7a7-d1fb-433d-bbab-37f20fb02057
15/07/28 13:44:32 INFO spark.HttpServer: Starting HTTP Server
15/07/28 13:44:32 INFO server.Server: jetty-7.6.8.v20121106
15/07/28 13:44:32 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:46380
15/07/28 13:44:32 INFO server.Server: jetty-7.6.8.v20121106
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage/rdd,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/stage,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/pool,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/environment,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/executors,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/metrics/json,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/static,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/,null}
15/07/28 13:44:32 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/07/28 13:44:32 INFO ui.SparkUI: Started Spark Web UI at http://hadoop241:4040
15/07/28 13:44:32 INFO client.AppClient$ClientActor: Connecting to master spark://192.168.2.241:7077...
Created spark context..
Spark context available as sc.

scala> 15/07/28 13:44:33 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150728134433-0001
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor added: app-20150728134433-0001/0 on worker-20150724192744-hadoop246-7078 (hadoop246:7078) with 16 cores
15/07/28 13:44:33 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150728134433-0001/0 on hostPort hadoop246:7078 with 16 cores, 512.0 MB RAM
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor added: app-20150728134433-0001/1 on worker-20150724132945-hadoop241-7078 (hadoop241:7078) with 8 cores
15/07/28 13:44:33 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150728134433-0001/1 on hostPort hadoop241:7078 with 8 cores, 512.0 MB RAM
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor added: app-20150728134433-0001/2 on worker-20150724132947-hadoop245-7078 (hadoop245:7078) with 8 cores
15/07/28 13:44:33 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150728134433-0001/2 on hostPort hadoop245:7078 with 8 cores, 512.0 MB RAM
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor added: app-20150728134433-0001/3 on worker-20150724132949-hadoop254-7078 (hadoop254:7078) with 8 cores
15/07/28 13:44:33 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150728134433-0001/3 on hostPort hadoop254:7078 with 8 cores, 512.0 MB RAM
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor added: app-20150728134433-0001/4 on worker-20150724183923-hadoop217-7078 (hadoop217:7078) with 8 cores
15/07/28 13:44:33 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150728134433-0001/4 on hostPort hadoop217:7078 with 8 cores, 512.0 MB RAM
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor updated: app-20150728134433-0001/3 is now RUNNING
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor updated: app-20150728134433-0001/4 is now RUNNING
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor updated: app-20150728134433-0001/1 is now RUNNING
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor updated: app-20150728134433-0001/2 is now RUNNING
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor updated: app-20150728134433-0001/0 is now RUNNING
15/07/28 13:44:35 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@hadoop241:60944/user/Executor#1370617929] with ID 1
15/07/28 13:44:36 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager hadoop241:38177 with 294.9 MB RAM
15/07/28 13:44:37 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@hadoop217:45179/user/Executor#357014410] with ID 4
15/07/28 13:44:38 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager hadoop217:32361 with 294.9 MB RAM
15/07/28 13:44:38 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@hadoop254:4899/user/Executor#-432875177] with ID 3
15/07/28 13:44:38 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@hadoop245:54837/user/Executor#2060262779] with ID 2
15/07/28 13:44:38 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@hadoop246:41470/user/Executor#296060469] with ID 0
15/07/28 13:44:38 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager hadoop245:11915 with 294.9 MB RAM
15/07/28 13:44:39 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager hadoop246:55377 with 294.9 MB RAM
15/07/28 13:44:39 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager hadoop254:48560 with 294.9 MB RAM



val file=sc.textFile("hdfs//192.168.2.241:8020/root/workspace/testfile")

Up to this step there are no problems, but I run into issues in the next step:

val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

I get this:

java.lang.NoClassDefFoundError: com/google/protobuf/ServiceException
    at org.apache.hadoop.ipc.ProtobufRpcEngine.<clinit>(ProtobufRpcEngine.java:64)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1713)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1678)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1772)
    at org.apache.hadoop.ipc.RPC.getProtocolEngine(RPC.java:201)
    at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:522)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:347)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:168)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:448)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:410)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:128)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2308)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:87)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2342)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2324)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:351)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:163)
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:575)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:363)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:336)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:391)
    at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:391)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:111)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:111)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:111)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:133)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
    at org.apache.spark.rdd.FlatMappedRDD.getPartitions(FlatMappedRDD.scala:30)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
    at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:58)
    at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:354)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:14)
    at $iwC$$iwC$$iwC.<init>(<console>:19)
    at $iwC$$iwC.<init>(<console>:21)
    at $iwC.<init>(<console>:23)
    at <init>(<console>:25)
    at .<init>(<console>:29)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:772)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1040)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:609)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:640)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:604)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:788)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:833)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:745)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:593)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:600)
    at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:603)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:926)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:876)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:876)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:876)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:968)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.ServiceException
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 84 more

Can anyone help me? :)

3 Answers


Looks like there is a mismatch between package versions.

Spark is very sensitive to the versions running on the cluster, and must be built against exactly the same versions.

For example, here are the instructions for a Cloudera 5.3 cluster: http://www.cloudera.com/content/cloudera/en/documentation/core/v5-3-x/topics/cdh_ig_spark_installation.html
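
A quick way to spot such a mismatch is to ask the spark-shell driver which Hadoop client it was built against and compare that with the cluster's own version. This is only a sketch using the standard org.apache.hadoop.util.VersionInfo API, run from inside spark-shell:

    // Run inside spark-shell: prints the Hadoop client version bundled with this Spark build.
    // Compare the output with what the CDH cluster reports (e.g. via `hadoop version`).
    println(org.apache.hadoop.util.VersionInfo.getVersion)
    println(org.apache.hadoop.util.VersionInfo.getBuildVersion)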

– lev

I have found that this problem is caused by the missing jar protobuf-java-2.4.1.jar in this directory: /opt/cloudera/parcels/SPARK/lib/spark/lib
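
To verify whether the protobuf classes are visible to the driver at all (and, if so, which jar they come from), here is a small sketch you can paste into spark-shell; it is not Cloudera-specific:

    // Probe the driver classpath for protobuf's ServiceException.
    try {
      val cls = Class.forName("com.google.protobuf.ServiceException")
      println("found in: " + cls.getProtectionDomain.getCodeSource.getLocation)
    } catch {
      case _: ClassNotFoundException =>
        println("protobuf-java is missing; copy the jar into spark/lib and restart the shell")
    }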


I got the same problem: my protobuf version was older than 2.5.0. Below is the step-by-step reasoning; I hope it can help you.

Exception in thread "dag-scheduler-event-loop" java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AppendRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;

This was changed in Hadoop by HADOOP-9845, "Update protobuf to 2.5 from 2.4.x (in 2.1.0-beta)":

https://issues.apache.org/jira/browse/HADOOP-9845

Reason:

Classes generated with protobuf 2.5.0 (such as Hadoop's HDFS protocol classes) cannot run against the older 2.4.x runtime, which is what produces the VerifyError above.

Result:

You need to delete the exclusion of protobuf v2.5.0 in the spark-core dependency, so that protobuf 2.5.0 ends up on the classpath.
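
If you build your own application against spark-core, here is a minimal build.sbt sketch (the versions are assumptions matching Spark 0.9 and a Hadoop 2.x cluster) that keeps protobuf-java 2.5.0 on the classpath instead of excluding it:

    // build.sbt sketch: do not exclude protobuf; pin 2.5.0 so it matches the Hadoop 2.x client.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "0.9.0-incubating",
      "com.google.protobuf" % "protobuf-java" % "2.5.0"
    )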

  • BTW, val file=sc.textFile("hdfs//192.168.2.241:8020/root/workspace/testfile") should be val file=sc.textFile("hdfs://192.168.2.241:8020/root/workspace/testfile") ... – Benedict Jin Feb 02 '16 at 08:57
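
For reference, a minimal WordCount sketch with the corrected hdfs:// URI (the NameNode address and input path come from the question; the output path is made up for illustration):

    val file = sc.textFile("hdfs://192.168.2.241:8020/root/workspace/testfile")
    val counts = file.flatMap(line => line.split(" "))
                     .map(word => (word, 1))
                     .reduceByKey(_ + _)
    counts.take(10).foreach(println)   // sanity-check a few results on the driver
    counts.saveAsTextFile("hdfs://192.168.2.241:8020/root/workspace/testfile_counts")  // assumed output path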