Questions tagged [shark-sql]

Shark has been subsumed by Spark SQL. It was an open-source distributed SQL query engine for Hadoop data. It brought state-of-the-art performance and advanced analytics to Hive users.


59 questions
2
votes
2 answers

Are there any Python or Scala tools to connect to Spark/Shark?

I want to use Python or Scala to connect to a Shark server, but I didn't find any tools to do this. Are there any libraries (Python or Scala/Java)? Thanks in advance.
Joey.Chang
  • 164
  • 2
  • 14
1
vote
1 answer

Return Boolean (1 or 0) if table contains duplicate rows

I wish to return a boolean value if there are duplicates in the table in Hive 0.9. For now, I'm doing this: select cast(case when count(*) > 0 then 1 else 0 end as smallint) Validate_Value from ( select guid, count(guid) cnt from…
underwood
  • 845
  • 2
  • 11
  • 22
1
vote
1 answer

SPARK - How to use a function in a group by query

I am going to migrate a SHARK query to SPARK. Below is my sample SHARK query, which uses a function in the group by clause. select month(dt_cr) as Month, day(dt_cr) as date_of_created, count(distinct phone_number) as total_customers …
sandip
  • 394
  • 1
  • 4
  • 11
1
vote
1 answer

How to create a Shark query from a saved text file out of an RDD?

I have a JavaPairRDD results and I save it by calling results.saveAsTextFile("data"). Then I get file contents like: (www.abc.com,0.15712321 www.def.com,www.aaa.com,www.ccc.com) Now, I want to create a table with three fields using…
MatrixZ
  • 46
  • 5
1
vote
1 answer

How can I get Spark/Shark to start on DSE 4.5.1?

This was initially working out of the box and then AWS kindly shut down this server for me. So I rebuilt it and made it the new job tracker (it was also the old job tracker). Now I can't figure out how to get Spark/Shark to run. I get the same…
Eric Lubow
  • 763
  • 2
  • 12
  • 30
1
vote
1 answer

Can someone explain this : "Spark SQL supports a different use case than Hive."

I am referring to the following link: Hive Support for Spark. It says: "Spark SQL supports a different use case than Hive." I am not sure why that would be the case. Does this mean that as a Hive user I cannot use the Spark execution engine through Spark…
Venkat
  • 1,810
  • 1
  • 11
  • 14
1
vote
1 answer

Shark external table performance

How does querying from an external table in Shark located on the local filesystem compare to using data located on HDFS in terms of query performance? I plan to use a single high-end server for running Shark queries and was wondering if it's…
DaTaBomB
  • 623
  • 3
  • 11
  • 23
1
vote
1 answer

JDBC connection to Shark Server hangs

I am using the following configuration for my Shark cluster: Scala 2.10.3, Spark 0.9.0, Hive 0.12.0-cdh5.0.2, Shark 0.9.0. Spark and Hive are configured via Cloudera Manager (CDH 5.0.2). I am following this tutorial to connect to Shark…
Junaid
  • 768
  • 8
  • 13
1
vote
1 answer

Which Hadoop component can handle all Oracle queries?

Which Hadoop component can handle all Oracle functions and has low latency? I am thinking of using components like Presto, Drill and Shark. Can anyone tell which of the above technologies can handle all the functions in Oracle with low…
Pavan Chakravarthy
  • 573
  • 4
  • 7
  • 16
1
vote
0 answers

java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.HivePassThroughOutputFormat

I have followed this link for installing Shark on CDH5. I have installed it, but as is also mentioned in the above block: This -skipRddReload is only needed when you have some table with hive/hbase mapping, because of some issues in…
Aashu
  • 1,247
  • 1
  • 26
  • 41
1
vote
0 answers

How to convert Shark's TableRDD to RDD[Array[Double]] in Scala?

I am trying to perform Scala operations on Shark. I am creating an RDD as follows: val tmp: shark.api.TableRDD = sc.sql2rdd("select duration from test") I need to convert it to RDD[Array[Double]]. I tried toArray, but it doesn't seem to work. I…
visakh
  • 2,503
  • 8
  • 29
  • 55
1
vote
2 answers

Installing Apache Shark in standalone mode results in a Scala error

I'm basically following the guide on https://github.com/amplab/shark/wiki/Running-Shark-Locally. I downloaded Scala. I'm using EC2 Amazon Linux. My shark/shark-0.8.0/conf/shark-env.sh configuration file looks like this: export SPARK_MEM=1g export…
user2773013
  • 3,102
  • 8
  • 38
  • 58
1
vote
1 answer

Installing HDFS for use with Shark without YARN

I'm trying to install Apache Shark. One of the requirements is to have HDFS installed. I don't want to use YARN or Mesos. I just want HDFS. My question is: does this mean I can only install a Hadoop distribution prior to 2.x? If so, which one? Or can…
user2773013
  • 3,102
  • 8
  • 38
  • 58
1
vote
0 answers

Error in Configuring Spark/Shark on DSE

I have installed 1) scala-2.10.3 2) spark-1.0.0. I changed spark-env.sh with the variables below: export SCALA_HOME=$HOME/scala-2.10.3 export SPARK_WORKER_MEMORY=16g. I can see the Spark master. 3) shark-0.9.1-bin-hadoop1. I changed shark-env.sh with below…
1
vote
0 answers

Issue with loading data into Parquet table from a JSON Serde based Hive table

I have a Hive table defined using a JSON SerDe. I'm using the Shark distribution (http://shark.cs.berkeley.edu/). The definition is as follows: CREATE TABLE lastfm( artist string, title string , track_id string, similars array>, tags…
visakh
  • 2,503
  • 8
  • 29
  • 55