Questions tagged [qubole]

Qubole Data Service (QDS) is a cloud Big Data service running on an elastic Hadoop-based cluster.

Source: Creators of Facebook’s Big Data infrastructure and Apache Hive have leveraged their experience to deliver Qubole Data Service (QDS) – a cloud Big Data service offering the same advanced capabilities used by Big Data savvy organizations.

Minimize operational interaction and provide your data analysts with an easy to use graphical interface, built-in connectors, and seamless, elastic cloud infrastructure.

Your Hadoop cluster is ready within minutes post signup, letting you focus on building sophisticated data pipelines, running queries, scheduling jobs and monetizing your big data.

An auto-scaling cluster, improved I/O optimization, faster queries and support for hybrid pricing - realize cost savings of as much as 50%-60% in total, while accomplishing tasks faster.

87 questions
1
vote
1 answer

Import csv file into Qubole

I am using Qubole to run Presto queries. I need to upload a CSV file into my query but cannot figure out how to do this. Does anyone have any experience with this? For more details, I am under the Analyze section. This is what I have so far…
nak5120
  • 4,089
  • 4
  • 35
  • 94
1
vote
0 answers

IN and NOT IN HiveQL

I am new to HiveQL. Are IN and NOT IN supported in it, especially when using Qubole? Here is my query: SELECT DISTINCT vId FROM table1 WHERE d.columnOne = "123" AND NOT d.columnTwo AND timestamp between 1523550000000 AND 1523930000000 AND NOT…
noobeerp
  • 417
  • 2
  • 6
  • 11
1
vote
1 answer

UDF to generate JSON string behaving inconsistently

I'm trying to generate a JSON string to store a variable number of history records in a single STRING column. The code works on all of my small tests, but fails (no error, just no data) when run on the actual data. Here's what I have: class…
FrankGT
  • 117
  • 7
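The excerpt above is cut off before its code, so the original UDF is not shown. As a minimal sketch of the general idea only (packing a variable number of history records into one string), the following Python uses the standard `json` module. The function name `history_to_json` and the record fields are hypothetical and do not come from the question.

```python
import json


def history_to_json(records):
    """Serialize a variable-length list of history records to one JSON string.

    `records` is assumed to be a list of dicts (hypothetical shape).
    """
    # A missing or empty history becomes "[]" so the column is never
    # silently left without data.
    safe_records = list(records or [])
    # sort_keys gives a stable key order; compact separators keep the string short.
    return json.dumps(safe_records, sort_keys=True, separators=(",", ":"))


# Hypothetical sample records, for illustration only.
sample_history = [
    {"ts": 1, "status": "open"},
    {"ts": 2, "status": "closed"},
]
encoded = history_to_json(sample_history)
```

Returning `"[]"` for an empty input is a design choice in this sketch: it makes "no history" distinguishable from a failed serialization when the column is inspected later.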
1
vote
1 answer

Run Tensorflow in Qubole

I am trying to train an LSTM using a Spark Python notebook in Qubole. When I try to fit the model, I receive the error below. I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to…
GihanDB
  • 591
  • 2
  • 6
  • 23
1
vote
1 answer

How to select records from week days?

I have a Hive table which contains daily records. I want to select records from weekdays, so I use the Hive query below. I'm using the Qubole API to do this. SELECT hour(pickup_time), COUNT(passengerid) FROM home_pickup WHERE …
GihanDB
  • 591
  • 2
  • 6
  • 23
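The query in the excerpt above is truncated at its WHERE clause, so the actual filter is unknown. As a rough illustration of the intended logic only (count pickups per hour, keeping Monday to Friday), here is a plain Python sketch. The function name and the sample timestamps are hypothetical; this is not the HiveQL from the question.

```python
from collections import Counter
from datetime import datetime


def weekday_pickups_per_hour(pickup_times):
    """Count pickups per hour of day, keeping only Monday-Friday timestamps."""
    counts = Counter()
    for ts in pickup_times:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday,
        # so values below 5 are weekdays.
        if ts.weekday() < 5:
            counts[ts.hour] += 1
    return dict(counts)


# Hypothetical sample data, for illustration only.
sample_pickups = [
    datetime(2018, 4, 13, 9, 15),   # Friday
    datetime(2018, 4, 14, 9, 30),   # Saturday, excluded
    datetime(2018, 4, 16, 9, 45),   # Monday
    datetime(2018, 4, 16, 18, 5),   # Monday
]
per_hour = weekday_pickups_per_hour(sample_pickups)
```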
1
vote
1 answer

AWS S3 access issue when using qubole/streamx on AWS EMR

I am using qubole/streamx as a Kafka sink connector to consume data from Kafka and store it in AWS S3. I created a user in IAM with the AmazonS3FullAccess permission, then set the key ID and key in hdfs-site.xml, whose dir is assigned in…
Chris Feng
  • 189
  • 5
  • 19
1
vote
1 answer

pyspark job on qubole fails with "Retrying exception reading mapper output"

I have a PySpark job running via Qubole which fails with the following error. Qubole > Shell Command failed, exit code unknown Qubole > 2016-12-03 17:36:53,097 ERROR shellcli.py:231 - run - Retrying exception reading mapper output: (22, 'The…
1
vote
1 answer

How do I optimize my hive query for finding Sum of Count of Records from multiple tables

I have to generate a report that gives me the sum of the counts from tables A, B and C for events that have been stored using Hive. My S3 buckets are partitioned by Organization_id. For example: Table A – has a record for every day John (and…
Ajay
  • 11
  • 2
1
vote
1 answer

Unable to create table in Qubole similar to mysql

I want to create an external table in Qubole similar to a table created in MySQL. The CREATE TABLE query in MySQL is: CREATE TABLE `mytable` ( `id` varchar(50) NOT NULL, `v_count` int(11) DEFAULT NULL, `l_visited` timestamp NOT NULL DEFAULT…
Rahul Kumar
  • 161
  • 1
  • 6
1
vote
2 answers

Autoscaling EMR- is it required? Should I just use EC2? Should I just use Qubole?

In order to reduce the time for provisioning, we've decided to keep up a dedicated EMR cluster with 5 instances (we expect to need about 5). In case we need more, we think we'll need to implement some sort of autoscaling. I'm not familiar at all…
user1136342
  • 4,731
  • 10
  • 30
  • 40
0
votes
1 answer

Pyspark error- Invalid argument, not a string or column

I have a dataframe in PySpark, df_all. It has some data and I need to do the following: count = ceil(df_all.count()/1000000). It gives the following error: TypeError: Invalid argument, not a string or column: 0.914914 of type . For…
user2280352
  • 145
  • 11
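The excerpt does not show its imports, so the cause is not confirmed. One common source of this exact message, assumed here, is that `ceil` was imported from `pyspark.sql.functions`, which works on columns, while `df_all.count()/1000000` is an ordinary Python float. A minimal sketch of the plain-Python alternative follows; the helper name `chunk_count` is hypothetical, and the row count stands in for `df_all.count()`.

```python
import math

# Chunk size taken from the excerpt above.
ROWS_PER_CHUNK = 1_000_000


def chunk_count(row_count, rows_per_chunk=ROWS_PER_CHUNK):
    """Round row_count / rows_per_chunk up to a whole number of chunks.

    math.ceil works on plain Python numbers, which is what
    DataFrame.count() returns.
    """
    return math.ceil(row_count / rows_per_chunk)


# 914914 rows reproduces the 0.914914 ratio shown in the error message.
chunks = chunk_count(914914)
```

If both functions are needed in the same module, importing the Spark functions under a namespace (for example `from pyspark.sql import functions as F`) avoids shadowing `math.ceil`.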
0
votes
0 answers

How to view log file in qubole

I would like to retrieve the Qubole usage report, but I don't know where the data is stored. I don't want to download the log file every time; my aim is to build a table out of it: a table of logs from each query/scheduler in Qubole.
Subhi
  • 1
  • 1
0
votes
0 answers

Qubole Data in hive table returning all the values as null after creating the schema from Amazon S3

I created the Hive table using Explore, under My Amazon S3. After creating the schema from it, I am able to create the external tables and store them in the Qubole Hive explorer under the default. As I move further to query the data in…
0
votes
0 answers

Extracting json field from float in Hive using dataset

Quick one guys. I am facing an issue while querying a float JSON Column, as it returns the following error: "Error while compiling statement: FAILED: SemanticException [Error 10014]: line 11:5 Wrong arguments ''$.percentage'': No matching method for…
Diego
  • 1
  • 1
0
votes
1 answer

Presto Pivoting Data

I am really new to Presto and having trouble pivoting data in it. The method I am using is the following: select distinct location_id, case when role_group = 'IT' then employee_number end as IT_emp_num, case when role_group = 'SC' then…
llorcs
  • 79
  • 1
  • 10
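The query in the excerpt above is truncated, so what follows is an assumption about the goal: one row per `location_id` with one column per `role_group`. A usual way to get there is conditional aggregation, where each CASE expression is wrapped in an aggregate and the rows are grouped by the key. The Python sketch below mirrors that logic on in-memory tuples; the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def pivot_employees(rows):
    """Pivot (location_id, role_group, employee_number) rows.

    Returns one dict per location_id with a '<role_group>_emp_num' key per
    role group, keeping the largest employee number seen for each group.
    """
    pivoted = defaultdict(dict)
    for location_id, role_group, employee_number in rows:
        column = f"{role_group}_emp_num"
        current = pivoted[location_id].get(column)
        # Keep the maximum per (location, role group), so each location
        # collapses to a single row instead of one row per input record.
        if current is None or employee_number > current:
            pivoted[location_id][column] = employee_number
    return dict(pivoted)


# Hypothetical sample rows, for illustration only.
sample_rows = [
    (1, "IT", 101),
    (1, "SC", 202),
    (2, "IT", 303),
]
wide = pivot_employees(sample_rows)
```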