Questions tagged [qubole]

Qubole Data Service (QDS) is a cloud Big Data service running on an elastic Hadoop-based cluster.

Source: Creators of Facebook’s Big Data infrastructure and Apache Hive have leveraged their experience to deliver Qubole Data Service (QDS) – a cloud Big Data service offering the same advanced capabilities used by Big Data savvy organizations.

Minimize operational interaction and provide your data analysts with an easy-to-use graphical interface, built-in connectors, and seamless, elastic cloud infrastructure.

Your Hadoop cluster is ready within minutes of signing up, letting you focus on building sophisticated data pipelines, running queries, scheduling jobs, and monetizing your big data.

With an auto-scaling cluster, improved I/O optimization, faster queries, and support for hybrid pricing, realize cost savings of as much as 50%-60% in total while accomplishing tasks faster.

87 questions
1
vote
0 answers

Insert overwrite doesn't delete all the old data files

We are trying to insert overwrite a Hive table. Most of the time it overwrites as expected, i.e. deleting any old files and replacing them with new files. We are seeing some inconsistencies with this behavior: once in a while all the old files are not…
Jas
  • 11
  • 2
1
vote
1 answer

Retrieve value in an array of an array with struct

I have a column in a Hive table with type: array<array<struct<…>>> Here is the sample of data in the column: [ [ { "type": "PROFIT", "value": "100", "currency": "USD" }, { …
user1761325
  • 93
  • 2
  • 9
1
vote
0 answers

Query Qubole data in Python

I'm trying to query Qubole data in Python, but running into some issues. Below is my code: from qds_sdk.qubole import Qubole Qubole.configure(api_token="api_token", api_url="https://us.qubole.com/api") from qds_sdk.commands import…
BirdPlay6
  • 43
  • 5
1
vote
1 answer

Exclude records with certain values in Qubole

Using Qubole I have Table A (columns in json parsed...)
ID  Recommendation  Decision
1   GOOD            GOOD
2   BAD             BAD
2   GOOD            BAD
3   GOOD            BAD
4   BAD             GOOD
4   GOOD            BAD
I…
Kurlito
  • 13
  • 3
1
vote
2 answers

How to connect UiPath to Qubole Hive cluster and run a query

One of the teams using RPA in my company wants to automate reporting that is run in Qubole - Hive environment. The initial approach is to unleash the robot to log in to Okta, then Workbench in Qubole, run the query, and download results. Is there a…
1
vote
2 answers

Result-set inconsistency between hive and hive-llap

We are using Hive 3.1.x clusters on HDI 4.0, with one being LLAP and the other just Hive. We've created managed tables on both clusters, with the row count being 272409. Before merge on both…
Vinay K L
  • 45
  • 1
  • 10
1
vote
1 answer

Spark Submit Default Command line options

How can we change the parameters in Spark Submit Default Command line options in Qubole? There is an option to override the values if needed under "Spark Submit Command Line Options", but this option is not available in Spark "Command Line".
Throw
  • 11
  • 2
1
vote
1 answer

How to create Hive external table with Avro file on Qubole?

Can someone point me to the docs for creating an external table on Qubole based on Avro files? CREATE TABLE my_table_name ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT …
user10714010
  • 865
  • 2
  • 13
  • 20
1
vote
2 answers

Implement case class inside a class

I am running the code below in a Qubole Notebook, and it runs successfully. case class cls_Sch(Id:String, Name:String) class myClass { implicit val sparkSession =…
1
vote
1 answer

Extracting json field from string in Hive using dataset

I am trying a very basic Hive query. I am trying to extract a JSON field from a dataset, but I always get \N for the JSON field, although some_string comes through fine. Here is my query: WITH dataset AS ( SELECT CAST( '{ "traceId": "abc",…
Bhavya Arora
  • 768
  • 3
  • 16
  • 35
1
vote
1 answer

Get Qubole data row-wise using Java

I am trying to run a Hive query using the Qubole SDK. Though I am able to get the desired result as a string, in order to process it better I am looking to access it row-wise, something like a list of Java objects. The way I am getting the data…
roger_that
  • 9,493
  • 18
  • 66
  • 102
1
vote
1 answer

Recommendation on Performance optimization for SQL code

I have code in Qubole that's taking almost 3 hours to execute. I am looking for some recommendations to decrease the execution time. WITH -- Get latest date - 10 days before as day d AS ( SELECT CAST(CONCAT ( …
Flash
  • 11
  • 1
1
vote
1 answer

Syncing Qubole Hive table to Snowflake with Struct field

I have a table like the following in Qubole: use dm; CREATE EXTERNAL TABLE IF NOT EXISTS fact ( id string, fact_attr struct< attr1 : String, attr2 : String > ) STORED AS PARQUET LOCATION 's3://my-bucket/DM/fact' I have created…
Ambrish
  • 3,627
  • 2
  • 27
  • 42
1
vote
2 answers

Different results when distinct count by different time periods

I am trying to get a count of unique visitors. I first checked the total without separating it by any time frame. Main table (big data table sample): +-----------+----+-------+ |theDateTime|vD | vis | +----------------+-------+ |2018-10-03 |123…
noobeerp
  • 417
  • 2
  • 6
  • 11
1
vote
1 answer

Big files causing shuffle error in Hadoop MapReduce

I am seeing the following error when I try to process big files (size > 35GB), but it doesn't happen when I try smaller files (size < 10GB). App > Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in…
Jal
  • 2,174
  • 1
  • 18
  • 37