Questions tagged [qubole]

Qubole Data Service (QDS) is a cloud Big Data service running on elastic Hadoop-based clusters.

Creators of Facebook's Big Data infrastructure and Apache Hive have leveraged their experience to deliver Qubole Data Service (QDS), a cloud Big Data service offering the same advanced capabilities used by Big Data-savvy organizations.

Minimize operational interaction and provide your data analysts with an easy-to-use graphical interface, built-in connectors, and seamless, elastic cloud infrastructure.

Your Hadoop cluster is ready within minutes of signup, letting you focus on building sophisticated data pipelines, running queries, scheduling jobs, and monetizing your big data.

With an auto-scaling cluster, improved I/O optimization, faster queries, and support for hybrid pricing, you can realize total cost savings of as much as 50-60% while accomplishing tasks faster.

87 questions
0 votes, 1 answer

How to safely insert parameters into a SQL query and get the resulting query?

I have to use a non-DBAPI-compliant library to interact with a database (qds_sdk for Qubole). This library only allows sending raw SQL queries without parameters, so I would like a SQL injection-proof way to insert parameters into a query and get…
Roméo Després • 1,777 • 2 • 15 • 30
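A minimal sketch of one possible approach for the question above, assuming values are only ever interpolated as string literals: render each value as an ANSI SQL literal by doubling embedded single quotes before substituting it into the query text. The helper names here are hypothetical, not part of qds_sdk:

```python
def quote_literal(value) -> str:
    """Render a Python value as an ANSI SQL string literal.

    Embedded single quotes are doubled, so a value such as
    "O'Brien'; DROP TABLE t; --" stays inside the literal.
    """
    return "'" + str(value).replace("'", "''") + "'"


def bind_params(template: str, params: dict) -> str:
    """Substitute named {placeholders} with safely quoted literals."""
    return template.format(**{k: quote_literal(v) for k, v in params.items()})


query = bind_params(
    "SELECT * FROM users WHERE name = {name}",
    {"name": "O'Brien"},
)
# query == "SELECT * FROM users WHERE name = 'O''Brien'"
```

This only covers string literals; numeric or identifier substitution would need separate validation.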
0 votes, 2 answers

How to get Python in Qubole to save CSV and TXT files to Azure Data Lake?

I have Qubole connected to Azure Data Lake, and I can start a Spark cluster and run PySpark on it. However, I can't save any native Python output, like text files or CSVs; I can't save anything other than Spark SQL DataFrames. What should I do to…
HT. • 161 • 1 • 7
0 votes, 1 answer

How to change the timeout value when running commands on QDS

I have a spark-submit command that calls my Python script. The code runs for more than 36 hours, but because of the QDS timeout limit of 36 hours my command gets killed. Can someone tell me how to change this parameter value and set it to…
Trupti • 1
0 votes, 1 answer

Logging and Debugging on Qubole

How does one log on Qubole / access logs from Spark on Qubole? The setup I have:
• a Java library (JAR)
• a Zeppelin notebook (Scala), simply calling a method from the library
• Spark on a YARN cluster
• Log4j2 used in the library (configured to log to stdout)
How…
bde.dev • 729 • 9 • 9
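For a setup like the one above, one common arrangement is to have Log4j2 write to stdout so that YARN collects the output into its container logs (retrievable later with `yarn logs -applicationId <appId>`). A minimal log4j2.xml sketch, assuming the library picks up the file from the classpath:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
  <Appenders>
    <!-- Write to stdout so YARN captures it in the container logs -->
    <Console name="Stdout" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{ISO8601} %-5level %logger{36} - %msg%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="Stdout"/>
    </Root>
  </Loggers>
</Configuration>
```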
0 votes, 1 answer

Spark Structured Streaming using spark-acid writeStream (with checkpoint) throwing org.apache.hadoop.fs.FileAlreadyExistsException

In our Spark app, we use Spark Structured Streaming. It uses Kafka as the input stream and HiveAcid as the writeStream to a Hive table. HiveAcid is Qubole's open-source spark-acid library: https://github.com/qubole/spark-acid. Below is our…
0 votes, 1 answer

Avoid pre-signed URL expiry when IAM role key rotates

In Airflow I have 2 tasks defined that run every day: the first one creates a zip file and saves it in AWS under s3://{bucket-name}/foo/bar/{date}/archive.zip; the second one pre-signs that URL (it should expire in 7 days) and sends it to…
Maria Livia • 75 • 1 • 9
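For the pre-signed URL question above, the signature becomes invalid once the signing credentials rotate, so one mitigation (besides signing with a long-lived IAM user instead of the role's temporary keys) is to cap the requested expiry at the remaining lifetime of the credentials. A small sketch of that capping logic; the function name is hypothetical:

```python
from datetime import datetime, timedelta, timezone


def effective_expiry_seconds(requested: timedelta,
                             credentials_expire_at: datetime,
                             now: datetime) -> int:
    """Pre-signed URLs die when the signing credentials do, so never
    request an expiry that outlives the credentials themselves."""
    remaining = credentials_expire_at - now
    return int(min(requested, remaining).total_seconds())


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
expiry = effective_expiry_seconds(
    requested=timedelta(days=7),
    credentials_expire_at=now + timedelta(hours=12),  # role keys rotate soon
    now=now,
)
# expiry == 43200 (12 hours), not the 7 days that were asked for
```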
0 votes, 3 answers

How to query table partitions list using

I need to programmatically query Qubole for the list of partitions of a Hive table. I can do this by calling the correct API endpoint as described here, but I would like to use the qds-sdk-java client to do this (I am already using it for other…
GreenGiant • 4,930 • 1 • 46 • 76
0 votes, 1 answer

Qubole Presto datatype "Map" using the Like Operator

So I am trying to apply a simple LIKE predicate in a Qubole query on Presto. For a string datatype I can simply write like '%United States of America%'. However, the column I am trying to apply this to has the underlying datatype "map", and thus…
pp2000 • 35 • 1 • 2 • 6
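For a map-typed column, LIKE cannot be applied to the map itself; it has to target either one key's value or the collection of values. A sketch in Presto SQL, assuming a column m of type map(varchar, varchar) in a table t (both names are placeholders):

```sql
-- Match against one known key's value
SELECT * FROM t
WHERE element_at(m, 'country') LIKE '%United States of America%';

-- Match if any value in the map matches
SELECT * FROM t
WHERE cardinality(filter(map_values(m), v -> v LIKE '%United States of America%')) > 0;
```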
0 votes, 1 answer

How to upgrade Python version on Qubole?

The current Python version on Qubole is 3.5.3, and some packages, like PyMC3 and future XGBoost versions, need a newer Python. How do I upgrade? And would that affect other clusters' settings?
HT. • 161 • 1 • 7
0 votes, 1 answer

Unable to write or read from S3 bucket with Default AWS KMS encryption enabled

I am unable to read or write to a Default AWS KMS-encrypted bucket without using the following configuration on my Qubole cluster:
fs.s3a.server-side-encryption-algorithm=SSE-KMS
fs.s3a.server-side-encryption.key=
But if I enable this…
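On the configuration side, Hadoop's S3A connector also supports per-bucket settings, which lets the SSE-KMS options apply only to the encrypted bucket instead of cluster-wide. A sketch, with the bucket name and key ARN as placeholders:

```
fs.s3a.bucket.my-encrypted-bucket.server-side-encryption-algorithm=SSE-KMS
fs.s3a.bucket.my-encrypted-bucket.server-side-encryption.key=arn:aws:kms:...
```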
0 votes, 1 answer

Qubole Kinesis Connector for Spark structured streaming throws an error

We are using the Qubole Kinesis Connector (JAR) for Spark Structured Streaming. This used to work fine, but suddenly it is throwing the error "S3 filesystem not found". We could use the KCL, but we need to test it with foreachBatch. Are there any other…
0 votes, 2 answers

REST API in test drive account?

Hi, I am using the Qubole trial version, which is a test drive account, so I am not getting an API token from the My Accounts tab in the Control Panel in Qubole. Is there a way to access the REST APIs now? Thanks in advance.
sai Kumar • 43 • 4
0 votes, 2 answers

Running Scala jobs in Scheduler

My job runs fine in my notebook, but when I copy and paste the script into the Spark Scala scheduled job, I run into errors like "script.scala:15: error: not found: value sqlContext". What do I need to do to run my Scala code as a scheduled job?
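For the error above: Zeppelin notebooks get spark and sqlContext pre-created for them, but a standalone scheduled Spark Scala script has to build its own session. A minimal sketch; the object and app names are arbitrary:

```scala
import org.apache.spark.sql.SparkSession

object ScheduledJob {
  def main(args: Array[String]): Unit = {
    // Zeppelin injects `spark`/`sqlContext` into notebooks; a scheduled
    // script must create the session itself.
    val spark = SparkSession.builder()
      .appName("scheduled-job")
      .getOrCreate()
    val sqlContext = spark.sqlContext // for code still referencing sqlContext

    sqlContext.sql("SELECT 1").show()
    spark.stop()
  }
}
```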
0 votes, 1 answer

PySpark Machine Learning on Wide Data in Qubole

I have a large dataset, with roughly 250 features, that I would like to use in a gradient-boosted trees classifier. I have millions of observations, but I'm having trouble getting the model to work with even 1% of my data (~300k observations). Below…
ErrorJordan • 611 • 5 • 15
0 votes, 1 answer

Setting up AWS Glue to crawl Qubole

Currently I work with Qubole to access Hive data. I've added metadata from several databases, and want to add all the Hive metadata to AWS Glue. Is this possible? Any help is appreciated.
Ash_s94 • 787 • 6 • 19