Questions tagged [qubole]

Qubole Data Service (QDS) is a cloud Big Data service running on elastic Hadoop-based clusters.

Creators of Facebook's Big Data infrastructure and Apache Hive have leveraged their experience to deliver Qubole Data Service (QDS), a cloud Big Data service offering the same advanced capabilities used by Big Data-savvy organizations.

Minimize operational interaction and provide your data analysts with an easy-to-use graphical interface, built-in connectors, and seamless, elastic cloud infrastructure.

Your Hadoop cluster is ready within minutes of signup, letting you focus on building sophisticated data pipelines, running queries, scheduling jobs, and monetizing your big data.

With an auto-scaling cluster, improved I/O optimization, faster queries, and support for hybrid pricing, you can realize total cost savings of as much as 50-60% while accomplishing tasks faster.

87 questions
0 votes, 1 answer

How to safely insert parameters into a SQL query and get the resulting query?

I have to use a non-DBAPI-compliant library to interact with a database (qds_sdk for Qubole). This library only allows sending raw SQL queries without parameters, so I would like a SQL injection-proof way to insert parameters into a query and get…
Roméo Després • 1,777 • 2 • 15 • 30
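A minimal sketch of one possible approach for the question above, assuming values are only ever interpolated as string literals: render each value as an ANSI SQL literal by doubling embedded single quotes before substituting it into the query text. The helper names here are hypothetical, not part of qds_sdk:

```python
def quote_literal(value) -> str:
    """Render a Python value as an ANSI SQL string literal.

    Embedded single quotes are doubled, so a value such as
    "O'Brien'; DROP TABLE t; --" stays inside the literal.
    """
    return "'" + str(value).replace("'", "''") + "'"


def bind_params(template: str, params: dict) -> str:
    """Substitute named {placeholders} with safely quoted literals."""
    return template.format(**{k: quote_literal(v) for k, v in params.items()})


query = bind_params(
    "SELECT * FROM users WHERE name = {name}",
    {"name": "O'Brien"},
)
# query == "SELECT * FROM users WHERE name = 'O''Brien'"
```

This only covers string literals; numeric or identifier substitution would need separate validation.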
0 votes, 2 answers

How to get Python in Qubole to save CSV and TXT files to Azure Data Lake?

I have Qubole connected to Azure Data Lake, and I can start a Spark cluster and run PySpark on it. However, I can't save any native Python output, like text files or CSVs; I can't save anything other than Spark SQL DataFrames. What should I do to…
HT. • 161 • 1 • 7
0 votes, 1 answer

How to change the timeout value when running commands on QDS

I have a spark-submit command that calls my Python script. The code runs for more than 36 hours, but because of the QDS timeout limit of 36 hours my command gets killed. Can someone tell me how to change this parameter value and set it to…
Trupti • 1
0 votes, 1 answer

Logging and Debugging on Qubole

How does one log on Qubole / access logs from Spark on Qubole? The setup I have:
• a Java library (JAR)
• a Zeppelin notebook (Scala), simply calling a method from the library
• Spark on a YARN cluster
• Log4j2 used in the library (configured to log to stdout)
How…
bde.dev • 729 • 9 • 9
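For a setup like the one above, one common arrangement is to have Log4j2 write to stdout so that YARN collects the output into its container logs (retrievable later with `yarn logs -applicationId <appId>`). A minimal log4j2.xml sketch, assuming the library picks up the file from the classpath:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
  <Appenders>
    <!-- Write to stdout so YARN captures it in the container logs -->
    <Console name="Stdout" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{ISO8601} %-5level %logger{36} - %msg%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="Stdout"/>
    </Root>
  </Loggers>
</Configuration>
```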
0 votes, 1 answer

Spark Structured Streaming using spark-acid writeStream (with checkpoint) throwing org.apache.hadoop.fs.FileAlreadyExistsException

In our Spark app, we use Spark Structured Streaming. It uses Kafka as the input stream and HiveAcid as the writeStream to a Hive table. HiveAcid is Qubole's open-source spark-acid library: https://github.com/qubole/spark-acid. Below is our…
0 votes, 1 answer

Avoid pre-signed URL expiry when IAM role key rotates

In Airflow I have 2 tasks defined that run every day: the first one creates a zip file and saves it in AWS under s3://{bucket-name}/foo/bar/{date}/archive.zip; the second one pre-signs that URL (it should expire in 7 days) and sends it to…
Maria Livia • 75 • 1 • 9
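For the pre-signed URL question above, the signature becomes invalid once the signing credentials rotate, so one mitigation (besides signing with a long-lived IAM user instead of the role's temporary keys) is to cap the requested expiry at the remaining lifetime of the credentials. A small sketch of that capping logic; the function name is hypothetical:

```python
from datetime import datetime, timedelta, timezone


def effective_expiry_seconds(requested: timedelta,
                             credentials_expire_at: datetime,
                             now: datetime) -> int:
    """Pre-signed URLs die when the signing credentials do, so never
    request an expiry that outlives the credentials themselves."""
    remaining = credentials_expire_at - now
    return int(min(requested, remaining).total_seconds())


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
expiry = effective_expiry_seconds(
    requested=timedelta(days=7),
    credentials_expire_at=now + timedelta(hours=12),  # role keys rotate soon
    now=now,
)
# expiry == 43200 (12 hours), not the 7 days that were asked for
```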
0 votes, 3 answers

How to query table partitions list using

I need to programmatically query Qubole for the list of partitions of a Hive table. I can do this by calling the correct API endpoint as described here, but I would like to use the qds-sdk-java client to do this (I am already using it for other…
GreenGiant • 4,930 • 1 • 46 • 76
0 votes, 1 answer

Qubole Presto datatype "Map" using the Like Operator

So I am trying to apply a simple LIKE predicate in a Qubole query on Presto. For a string datatype I can simply write like '%United States of America%'. However, the column I am trying to apply this to has the underlying datatype "map", and thus…
pp2000 • 35 • 1 • 2 • 6
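For a map-typed column, LIKE cannot be applied to the map itself; it has to target either one key's value or the collection of values. A sketch in Presto SQL, assuming a column m of type map(varchar, varchar) in a table t (both names are placeholders):

```sql
-- Match against one known key's value
SELECT * FROM t
WHERE element_at(m, 'country') LIKE '%United States of America%';

-- Match if any value in the map matches
SELECT * FROM t
WHERE cardinality(filter(map_values(m), v -> v LIKE '%United States of America%')) > 0;
```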
0 votes, 1 answer

How to upgrade Python version on Qubole?

The current Python version on Qubole is 3.5.3, and some packages, like PyMC3 and future XGBoost versions, need a newer Python. How do I upgrade? And would that affect other clusters' settings?
HT. • 161 • 1 • 7
0 votes, 1 answer

Unable to write or read from S3 bucket with Default AWS KMS encryption enabled

I am unable to read or write to a Default AWS KMS-encrypted bucket without using the following configuration on my Qubole cluster:
fs.s3a.server-side-encryption-algorithm=SSE-KMS
fs.s3a.server-side-encryption.key=
But if I enable this…
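On the configuration side, Hadoop's S3A connector also supports per-bucket settings, which lets the SSE-KMS options apply only to the encrypted bucket instead of cluster-wide. A sketch, with the bucket name and key ARN as placeholders:

```
fs.s3a.bucket.my-encrypted-bucket.server-side-encryption-algorithm=SSE-KMS
fs.s3a.bucket.my-encrypted-bucket.server-side-encryption.key=arn:aws:kms:...
```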
0 votes, 1 answer

Qubole Kinesis Connector for Spark structured streaming throws an error

We are using the Qubole Kinesis Connector (JAR) for Spark Structured Streaming. This used to work fine, but suddenly it is throwing the error "S3 filesystem not found". We could use the KCL, but we need to test it with foreachBatch. Are there any other…
0 votes, 2 answers

REST API in test drive account?

Hi, I am using the Qubole trial version, which is a test drive account, so I am not getting an API token from the My Accounts tab in the Control Panel in Qubole. Is there a way to access the REST APIs now? Thanks in advance.
sai Kumar • 43 • 4
0 votes, 2 answers

Running Scala jobs in Scheduler

My job runs fine in my notebook, but when I copy and paste the script into the Spark Scala scheduled job, I run into errors like "script.scala:15: error: not found: value sqlContext". What do I need to do to run my Scala code as a scheduled job?
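For the error above: Zeppelin notebooks get spark and sqlContext pre-created for them, but a standalone scheduled Spark Scala script has to build its own session. A minimal sketch; the object and app names are arbitrary:

```scala
import org.apache.spark.sql.SparkSession

object ScheduledJob {
  def main(args: Array[String]): Unit = {
    // Zeppelin injects `spark`/`sqlContext` into notebooks; a scheduled
    // script must create the session itself.
    val spark = SparkSession.builder()
      .appName("scheduled-job")
      .getOrCreate()
    val sqlContext = spark.sqlContext // for code still referencing sqlContext

    sqlContext.sql("SELECT 1").show()
    spark.stop()
  }
}
```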
0 votes, 1 answer

PySpark Machine Learning on Wide Data in Qubole

I have a large dataset, with roughly 250 features, that I would like to use in a gradient-boosted trees classifier. I have millions of observations, but I'm having trouble getting the model to work with even 1% of my data (~300k observations). Below…
ErrorJordan • 611 • 5 • 15
0 votes, 1 answer

Setting up AWS Glue to crawl Qubole

Currently I work with Qubole to access Hive data. I've added metadata from several databases, and want to add all the Hive metadata to AWS Glue. Is this possible? Any help is appreciated.
Ash_s94 • 787 • 6 • 19