Questions tagged [aws-databricks]

For questions about the usage of Databricks Lakehouse Platform on AWS cloud.

Databricks Lakehouse Platform on AWS

The Databricks Lakehouse Platform accelerates innovation across data science, data engineering, business analytics, and data warehousing, integrated with your AWS infrastructure.

Reference: https://databricks.com/aws

190 questions
1 vote • 1 answer

How to import a text file in Databricks

I am trying to write a text file with some text and load the same text file in Databricks, but I am getting an error. Code: # write a file to DBFS using Python I/O APIs with open("/dbfs/FileStore/tables/test_dbfs.txt", 'w') as f: f.write("Apache Spark is…
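A minimal sketch for the question above, assuming a classic (non-serverless) Databricks cluster where DBFS is exposed on the driver through the /dbfs local mount; the file path and contents are illustrative:

    # Write a small text file through the /dbfs FUSE mount with plain Python I/O.
    path = "/dbfs/FileStore/tables/test_dbfs.txt"   # local-path view of dbfs:/FileStore/tables/...
    with open(path, "w") as f:
        f.write("Apache Spark is a unified analytics engine.\n")

    # Read it back as a Spark DataFrame using the dbfs:/ form of the same path.
    df = spark.read.text("dbfs:/FileStore/tables/test_dbfs.txt")
    df.show(truncate=False)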
1 vote • 1 answer

poetry publish from codebuild to aws codeartifact fails with UploadError

I have a dataset I need to periodically import into my data lake, replacing the current dataset. After I produce a DataFrame I currently do: df.write.format("delta").save("dbfs:/mnt/defaultDatalake/datasets/datasources") But if I run the job again I get…
alonisser • 11,542 • 21 • 85 • 139
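A minimal sketch for the question above, not the accepted answer: re-running save() against an existing Delta path fails unless an overwrite mode is set, so one common approach is to overwrite the dataset (and, if needed, its schema) in place.

    # Overwrite the existing Delta dataset instead of trying to re-create it.
    (df.write
       .format("delta")
       .mode("overwrite")                    # replace the current contents
       .option("overwriteSchema", "true")    # only needed if the schema has changed
       .save("dbfs:/mnt/defaultDatalake/datasets/datasources"))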
1 vote • 1 answer

How to access AWS public dataset using Databricks?

For one of my classes, I have to analyze a "big data" dataset. I found the following dataset on the AWS Registry of Open Data that seems interesting: https://registry.opendata.aws/openaq/ How exactly can I create a connection and load this dataset…
Aspire • 397 • 1 • 3 • 9
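A minimal sketch for the question above, assuming the dataset lives in a public S3 bucket that allows anonymous reads; the bucket, prefix, and file format below are illustrative, and the real values come from the dataset's page on the AWS Registry of Open Data.

    # Allow anonymous (unsigned) access to the public bucket for this Spark session.
    spark.conf.set(
        "fs.s3a.aws.credentials.provider",
        "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider",
    )

    # Load the files directly into a DataFrame; adjust the reader to the actual file format.
    df = spark.read.json("s3a://example-open-data-bucket/records/*.ndjson")
    df.printSchema()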
1 vote • 1 answer

Cannot import CSV file into h2o from Databricks cluster DBFS

I have successfully installed h2o on my AWS Databricks cluster and then successfully started the h2o server with: h2o.init() When I attempt to import the iris CSV file that is stored in my Databricks DBFS: train, valid =…
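A minimal sketch for the question above, assuming the h2o Python package is installed on the cluster: h2o does not understand the dbfs:/ scheme, so a common workaround is to point it at the /dbfs local mount (the file path is illustrative).

    import h2o

    h2o.init()
    iris = h2o.import_file("/dbfs/FileStore/tables/iris.csv")    # local-path view of the DBFS file
    train, valid = iris.split_frame(ratios=[0.8], seed=42)       # 80/20 split into two frames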
1 vote • 1 answer

How to access an AWS public dataset using Databricks?

I am new to Databricks. I am looking for a public big data dataset for my school project, and I came across an AWS public dataset at this link: https://registry.opendata.aws/target/ I am using Python on Databricks, and I don't know how to establish a…
kimhkh • 27 • 4
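Complementing the Spark sketch above, here is a minimal sketch for browsing what a public dataset's bucket actually contains, using boto3 (preinstalled on Databricks runtimes) with unsigned requests; the bucket name and prefix are placeholders.

    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # Anonymous S3 client: public open-data buckets usually allow unsigned reads.
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

    resp = s3.list_objects_v2(Bucket="example-open-data-bucket", Prefix="data/", MaxKeys=20)
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])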
1 vote • 1 answer

How can I set spark.task.maxFailures on AWS Databricks?

I would like to set spark.task.maxFailures to a value greater than 4. Using the Databricks 6.4 runtime, how can I set this value? When I execute spark.conf.get("spark.task.maxFailures"), I get the error below java.util.NoSuchElementException:…
ravi malhotra • 703 • 5 • 14
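A minimal sketch for the question above, not Databricks documentation: spark.task.maxFailures is a SparkContext-level setting, so it is not visible through spark.conf unless it was set when the cluster started. One approach is to put it in the cluster's Spark config (Cluster > Advanced options) and then read it back from the SparkContext.

    # Cluster "Spark config" entry, applied before the cluster starts (value is illustrative):
    #   spark.task.maxFailures 10

    # Once the cluster is up, the value can be read from the SparkContext configuration:
    print(spark.sparkContext.getConf().get("spark.task.maxFailures", "not set"))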
0 votes • 1 answer

Databricks Job API via Python "Run settings must be specified"

In Databricks I've manually created a DAG job-of-jobs (task type Run job) that executes several sub-jobs. When I manually run it, it works well, and I can see it executing the sub-jobs to completion in the run. The issue is that I want to actually…
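A minimal sketch of triggering such a job programmatically via the Jobs API run-now endpoint with plain requests; the host, token, and job_id are placeholders, and one possible cause of the "Run settings must be specified" message is a request whose JSON body is missing the settings the endpoint expects (such as the job_id).

    import requests

    host = "https://<workspace>.cloud.databricks.com"   # placeholder workspace URL
    token = "<personal-access-token>"                    # placeholder token

    resp = requests.post(
        f"{host}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {token}"},
        json={"job_id": 12345},                          # placeholder job id
    )
    resp.raise_for_status()
    print(resp.json())   # contains the run_id of the triggered run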
0 votes • 1 answer

Error: Spark driver stopped unexpectedly due to memory

I have the code below, where I need to reuse the flag from the previous day, so I am running a loop. I can't use an offset here because only once I know the previous day's flag can I use it for today. So this loop runs 1000 times and after…
ASD • 25 • 6
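A minimal sketch of one common mitigation for the question above, not the asker's actual fix: a long loop of dependent DataFrame transformations grows the query plan until the driver runs out of memory, and periodically truncating the lineage (checkpointing, or a write/read round trip) keeps the plan small. The data and transformation below are stand-ins.

    # Checkpointing needs a checkpoint directory (path is illustrative).
    spark.sparkContext.setCheckpointDir("dbfs:/tmp/checkpoints")

    df = spark.range(0, 1000).withColumnRenamed("id", "day")      # stand-in for the real data
    for i in range(1000):
        df = df.withColumn("flag", (df["day"] % 2) == (i % 2))    # stand-in transformation
        if i % 50 == 0:
            df = df.checkpoint()                                  # cut the lineage every 50 iterations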
0 votes • 1 answer

Read Kafka store file location from S3

We are getting the error below, so we started fetching the Kafka key and certificate from an S3 location (s3://my-bucket/tmp/k2/truststore.jks) in a Databricks notebook. DbxDlTransferError: Terminated with exception: Kafka store file location only supports external…
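A minimal sketch of one possible workaround for the question above, under the assumption that the Kafka SSL options need a path the JVM can open locally rather than an s3:// URI, and that the cluster already has read access to the bucket: copy the truststore to DBFS and reference it through the /dbfs local mount (broker, topic, and paths are placeholders; password options are omitted).

    # Copy the truststore from S3 to DBFS once, then point the Kafka source at the local mount.
    dbutils.fs.cp("s3://my-bucket/tmp/k2/truststore.jks",
                  "dbfs:/FileStore/kafka/truststore.jks")

    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9093")            # placeholder broker
          .option("kafka.security.protocol", "SSL")
          .option("kafka.ssl.truststore.location",
                  "/dbfs/FileStore/kafka/truststore.jks")              # local-path view of the DBFS copy
          .option("subscribe", "my-topic")                             # placeholder topic
          .load())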
0 votes • 0 answers

How to transfer all the ML scripts, models etc. from Databricks to AWS SageMaker

My Databricks workspace is hosted on AWS, and I want to transfer all the notebooks, models, etc. from Databricks to SageMaker. Can anyone tell me the procedure to follow?
Harry1234 • 21 • 1
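A minimal sketch of one piece of such a migration, assuming REST access to the workspace: export a notebook in SOURCE format through the Workspace API so it can be stored elsewhere and re-imported. The host, token, and notebook path are placeholders; models would be handled separately (for example through the MLflow registry).

    import base64
    import requests

    host = "https://<workspace>.cloud.databricks.com"    # placeholder workspace URL
    token = "<personal-access-token>"                     # placeholder token

    resp = requests.get(
        f"{host}/api/2.0/workspace/export",
        headers={"Authorization": f"Bearer {token}"},
        params={"path": "/Users/me@example.com/my_notebook", "format": "SOURCE"},
    )
    resp.raise_for_status()
    source = base64.b64decode(resp.json()["content"]).decode("utf-8")
    print(source[:200])   # first few lines of the exported notebook source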
0 votes • 1 answer

Cannot create a Metastore in Databricks

I followed all the steps from https://www.youtube.com/watch?v=cylJ9hPmt7c but I am still getting an error; here is an example. I can't figure out why. I also tried different regions. I have account admin on the Databricks console and admin in AWS, so it's not the…
0 votes • 2 answers

How to control Databricks autoscaling from the driver node

I am using Databricks for a specific workload. This workload involves approximately 10 to 200 DataFrames that are read from and written to a storage location. This workload can benefit from parallelism. The constraint I have is cost optimization.…
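A minimal sketch of one common pattern for the question above, not a way to drive the autoscaler directly: submit the independent read/write jobs concurrently from the driver with a thread pool, so the cluster has enough pending tasks for autoscaling to react to. The paths and transformation are placeholders.

    from concurrent.futures import ThreadPoolExecutor

    paths = [f"dbfs:/mnt/source/table_{i}" for i in range(10)]    # placeholder inputs

    def copy_table(path: str) -> str:
        df = spark.read.format("delta").load(path)
        df.write.format("delta").mode("overwrite").save(path.replace("source", "target"))
        return path

    # Each thread submits its own Spark jobs; Spark schedules them across the cluster.
    with ThreadPoolExecutor(max_workers=8) as pool:
        for done in pool.map(copy_table, paths):
            print("finished", done)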
0 votes • 1 answer

BigQuery Databricks connectivity

How to read data from BigQuery into a DataFrame in Databricks using credentials stored in secrets. df = spark.read.format("bigquery").option("credentialsFile", credentialfilepath).option("parentProject", projectName)…
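A minimal sketch for the question above, assuming the BigQuery Spark connector available on Databricks and a service-account JSON key stored in a Databricks secret: the connector also accepts the key inline (base64-encoded) via the "credentials" option, which avoids distributing a credentials file. The secret scope/key, project, and table names are placeholders.

    import base64

    key_json = dbutils.secrets.get(scope="gcp", key="bq-service-account")   # placeholder scope/key
    key_b64 = base64.b64encode(key_json.encode("utf-8")).decode("utf-8")

    df = (spark.read
          .format("bigquery")
          .option("credentials", key_b64)
          .option("parentProject", "my-gcp-project")                # placeholder GCP project
          .option("table", "my-gcp-project.my_dataset.my_table")    # placeholder table
          .load())
    df.show(5)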
0 votes • 0 answers

Databricks to ElasticSearch error [scala/Product$class]

When we try the code below to push data from Databricks to Elasticsearch, we get the error below. Elasticsearch version 6.1.3 Databricks runtime version: Added the jars below to the cluster…
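A minimal sketch of a write with the elasticsearch-hadoop connector for context on the question above: the java.lang.NoClassDefFoundError: scala/Product$class error is typically a Scala version mismatch (for example, a _2.11 connector jar on a Scala 2.12 Databricks runtime), so the connector artifact has to match the cluster's Scala version. Host, port, and index are placeholders.

    (df.write
       .format("org.elasticsearch.spark.sql")
       .option("es.nodes", "my-es-host")          # placeholder Elasticsearch host
       .option("es.port", "9200")
       .option("es.resource", "my-index/_doc")    # placeholder index/type for ES 6.x
       .mode("append")
       .save())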
0 votes • 1 answer

Databricks AWS compute cluster location

I have hosted Databricks on top of AWS, but I cannot see any EC2 instances created for Databricks. Can anyone explain: if I create Databricks on AWS in my VPC, will the compute be created outside my AWS VPC? If so, where will the…