Questions tagged [aws-databricks]

For questions about the usage of Databricks Lakehouse Platform on AWS cloud.

Databricks Lakehouse Platform on AWS

The Databricks Lakehouse Platform accelerates innovation across data science, data engineering, business analytics, and data warehousing, integrated with your AWS infrastructure.

Reference: https://databricks.com/aws

190 questions
1
vote
1 answer

How can I get the S3 location of a Databricks DBFS path

I know my DBFS path is backed by S3. Is there any utility/function to get the exact S3 path from a DBFS path? For example, %python required_util('dbfs:/user/hive/warehouse/default.db/students') >> s3://data-lake-bucket-xyz/....... I was going…
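One possible approach, sketched below: when the DBFS path sits under a mount, dbutils.fs.mounts() exposes the mapping back to the source URI. This only resolves mounted paths (the hive warehouse directory usually lives in the workspace root bucket instead), and dbutils exists only inside Databricks notebooks.

```python
def dbfs_to_s3(dbfs_path: str) -> str:
    """Resolve a dbfs:/ path to its backing source URI via the mount table."""
    path = dbfs_path.replace("dbfs:", "", 1)
    # Each MountInfo carries mountPoint (DBFS side) and source (e.g. an s3a:// URI).
    for mount in dbutils.fs.mounts():
        if path.startswith(mount.mountPoint):
            return path.replace(mount.mountPoint, mount.source, 1)
    raise ValueError(f"{dbfs_path} is not under any DBFS mount")

# Hypothetical mount: prints something like s3a://data-lake-bucket-xyz/users
print(dbfs_to_s3("dbfs:/mnt/data-lake/users"))
```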
1
vote
0 answers

How can we use a service principal as the user in Databricks SQL

Is it possible to run Databricks SQL as a service principal instead of my user ID?
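A minimal sketch of one way this is commonly done: call the Databricks SQL Statement Execution API with a token issued for the service principal rather than a personal token. Host, warehouse ID, and token below are hypothetical placeholders.

```python
import requests

HOST = "https://my-workspace.cloud.databricks.com"     # hypothetical workspace URL
SP_TOKEN = "<token issued for the service principal>"  # not a personal user token

resp = requests.post(
    f"{HOST}/api/2.0/sql/statements",
    headers={"Authorization": f"Bearer {SP_TOKEN}"},
    json={"warehouse_id": "<warehouse-id>", "statement": "SELECT current_user()"},
)
resp.raise_for_status()
# current_user() should come back as the service principal's application ID.
print(resp.json())
```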
1
vote
0 answers

DBX Databricks - installing private GitHub repositories on clusters in a workspace

I'm running code on Databricks clusters remotely using DBX - so my current directory is built into a wheel and then installed on the remote Databricks cluster. I'm having an issue where a private GitHub repo that I installed via poetry locally is…
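A sketch of one workaround for the cluster side, assuming a GitHub deploy token stored in a Databricks secret scope (the scope, key, and repo URL below are hypothetical); dbutils is only available on the cluster.

```python
import subprocess
import sys

# Fetch a GitHub token from a (hypothetical) secret scope rather than baking it
# into the wheel's dependency metadata.
token = dbutils.secrets.get(scope="github", key="deploy-token")

# Install the private dependency directly into the cluster's Python environment.
subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    f"git+https://{token}@github.com/my-org/my-private-lib.git",
])
```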
1
vote
1 answer

AWS Glue: Huge Databricks JDBC dataset and PySpark parallelization

I'm using the Databricks JDBC driver to pull data from Databricks using AWS Glue. The query returns 45M rows. I'm using DynamicFrame to read the data and also to write it in parquet as a single file on S3. The problem is that the reading process seems to…
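For the parallelism part, a sketch of Spark's partitioned JDBC read (usable from a Glue job's SparkSession instead of a DynamicFrame); the URL, table, and bounds are hypothetical, and the bounds would normally come from a MIN/MAX query on the partition column.

```python
df = (
    spark.read.format("jdbc")
    # Hypothetical Databricks JDBC URL; exact options depend on the driver version.
    .option("url", "jdbc:databricks://<host>:443;HttpPath=<http-path>")
    .option("dbtable", "my_schema.my_table")
    .option("partitionColumn", "id")  # numeric or date column to split on
    .option("lowerBound", "1")
    .option("upperBound", "45000000")
    .option("numPartitions", "32")    # 32 concurrent JDBC reads
    .load()
)

# Writing without coalesce(1) keeps the write parallel as well.
df.write.mode("overwrite").parquet("s3://my-bucket/output/")
```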
1
vote
1 answer

Set Workflow Job Concurrency Limit in Databricks

I need a job to be triggered every 5 minutes. However, if that job is already running, it must not be triggered again until that run is finished. Hence, I need to set the maximum run concurrency for that job to only one instance at a time. What…
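A sketch of the usual answer, assuming the Jobs 2.1 API: set max_concurrent_runs to 1 in the job settings, which makes the scheduler skip a trigger while a run is still active. Host, token, and job ID are placeholders.

```python
import requests

HOST = "https://my-workspace.cloud.databricks.com"  # hypothetical
TOKEN = "<personal-access-token>"

resp = requests.post(
    f"{HOST}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": 123,  # hypothetical job ID
        "new_settings": {
            "max_concurrent_runs": 1,  # skip triggers while a run is active
            "schedule": {
                "quartz_cron_expression": "0 0/5 * * * ?",  # every 5 minutes
                "timezone_id": "UTC",
            },
        },
    },
)
resp.raise_for_status()
```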
1
vote
1 answer

List all widgets in a Databricks notebook in Python (even those not overridden)

I would like to get the full list of widgets used in a notebook (even those not overridden). This thread's example works fine if you run the notebook directly, but it won't if you run your notebook from a Databricks Job or Azure Data Factory, i.e.: I…
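If enumerating widget values is enough, recent runtimes ship a helper for this; a sketch assuming DBR 13.x or later, where dbutils.widgets.getAll() returns every defined widget as a name-to-value dict:

```python
try:
    # Returns all defined widgets, including ones the job run did not override.
    all_widgets = dbutils.widgets.getAll()
except AttributeError:
    all_widgets = {}  # older runtimes lack getAll(); a different fallback is needed

print(all_widgets)
```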
1
vote
1 answer

Error loading data from S3 bucket to Databricks External Table

Using an example I found online, the code below throws an error because it cannot read from the S3 bucket. The problem is that I have to pass in the AWS credentials, which are found in the variable S3_dir along with the bucket path. I am unable to get this to work. %sql DROP TABLE IF…
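A sketch of one common fix: supply the S3 credentials through the Hadoop configuration rather than embedding them in the LOCATION path. The bucket, keys, and table layout are hypothetical, and instance profiles are generally preferred over raw keys.

```python
# `sc` and `spark` are predefined in Databricks notebooks.
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "<access-key>")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "<secret-key>")

spark.sql("DROP TABLE IF EXISTS students")
spark.sql("""
    CREATE TABLE students (id INT, name STRING)
    USING CSV
    LOCATION 's3a://my-bucket/students/'
""")
```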
1
vote
1 answer

Databricks DLT pipeline Error "AnalysisException: Cannot redefine dataset"

I am getting this error "AnalysisException: Cannot redefine dataset" in my DLT pipeline. I am using a for loop to trigger multiple flows. I am trying to load different sources into the same target using dlt.create_target_table and dlt.apply_changes.…
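The error typically means the loop defines the same dataset name more than once. A sketch of one workaround, using the question's API names with hypothetical sources and key columns: union the sources into a single view so the target and its change flow are each defined exactly once.

```python
import dlt
from functools import reduce

SOURCES = ["source_a", "source_b"]  # hypothetical source tables

@dlt.view(name="combined_updates")
def combined_updates():
    # One view unioning every source, instead of one flow per source.
    dfs = [spark.readStream.table(s) for s in SOURCES]
    return reduce(lambda a, b: a.unionByName(b), dfs)

dlt.create_target_table(name="merged_target")  # defined once, outside any loop

dlt.apply_changes(
    target="merged_target",
    source="combined_updates",
    keys=["id"],               # hypothetical key column
    sequence_by="updated_at",  # hypothetical ordering column
)
```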
1
vote
1 answer

Schema Changes not Allowed on Delta Live Tables Full Refresh

I have a simple Delta Live Tables pipeline that performs a streaming read of multiple csv files from cloudFiles (s3 storage) into a delta table published to the hive metastore. I have two requirements that make my situation more complex/unique: I…
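For the schema side of this, a sketch of an Auto Loader read with explicit schema-evolution settings (paths and options are hypothetical; inside a DLT pipeline the schema location is managed for you):

```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/my_table/")
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")  # evolve, don't fail
    .option("header", "true")
    .load("s3://my-bucket/landing/")
)
```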
1
vote
0 answers

Getting Error: Using PythonUDF in join condition of join type LeftSemi is not supported

I have a pyspark.sql DataFrame which was created using an inner join of two DataFrames. I have also created a column after joining which provides the week_start date based on the…
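The usual workaround is to materialize the UDF output as a regular column before the join, so the leftsemi join condition references only plain columns. A self-contained sketch with a hypothetical week_start UDF:

```python
from datetime import date, timedelta

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import DateType

spark = SparkSession.builder.getOrCreate()

@F.udf(DateType())
def week_start(d):
    # Hypothetical stand-in: Monday of the week containing d.
    return d - timedelta(days=d.weekday())

left_df = spark.createDataFrame([(1, date(2023, 1, 4))], ["id", "event_date"])
right_df = spark.createDataFrame([(date(2023, 1, 2),)], ["week_start"])

# Compute the UDF column first; the join itself then uses no PythonUDF.
left_df = left_df.withColumn("week_start", week_start("event_date"))
result = left_df.join(right_df, on="week_start", how="leftsemi")
result.show()
```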
1
vote
0 answers

Not able to configure cluster instance type using MLflow API 2.0 to enable model serving

I'm able to enable model serving using the MLflow API 2.0 with the following code... instance = f'https://{workspace}.cloud.databricks.com' headers = {'Authorization': f'Bearer {api_workflow_access_token}'} # Enable Model…
1
vote
0 answers

Databricks on AWS is not printing values when run as a job

When I try to run code as a job in Databricks with multiple print commands, the job runs successfully but the output of the print commands never appears, and I get the error below: Failed to fetch the result. Retry
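A workaround sometimes suggested, sketched here as an assumption rather than a confirmed fix: write through the logging module so output lands in the driver logs, which remain readable even when the job UI fails to render stdout.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("my_job")  # hypothetical logger name

# Appears in the cluster's driver logs regardless of the result-rendering error.
log.info("processed %d rows", 42)
```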
1
vote
1 answer

Bring new data from a csv file into a delta table

I have created a new table from a csv file with the following code: %sql SET spark.databricks.delta.schema.autoMerge.enabled = true; create table if not exists catlog.schema.tablename; COPY INTO catlog.schema.tablename FROM (SELECT * FROM…
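A sketch of the intended incremental-load pattern (catalog, schema, and path are hypothetical); COPY INTO only ingests files it has not seen before, and the target can start as an empty table when mergeSchema is enabled:

```python
spark.sql("CREATE TABLE IF NOT EXISTS my_catalog.my_schema.my_table")

spark.sql("""
    COPY INTO my_catalog.my_schema.my_table
    FROM 's3://my-bucket/landing/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
    COPY_OPTIONS ('mergeSchema' = 'true')
""")
```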
1
vote
0 answers

How to know if the cache is loaded on Databricks

I'm using the Databricks cache with ReactJS in order to improve performance when the app requests something. But how do I know when the cache is ready? When I run the SQL statement, e.g. CACHE SELECT * FROM table, it doesn't return anything.…
1
vote
0 answers

How to store a schema in a file, and in which format, for Databricks Autoloader?

I am using Databricks Autoloader. Here, the table schema will be dynamic for the incoming data. I have to store the schema in some file and read it in Autoloader during readStream. How can I store the schema in a file, and in which format? Whether…
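One answer that fits here, as a sketch: a StructType round-trips through JSON, so the schema can live in a plain .json file and be rebuilt for readStream. Paths and the sample DataFrame below are hypothetical.

```python
import json

from pyspark.sql.types import StructType

schema_path = "/dbfs/schemas/my_table.json"  # hypothetical location

# Write side: serialize the schema of any existing DataFrame.
df = spark.range(1).selectExpr("id", "cast(id as string) as name")  # stand-in
with open(schema_path, "w") as f:
    f.write(df.schema.json())

# Read side: rebuild the StructType and hand it to the Auto Loader stream.
with open(schema_path) as f:
    schema = StructType.fromJson(json.load(f))

stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .schema(schema)
    .load("s3://my-bucket/landing/")
)
```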