Questions tagged [aws-databricks]

For questions about the usage of Databricks Lakehouse Platform on AWS cloud.

Databricks Lakehouse Platform on AWS

The Databricks Lakehouse Platform accelerates innovation across data science, data engineering, business analytics, and data warehousing, integrated with your AWS infrastructure.

Reference: https://databricks.com/aws

190 questions
1
vote
1 answer

How can I get the S3 location of a Databricks DBFS path

I know my DBFS path is backed by S3. Is there any utility/function to get the exact S3 path from a DBFS path? For example, %python required_util('dbfs:/user/hive/warehouse/default.db/students') >> s3://data-lake-bucket-xyz/....... I was going…
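One possible approach, sketched below: when the DBFS path sits under a mount, dbutils.fs.mounts() exposes the mapping back to the source URI. This only resolves mounted paths (the hive warehouse directory usually lives in the workspace root bucket instead), and dbutils exists only inside Databricks notebooks.

```python
def dbfs_to_s3(dbfs_path: str) -> str:
    """Resolve a dbfs:/ path to its backing source URI via the mount table."""
    path = dbfs_path.replace("dbfs:", "", 1)
    # Each MountInfo carries mountPoint (DBFS side) and source (e.g. an s3a:// URI).
    for mount in dbutils.fs.mounts():
        if path.startswith(mount.mountPoint):
            return path.replace(mount.mountPoint, mount.source, 1)
    raise ValueError(f"{dbfs_path} is not under any DBFS mount")

# Hypothetical mount: prints something like s3a://data-lake-bucket-xyz/users
print(dbfs_to_s3("dbfs:/mnt/data-lake/users"))
```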
1
vote
0 answers

How can we use a service principal as the user in Databricks SQL

Is it possible to run Databricks SQL as a service principal instead of my user ID?
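A minimal sketch of one way this is commonly done: call the Databricks SQL Statement Execution API with a token issued for the service principal rather than a personal token. Host, warehouse ID, and token below are hypothetical placeholders.

```python
import requests

HOST = "https://my-workspace.cloud.databricks.com"     # hypothetical workspace URL
SP_TOKEN = "<token issued for the service principal>"  # not a personal user token

resp = requests.post(
    f"{HOST}/api/2.0/sql/statements",
    headers={"Authorization": f"Bearer {SP_TOKEN}"},
    json={"warehouse_id": "<warehouse-id>", "statement": "SELECT current_user()"},
)
resp.raise_for_status()
# current_user() should come back as the service principal's application ID.
print(resp.json())
```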
1
vote
0 answers

DBX Databricks - installing private GitHub repositories on clusters in a workspace

I'm running code on Databricks clusters remotely using DBX - so my current directory is built into a wheel and then installed on the remote Databricks cluster. I'm having an issue where a private GitHub repo that I installed via poetry locally is…
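A sketch of one workaround for the cluster side, assuming a GitHub deploy token stored in a Databricks secret scope (the scope, key, and repo URL below are hypothetical); dbutils is only available on the cluster.

```python
import subprocess
import sys

# Fetch a GitHub token from a (hypothetical) secret scope rather than baking it
# into the wheel's dependency metadata.
token = dbutils.secrets.get(scope="github", key="deploy-token")

# Install the private dependency directly into the cluster's Python environment.
subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    f"git+https://{token}@github.com/my-org/my-private-lib.git",
])
```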
1
vote
1 answer

AWS Glue: Huge Databricks JDBC dataset and PySpark parallelization

I'm using the Databricks JDBC driver to pull data from Databricks using AWS Glue. The query returns 45M rows. I'm using DynamicFrame to read the data and also to write it in parquet as a single file on S3. The problem is that the reading process seems to…
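For the parallelism part, a sketch of Spark's partitioned JDBC read (usable from a Glue job's SparkSession instead of a DynamicFrame); the URL, table, and bounds are hypothetical, and the bounds would normally come from a MIN/MAX query on the partition column.

```python
df = (
    spark.read.format("jdbc")
    # Hypothetical Databricks JDBC URL; exact options depend on the driver version.
    .option("url", "jdbc:databricks://<host>:443;HttpPath=<http-path>")
    .option("dbtable", "my_schema.my_table")
    .option("partitionColumn", "id")  # numeric or date column to split on
    .option("lowerBound", "1")
    .option("upperBound", "45000000")
    .option("numPartitions", "32")    # 32 concurrent JDBC reads
    .load()
)

# Writing without coalesce(1) keeps the write parallel as well.
df.write.mode("overwrite").parquet("s3://my-bucket/output/")
```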
1
vote
1 answer

Set Workflow Job Concurrency Limit in Databricks

I need a job to be triggered every 5 minutes. However, if that job is already running, it must not be triggered again until that run is finished. Hence, I need to set the maximum run concurrency for that job to only one instance at a time. What…
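A sketch of the usual answer, assuming the Jobs 2.1 API: set max_concurrent_runs to 1 in the job settings, which makes the scheduler skip a trigger while a run is still active. Host, token, and job ID are placeholders.

```python
import requests

HOST = "https://my-workspace.cloud.databricks.com"  # hypothetical
TOKEN = "<personal-access-token>"

resp = requests.post(
    f"{HOST}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": 123,  # hypothetical job ID
        "new_settings": {
            "max_concurrent_runs": 1,  # skip triggers while a run is active
            "schedule": {
                "quartz_cron_expression": "0 0/5 * * * ?",  # every 5 minutes
                "timezone_id": "UTC",
            },
        },
    },
)
resp.raise_for_status()
```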
1
vote
1 answer

List all widgets in a Databricks notebook in Python (even those not overridden)

I would like to get the full list of widgets used in a notebook (even those not overridden). This thread's example works fine if you run the notebook directly, but it won't if you run your notebook from a Databricks Job or Azure Data Factory, i.e.: I…
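If enumerating widget values is enough, recent runtimes ship a helper for this; a sketch assuming DBR 13.x or later, where dbutils.widgets.getAll() returns every defined widget as a name-to-value dict:

```python
try:
    # Returns all defined widgets, including ones the job run did not override.
    all_widgets = dbutils.widgets.getAll()
except AttributeError:
    all_widgets = {}  # older runtimes lack getAll(); a different fallback is needed

print(all_widgets)
```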
1
vote
1 answer

Error loading data from S3 bucket to Databricks External Table

Using an example I found online, the code below throws an error because it cannot read from the S3 bucket. The problem is that I have to pass in the AWS credentials, which are found in the variable S3_dir along with the bucket path. I am unable to get this to work. %sql DROP TABLE IF…
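A sketch of one common fix: supply the S3 credentials through the Hadoop configuration rather than embedding them in the LOCATION path. The bucket, keys, and table layout are hypothetical, and instance profiles are generally preferred over raw keys.

```python
# `sc` and `spark` are predefined in Databricks notebooks.
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "<access-key>")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "<secret-key>")

spark.sql("DROP TABLE IF EXISTS students")
spark.sql("""
    CREATE TABLE students (id INT, name STRING)
    USING CSV
    LOCATION 's3a://my-bucket/students/'
""")
```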
1
vote
1 answer

Databricks DLT pipeline Error "AnalysisException: Cannot redefine dataset"

I am getting this error "AnalysisException: Cannot redefine dataset" in my DLT pipeline. I am using a for loop to trigger multiple flows. I am trying to load different sources into the same target using dlt.create_target_table and dlt.apply_changes.…
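The error typically means the loop defines the same dataset name more than once. A sketch of one workaround, using the question's API names with hypothetical sources and key columns: union the sources into a single view so the target and its change flow are each defined exactly once.

```python
import dlt
from functools import reduce

SOURCES = ["source_a", "source_b"]  # hypothetical source tables

@dlt.view(name="combined_updates")
def combined_updates():
    # One view unioning every source, instead of one flow per source.
    dfs = [spark.readStream.table(s) for s in SOURCES]
    return reduce(lambda a, b: a.unionByName(b), dfs)

dlt.create_target_table(name="merged_target")  # defined once, outside any loop

dlt.apply_changes(
    target="merged_target",
    source="combined_updates",
    keys=["id"],               # hypothetical key column
    sequence_by="updated_at",  # hypothetical ordering column
)
```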
1
vote
1 answer

Schema Changes not Allowed on Delta Live Tables Full Refresh

I have a simple Delta Live Tables pipeline that performs a streaming read of multiple csv files from cloudFiles (s3 storage) into a delta table published to the hive metastore. I have two requirements that make my situation more complex/unique: I…
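For the schema side of this, a sketch of an Auto Loader read with explicit schema-evolution settings (paths and options are hypothetical; inside a DLT pipeline the schema location is managed for you):

```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/my_table/")
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")  # evolve, don't fail
    .option("header", "true")
    .load("s3://my-bucket/landing/")
)
```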
1
vote
0 answers

Getting Error: Using PythonUDF in join condition of join type LeftSemi is not supported

I have a pyspark.sql DataFrame which was created using an inner join of two DataFrames. I have also created a column after joining which provides the week_start date based on the…
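The usual workaround is to materialize the UDF output as a regular column before the join, so the leftsemi join condition references only plain columns. A self-contained sketch with a hypothetical week_start UDF:

```python
from datetime import date, timedelta

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import DateType

spark = SparkSession.builder.getOrCreate()

@F.udf(DateType())
def week_start(d):
    # Hypothetical stand-in: Monday of the week containing d.
    return d - timedelta(days=d.weekday())

left_df = spark.createDataFrame([(1, date(2023, 1, 4))], ["id", "event_date"])
right_df = spark.createDataFrame([(date(2023, 1, 2),)], ["week_start"])

# Compute the UDF column first; the join itself then uses no PythonUDF.
left_df = left_df.withColumn("week_start", week_start("event_date"))
result = left_df.join(right_df, on="week_start", how="leftsemi")
result.show()
```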
1
vote
0 answers

Not able to configure cluster instance type using MLflow API 2.0 to enable model serving

I'm able to enable model serving using the MLflow API 2.0 with the following code... instance = f'https://{workspace}.cloud.databricks.com' headers = {'Authorization': f'Bearer {api_workflow_access_token}'} # Enable Model…
1
vote
0 answers

Databricks on AWS is not printing values when run as a job

When I try to run code as a job in Databricks with multiple print commands, the job runs successfully but the output of the print commands never appears, and I get the error below: Failed to fetch the result. Retry
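A workaround sometimes suggested, sketched here as an assumption rather than a confirmed fix: write through the logging module so output lands in the driver logs, which remain readable even when the job UI fails to render stdout.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("my_job")  # hypothetical logger name

# Appears in the cluster's driver logs regardless of the result-rendering error.
log.info("processed %d rows", 42)
```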
1
vote
1 answer

Bring new data from a csv file into a delta table

I have created a new table from a csv file with the following code: %sql SET spark.databricks.delta.schema.autoMerge.enabled = true; create table if not exists catlog.schema.tablename; COPY INTO catlog.schema.tablename FROM (SELECT * FROM…
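A sketch of the intended incremental-load pattern (catalog, schema, and path are hypothetical); COPY INTO only ingests files it has not seen before, and the target can start as an empty table when mergeSchema is enabled:

```python
spark.sql("CREATE TABLE IF NOT EXISTS my_catalog.my_schema.my_table")

spark.sql("""
    COPY INTO my_catalog.my_schema.my_table
    FROM 's3://my-bucket/landing/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
    COPY_OPTIONS ('mergeSchema' = 'true')
""")
```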
1
vote
0 answers

How to know if the cache is loaded on Databricks

I'm using the Databricks cache with ReactJS in order to improve performance when the app requests something. But how do I know when the cache is ready? When I run the SQL statement, e.g. CACHE SELECT * FROM table, it doesn't return anything.…
1
vote
0 answers

How to store a schema in a file, and in which format, for Databricks Autoloader?

I am using Databricks Autoloader. Here, the table schema will be dynamic for the incoming data. I have to store the schema in some file and read it in Autoloader during readStream. How can I store the schema in a file, and in which format? Whether…
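One answer that fits here, as a sketch: a StructType round-trips through JSON, so the schema can live in a plain .json file and be rebuilt for readStream. Paths and the sample DataFrame below are hypothetical.

```python
import json

from pyspark.sql.types import StructType

schema_path = "/dbfs/schemas/my_table.json"  # hypothetical location

# Write side: serialize the schema of any existing DataFrame.
df = spark.range(1).selectExpr("id", "cast(id as string) as name")  # stand-in
with open(schema_path, "w") as f:
    f.write(df.schema.json())

# Read side: rebuild the StructType and hand it to the Auto Loader stream.
with open(schema_path) as f:
    schema = StructType.fromJson(json.load(f))

stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .schema(schema)
    .load("s3://my-bucket/landing/")
)
```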