Questions tagged [aws-databricks]

For questions about using the Databricks Lakehouse Platform on the AWS cloud.

Databricks Lakehouse Platform on AWS

A lakehouse platform for accelerating innovation across data science, data engineering, business analytics, and data warehousing, integrated with your AWS infrastructure.

Reference: https://databricks.com/aws

190 questions
1
vote
1 answer

DLT: commas treated as part of column name

I am trying to create a STREAMING LIVE TABLE object in my Databricks environment, using an S3 bucket with a bunch of CSV files as a source. The syntax I am using is: CREATE OR REFRESH STREAMING LIVE TABLE t1 COMMENT "test table" TBLPROPERTIES ( …
Piotr L
  • 1,065
  • 1
  • 12
  • 29
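When commas inside quoted CSV fields end up in column names, the usual cause is that the CSV reader options (header, delimiter, quote) were never passed to `cloud_files`. A minimal sketch of building such a DDL statement, assuming the table name `t1` and a hypothetical S3 path:

```python
# Sketch: pass explicit CSV reader options to cloud_files so commas inside
# quoted fields are not split into extra columns. The path is a placeholder.
def dlt_csv_ddl(table: str, path: str, comment: str) -> str:
    """Build a CREATE OR REFRESH STREAMING LIVE TABLE statement as a string."""
    return (
        f'CREATE OR REFRESH STREAMING LIVE TABLE {table}\n'
        f'COMMENT "{comment}"\n'
        f"AS SELECT * FROM cloud_files(\n"
        f'  "{path}", "csv",\n'
        f'  map("header", "true", "delimiter", ",", "quote", "\\"")\n'
        f")"
    )

ddl = dlt_csv_ddl("t1", "s3://my-bucket/csv-input/", "test table")
print(ddl)
```

The generated statement would run inside a DLT pipeline; the `map(...)` options are forwarded to the underlying Auto Loader CSV reader.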
1
vote
1 answer

Databricks - Reduce delta version compute time

I've got a process which is really bogged down by the version computing for the target Delta table. A little bit of context: there are other things that run, all contributing uniform structured dataframes that I want to persist in a Delta table.…
Error_2646
  • 2,555
  • 1
  • 10
  • 22
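One common mitigation when snapshot/version computation is slow is to checkpoint the Delta log more often and keep less history, so fewer commit files must be replayed. A sketch that builds the ALTER TABLE statement; the property names come from the Delta Lake docs, but the table name and values are assumptions to tune:

```python
# Sketch: tighten Delta log checkpointing so reconstructing the latest table
# version reads fewer JSON commit files. The interval values are assumptions.
properties = {
    "delta.checkpointInterval": "10",                    # checkpoint every N commits
    "delta.logRetentionDuration": "interval 30 days",    # how long to keep old commits
}

def tblproperties_sql(table: str, props: dict) -> str:
    """Render an ALTER TABLE ... SET TBLPROPERTIES statement."""
    kv = ", ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({kv})"

stmt = tblproperties_sql("my_delta_table", properties)
print(stmt)
```

The resulting statement would be executed once against the target table (e.g. via `spark.sql(stmt)` on the cluster).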
1
vote
0 answers

Dataframe loses its contents after writing to database

We had working code as below. print(f"{file_name} Before insert count", datetime.datetime.now(), scan_df_new.count()) scan_df_new.show() scan_20220908120005_10 Before insert count 2022-09-14 11:37:15.853588…
1
vote
0 answers

Databricks Job fails with exception: UnsupportedOperationException: Not implemented by the CredentialScopeFileSystem FileSystem implementation

Some of my jobs are failing since I enabled Unity Catalog on a workspace with the following error: UnsupportedOperationException: Not implemented by the CredentialScopeFileSystem FileSystem implementation This is happening when I am trying to call…
1
vote
1 answer

Get cluster metric (Ganglia charts) of all clusters via REST API in Databricks

The question is specific to Databricks. Is there any API to get the Ganglia chart showing cluster usage? I need to get all the Ganglia charts that are available in the Databricks cluster metrics section, for all clusters, via REST API calls. We are…
1
vote
1 answer

How do you use either Databricks Job Task parameters or Notebook variables to set the value of each other?

The goal is to be able to use 1 script to create different reports based on a filter. I want my Databricks Job Task parameters and Notebook variables to share the same value for filtering purposes. This is how I declared these widgets and stored in…
bnp21
  • 94
  • 1
  • 8
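In a notebook task, job task parameters arrive as widget values: `dbutils.widgets.text(name, default)` declares the widget and its default, and `dbutils.widgets.get(name)` returns whatever value the job task passed (or the default if none was passed). Since `dbutils` only exists on a Databricks cluster, here is the same resolution logic modeled in plain Python:

```python
# Sketch of how job task parameters and notebook widgets line up: the job's
# base_parameters override the widget's declared default. On Databricks this is
# dbutils.widgets.text("report_filter", "ALL") then dbutils.widgets.get("report_filter");
# the widget/parameter name "report_filter" is an assumption for illustration.
def resolve_widget(name: str, default: str, task_parameters: dict) -> str:
    """Return the task-supplied value if present, else the widget default."""
    return task_parameters.get(name, default)

# Job task passes {"report_filter": "EU"}; the notebook declared default "ALL".
value = resolve_widget("report_filter", "ALL", {"report_filter": "EU"})
print(value)  # EU
```

The practical upshot: one notebook script can produce different reports simply by running it as separate job tasks with different `base_parameters`.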
1
vote
1 answer

Unable to create a cluster in Databricks on a customer-managed VPC (PrivateLink enabled) - AWS

I'm trying to create a cluster in Databricks in a customer-managed VPC (AWS) environment. I created both front-end and back-end endpoints. The cluster got terminated with the message 'NPIP tunnel setup failure.' Looking at the logs, it throws wait for…
dev_lite_s
  • 11
  • 1
1
vote
2 answers

Running local Python code with arguments in Databricks via the dbx utility

I am trying to execute a local PySpark script on a Databricks cluster via the dbx utility, to test how passing arguments to Python works in Databricks when developing locally. However, the test arguments I am passing are not being read for some reason.…
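When a script is launched through a wrapper like dbx, the launcher may inject its own arguments alongside yours; `argparse.parse_known_args()` tolerates those extras instead of erroring out or swallowing your flags. A minimal sketch (the argument names are assumptions for illustration):

```python
# Sketch: read custom arguments robustly in a script launched via dbx.
# parse_known_args() separates recognized flags from any launcher-injected ones.
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="dbx argument-passing test")
    parser.add_argument("--env", default="dev")
    parser.add_argument("--date", default=None)
    # Unknown arguments are returned instead of raising an error.
    args, unknown = parser.parse_known_args(argv)
    return args, unknown

# Simulate a launch where an extra, unrecognized flag was appended.
args, unknown = parse_args(["--env", "prod", "--date", "2022-09-14", "--dbx-extra", "x"])
print(args.env, args.date, unknown)
```

Printing `unknown` during a test run is a quick way to see whether the arguments are reaching the script at all, or being consumed upstream by the launcher.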
1
vote
1 answer

Is there a way to find the Databricks notebook which contains the CREATE TABLE logic of hive_metastore tables

Is there a way to find the Databricks notebook that contains the CREATE TABLE logic if I know the hive_metastore DB/table name? I want to understand the logic, and the reason I am trying to backtrack from the table is that the person who created the table is…
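There is no built-in index from a hive_metastore table back to the notebook that created it; one workable approach is to export notebook sources (e.g. via the Workspace API `/api/2.0/workspace/export`) and search them for the DDL. The matching step sketched in pure Python, with the `sources` dict standing in for real exported notebook text:

```python
# Sketch: given notebook sources already exported from the workspace, find
# which ones contain a CREATE TABLE statement for a given table. The paths
# and source text below are made-up stand-ins for illustration.
import re

def notebooks_creating_table(sources: dict, table: str) -> list:
    """Return notebook paths whose source contains CREATE [OR REPLACE] TABLE <table>."""
    pattern = re.compile(
        r"CREATE\s+(?:OR\s+REPLACE\s+)?TABLE\s+" + re.escape(table),
        re.IGNORECASE,
    )
    return [path for path, src in sources.items() if pattern.search(src)]

sources = {
    "/Repos/etl/build_sales": "CREATE TABLE sales_db.orders (id INT)",
    "/Repos/etl/report": "SELECT * FROM sales_db.orders",
}
print(notebooks_creating_table(sources, "sales_db.orders"))  # ['/Repos/etl/build_sales']
```

The regex would need widening for other DDL variants (e.g. `CREATE TABLE IF NOT EXISTS`, or `saveAsTable` calls in Python notebooks).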
1
vote
0 answers

Databricks SQL need to set weekofyear to use Sunday as day 1

In SQL Server, I can simply use SET DATEFIRST 7, but I don't know how to do this in Databricks SQL. I am trying to calculate 'Relative Week' to the current date and need to use Sunday as the week start day. My code is: %sql SELECT year(calendarDate) * 10000…
Michael W
  • 11
  • 2
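Spark SQL's `dayofweek()` returns 1 for Sunday, so each date can be anchored to its preceding Sunday with `date_sub(d, dayofweek(d) - 1)` and relative weeks become a difference of those anchors. The same arithmetic, checked in plain Python as a sketch:

```python
# Sketch: "relative week" with Sunday as day 1. In Databricks SQL the same idea
# is date_sub(calendarDate, dayofweek(calendarDate) - 1), since dayofweek()
# returns 1 for Sunday. Here the arithmetic is verified in plain Python.
from datetime import date, timedelta

def sunday_week_start(d: date) -> date:
    # Python's weekday(): Monday=0 .. Sunday=6, so days since Sunday = (weekday+1) % 7
    return d - timedelta(days=(d.weekday() + 1) % 7)

def relative_week(d: date, today: date) -> int:
    """Whole weeks between the Sunday-aligned weeks of d and today (0 = this week)."""
    return (sunday_week_start(today) - sunday_week_start(d)).days // 7

print(sunday_week_start(date(2022, 9, 14)))                 # 2022-09-11 (a Sunday)
print(relative_week(date(2022, 9, 4), date(2022, 9, 14)))   # 1
```

Because the Sunday anchor is computed per row, no session-level setting like DATEFIRST is needed on the Databricks side.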
1
vote
1 answer

Use remote driver with Databricks Connect

When connecting to a Databricks cluster from a local IDE, I understand that only Spark-related commands are executed in remote mode (on the cluster). What about single-node operations such as scikit-learn or to_pandas? If these functions only use the local machine, the…
1
vote
0 answers

How to set Spark configuration for Databricks SQL Endpoint

I know how to set Spark configuration on a regular Databricks compute cluster, but I didn't see any place to set it on a Databricks SQL endpoint.
Haojin
  • 304
  • 3
  • 11
1
vote
1 answer

Databricks - Tag in job Cluster

I have a question about how to tag the cluster of a job cluster in Databricks via the API. I know that I can already tag a cluster and the job, but I wanted to tag the cluster created by the job; is this possible? I tried to use the "jobs/update"…
arodrber
  • 39
  • 6
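Tags for a job's ephemeral cluster live under `new_cluster.custom_tags` in the job settings, so updating them goes through the Jobs API's `jobs/update` (or `jobs/reset`) with that nested field. A sketch of building the request payload; the job_id and tag keys/values are assumptions:

```python
# Sketch: build a Jobs API update payload that sets custom_tags on the job's
# new_cluster definition. The tags propagate to the cluster the job spins up.
import json

def job_cluster_tags_payload(job_id: int, tags: dict) -> dict:
    return {
        "job_id": job_id,
        "new_settings": {
            "new_cluster": {
                "custom_tags": tags,  # key/value tags applied to the job cluster
            }
        },
    }

payload = job_cluster_tags_payload(123, {"team": "analytics", "env": "prod"})
print(json.dumps(payload, indent=2))
# The payload would then be POSTed to the workspace's jobs/update endpoint
# with a bearer token; only the fields present in new_settings are changed.
```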
1
vote
4 answers

IDE for Azure Databricks

I am exploring Databricks and writing all my code in Azure Databricks notebooks. I have read about IDEs such as Databricks Connect, VS Code, PyCharm, and IntelliJ. In practice, do people use IDEs, or do they mostly use Databricks notebooks? Please advise. Regards,
1
vote
0 answers

Databricks: AWS Terraform resource to enable container services in a Databricks workspace

I am working on a POC in Terraform to bring up a Databricks workspace and cluster. Now I am stuck at a point where I need to create a container-based cluster from the workspace, but I am not finding the right documentation to enable the same from Terraform…
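Databricks Container Services generally has to be enabled at the workspace level before a cluster can reference a Docker image. A sketch using the Databricks Terraform provider, assuming the resource and attribute names from its documentation; the Spark version, node type, and image URL are placeholders:

```hcl
# Sketch (assumptions from the Databricks Terraform provider docs):
# 1) enable container services workspace-wide, 2) point a cluster at an image.
resource "databricks_workspace_conf" "this" {
  custom_config = {
    "enableDcs" = "true" # Databricks Container Services
  }
}

resource "databricks_cluster" "container_cluster" {
  cluster_name  = "docker-poc"
  spark_version = "11.3.x-scala2.12" # placeholder runtime version
  node_type_id  = "i3.xlarge"        # placeholder AWS node type
  num_workers   = 1

  docker_image {
    url = "123456789.dkr.ecr.us-east-1.amazonaws.com/my-image:latest"
  }
}
```

If the image registry is private, the `docker_image` block also accepts credentials; check the provider documentation for the exact nested fields before relying on this sketch.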