Questions tagged [aws-databricks]

For questions about the usage of Databricks Lakehouse Platform on AWS cloud.

Databricks Lakehouse Platform on AWS

The Databricks Lakehouse Platform accelerates innovation across data science, data engineering, business analytics, and data warehousing, integrated with your AWS infrastructure.

Reference: https://databricks.com/aws

190 questions
0
votes
1 answer

Databricks - Cannot create table: the associated location is not empty and also not a Delta table

I am getting the error: Cannot create table ('hive_metastore.MY_SCHEMA.MY_TABLE'). The associated location ('dbfs:/user/hive/warehouse/my_schema.db/my_table') is not empty and also not a Delta table. I tried to overcome this by running drop table…
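This error usually means the metastore entry is gone (or stale) but leftover non-Delta files remain at the table's location. A hedged sketch of the usual cleanup, with the table and path taken from the error message above (adjust to your own):

```sql
-- Drop any stale metastore entry first
DROP TABLE IF EXISTS hive_metastore.my_schema.my_table;

-- From a Python notebook cell, clear the leftover files at the location:
-- dbutils.fs.rm("dbfs:/user/hive/warehouse/my_schema.db/my_table", recurse=True)

-- The CREATE TABLE should then succeed as a fresh Delta table
CREATE TABLE hive_metastore.my_schema.my_table (id BIGINT) USING DELTA;
```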
0
votes
0 answers

Spark Ganglia report not matching Databricks cluster specifications

I have a Databricks cluster on AWS with a minimum of two nodes and a maximum of eight. Here's a picture of my cluster. I have cached a DataFrame, and under the Spark UI Storage tab I see it is 6.7 GB. So I would expect that if I go to Ganglia's UI, I would see that…
0
votes
0 answers

Unable to convert data to microseconds in Databricks SQL

I have a requirement to convert a string to a timestamp with microsecond precision. However, I can currently only convert the data up to millisecond precision. %sql select…
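Spark SQL timestamps carry microsecond precision; what often limits the result is the fraction pattern in the format string. A minimal sketch, assuming an input shaped like the hypothetical literal below (six `S`s keep microseconds; three would stop at milliseconds):

```sql
SELECT to_timestamp('2023-06-01 12:34:56.123456',
                    'yyyy-MM-dd HH:mm:ss.SSSSSS')   AS ts_micro,
       date_format(to_timestamp('2023-06-01 12:34:56.123456',
                                'yyyy-MM-dd HH:mm:ss.SSSSSS'),
                   'yyyy-MM-dd HH:mm:ss.SSSSSS')    AS ts_string;
```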
0
votes
1 answer

Databricks change default catalog

It seems that when I am connecting to the Databricks warehouse, it uses the default catalog, which is hive_metastore. Is there a way to make Unity Catalog the default? I know I can run the query USE CATALOG main, and then the current session…
Gilo
  • 640
  • 3
  • 23
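Besides `USE CATALOG` per session, the default can reportedly be set at the cluster (or SQL warehouse) level through Spark configuration; a sketch, assuming the `spark.databricks.sql.initial.catalog.name` setting available in recent runtimes:

```
spark.databricks.sql.initial.catalog.name main
```

Set in the cluster's Spark config, every new session then starts in the `main` catalog instead of `hive_metastore`.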
0
votes
1 answer

Convert Databricks notebook to .py file in workspace

The actual problem I'm trying to solve is that I'm using mkdocs/mkdocs-material for my documentation, but that tool can't work with notebook-type files. So the clumsy workaround I'm considering is an intermediate step that creates a copy of…
Error_2646
  • 2,555
  • 1
  • 10
  • 22
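One route is the workspace export REST endpoint, which returns a notebook as base64-encoded source when asked for `SOURCE` format. A hedged sketch that only builds and decodes the call; host, token, and notebook path are placeholders:

```python
import base64
import json
import urllib.parse
import urllib.request


def build_export_request(host: str, token: str, notebook_path: str):
    """Build the REST call that exports a notebook as plain source code
    (the workspace/export endpoint returns the file base64-encoded)."""
    query = urllib.parse.urlencode({"path": notebook_path, "format": "SOURCE"})
    return urllib.request.Request(
        f"{host}/api/2.0/workspace/export?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


def decode_export(response_body: bytes) -> str:
    """Decode the base64 'content' field of the export response to .py text."""
    return base64.b64decode(json.loads(response_body)["content"]).decode()


# Usage (placeholders; the urlopen line would do the actual export):
req = build_export_request("https://example.cloud.databricks.com",
                           "MY_TOKEN", "/Users/me/my_notebook")
# source = decode_export(urllib.request.urlopen(req).read())
```

The decoded text can then be written to a `.py` file that mkdocs can pick up.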
0
votes
2 answers

Keep partition count reasonable, but partition dataframe such that values of a high-cardinality column are in the same partition

Tagging "sql" too because an answer that derives a column to partition on with Spark SQL would be fine. Summary: say I have 3B distinct values of AlmostUID. I don't want 3B partitions; say I want 1000 partitions. But I want all like values of…
Error_2646
  • 2,555
  • 1
  • 10
  • 22
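In Spark, `df.repartition(1000, "AlmostUID")` hash-partitions on the column, so equal values always land in the same one of the 1000 partitions. A pure-Python sketch of that property (the column name is from the question; Python's `hash` stands in for Spark's internal hash):

```python
NUM_PARTITIONS = 1000


def partition_of(almost_uid) -> int:
    """Hash-based assignment: equal values always map to the same
    partition, and at most NUM_PARTITIONS partitions exist."""
    return hash(almost_uid) % NUM_PARTITIONS


# All occurrences of the same value land in the same bucket:
values = ["uid-1", "uid-2", "uid-1", "uid-3", "uid-2", "uid-1"]
buckets = {}
for v in values:
    buckets.setdefault(partition_of(v), []).append(v)
```

A Spark SQL variant of the same idea would derive `pmod(hash(AlmostUID), 1000)` as a column and `DISTRIBUTE BY` it.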
0
votes
1 answer

Unable to insert data into Postgres using JDBC

I am attempting to insert data into a PostgreSQL database using PySpark with JDBC. However, during the insert, Spark unexpectedly attempts to recreate the table and produces the following output. org.postgresql.util.PSQLException:…
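The recreate attempt is usually the save mode: the default (`errorifexists`) makes Spark try to CREATE the target, and `overwrite` drops and recreates it, while `append` inserts into the existing table. A hedged Python sketch (host, table, and credentials are placeholders):

```python
def jdbc_append_args(host, port, db, table, user, password):
    """Arguments for DataFrame.write.jdbc so rows are appended to an
    existing Postgres table instead of Spark trying to (re)create it."""
    url = f"jdbc:postgresql://{host}:{port}/{db}"
    props = {"user": user, "password": password,
             "driver": "org.postgresql.Driver"}
    # mode="append" inserts into the existing table; the default mode
    # fails or issues CREATE TABLE when the table does not match.
    return dict(url=url, table=table, mode="append", properties=props)


# Usage on Databricks (df is a hypothetical DataFrame):
# df.write.jdbc(**jdbc_append_args("myhost", 5432, "mydb",
#                                  "public.events", "user", "secret"))
```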
0
votes
1 answer

Databricks model deployment to AWS SageMaker -- No module named docker error

I am trying to deploy a dummy model to AWS SageMaker using Databricks and MLflow. According to this documentation, it builds a new MLflow SageMaker image, assigns it a name, and pushes it to ECR. However, when I run the following lines of code in a…
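Assuming the missing module is the PyPI `docker` package (which MLflow's image-build path imports to drive the Docker daemon), a notebook-cell sketch:

```
%pip install docker
```

Note that building the image also needs a reachable Docker daemon, which a Databricks driver typically does not have; running the deployment step from a local machine is a common workaround.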
0
votes
0 answers

Unable to read Hudi file in Spark Databricks Environment

I am facing this error while running Spark in Databricks, trying to read the Hudi file format. I'm using Hudi 0.13.0 with Databricks 12.2 LTS (Apache Spark 3.3.2, Scala 2.12). Trying to load a Hudi dataset from S3, but it failed with this…
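Hudi generally needs Kryo serialization and its Spark session extension registered before the session starts; without them, reads can fail in odd ways. A cluster Spark-config sketch, assuming the Hudi Spark 3.3 bundle matching DBR 12.2's Spark 3.3.2 is attached as a cluster library:

```
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.sql.extensions org.apache.spark.sql.hudi.HoodieSparkSessionExtension
```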
0
votes
0 answers

RuntimeError: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation - Databricks error

I created the following model: class EquipmentEmbeddingEndpoint(mlflow.pyfunc.PythonModel): def load_context(self, context): self.identifiers_df = get_identifier_information() def predict(self, context, model_input): …
nikhil
  • 1
  • 1
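That runtime error typically means `get_identifier_information()` touches Spark inside the model, and the model object is later deserialized on executors where no SparkContext exists. The usual fix is to materialize the lookup data into a plain, picklable structure before the model is logged. A hedged sketch (class name mirrors the question; the lookup logic is illustrative, and the mlflow.pyfunc base class is omitted so this runs standalone):

```python
# Illustrative stand-in for an mlflow.pyfunc.PythonModel: the point is
# that the model holds plain Python data, not anything tied to Spark.
class EquipmentEmbeddingEndpoint:
    def __init__(self, identifiers: dict):
        # Materialized driver-side BEFORE logging the model, e.g. via
        #   get_identifier_information().toPandas()
        # so only this plain dict is pickled with the model.
        self.identifiers = identifiers

    def predict(self, model_input):
        # Pure-Python lookup; safe to run on executors / serving nodes.
        return [self.identifiers.get(x) for x in model_input]


model = EquipmentEmbeddingEndpoint({"eq-1": "pump", "eq-2": "valve"})
```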
0
votes
0 answers

Replicating Table "Promotion" in Databricks w/ S3 Backend

In my experience with DBMS systems, one safe approach to promoting new datasets for business intelligence is to: apply updates to a staging table table_stg; validate the staging table updates against the production table table_prod; if they pass, rename…
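With Delta tables in Databricks, the rename dance can be replaced by a single metastore-level swap; a hedged sketch using `CREATE OR REPLACE ... DEEP CLONE` (table names from the question):

```sql
-- After table_stg validates, publish it in one atomic replace:
CREATE OR REPLACE TABLE table_prod DEEP CLONE table_stg;
```

Readers of table_prod see either the old or the new version, never a partial state, and the staging table is left intact for the next cycle.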
0
votes
0 answers

AWS Databricks cluster not starting: Failed to init

Bootstrap Timeout: [id: InstanceId(i-04bd85c1b17328b96), status: INSTANCE_INITIALIZING, workerEnvId:WorkerEnvId(workerenv-2273350004125125-ce5d7a3b-cda1-4494-aa7f-c1bcea16ce04), lastStatusChangeTime: 1686032804843, groupIdOpt Some(0),requestIdOpt…
0
votes
0 answers

Dropping columns from a nested array with root level array in PySpark - Databricks

How can I drop columns from a nested array in a PySpark dataframe that has an array at the root level in Databricks? I was able to drop columns from an array within a struct, but I can't find a way to do it within a nested array.
ic2019
  • 1
  • 1
  • 2
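In Spark 3.1+, the combination that usually works is `F.transform` over the array with `Column.dropFields` on each struct element, e.g. `df.withColumn("readings", F.transform("readings", lambda x: x.dropFields("debug")))` (column names hypothetical). A pure-Python analog of what that transform does, runnable without Spark:

```python
def drop_nested_field(rows, array_col, field):
    """For each row, rebuild array_col with `field` removed from every
    struct-like element (mirrors transform + dropFields in PySpark)."""
    return [
        {**row, array_col: [{k: v for k, v in elem.items() if k != field}
                            for elem in row[array_col]]}
        for row in rows
    ]


rows = [{"id": 1, "readings": [{"ts": 1, "val": 10, "debug": "x"},
                               {"ts": 2, "val": 20, "debug": "y"}]}]
cleaned = drop_nested_field(rows, "readings", "debug")
```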
0
votes
0 answers

Column contains a special character; how to create a view

I have a table with a special character in a column name. How do I create a view? create or replace view view1 as select phone# from phonebook. This CREATE statement does not work in AWS Databricks.
Salman
  • 3
  • 2
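In Databricks SQL, identifiers with special characters need backtick quoting; a sketch against the statement from the question:

```sql
CREATE OR REPLACE VIEW view1 AS
SELECT `phone#` AS phone FROM phonebook;
```

Aliasing the column also keeps the special character from propagating into the view's schema.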
0
votes
0 answers

How to get AWS Databricks cluster health metrics using the API

I want to get some details (like memory usage and CPU) of a Databricks cluster using an API. We have the Ganglia UI, but I need to use an API to get some custom metrics.
PB22
  • 31
  • 4
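The REST Clusters API (`GET /api/2.0/clusters/get`) returns cluster state, node counts, and configuration, but not the live memory/CPU figures Ganglia shows; for those, shipping a metrics exporter via an init script is the usual route. A hedged sketch that builds the status call (host, token, and cluster id are placeholders):

```python
import urllib.parse
import urllib.request


def build_cluster_status_request(host, token, cluster_id):
    """GET call for cluster state/size; live CPU/memory metrics are not
    part of this API's response."""
    query = urllib.parse.urlencode({"cluster_id": cluster_id})
    return urllib.request.Request(
        f"{host}/api/2.0/clusters/get?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


req = build_cluster_status_request("https://example.cloud.databricks.com",
                                   "MY_TOKEN", "0601-182128-dcbte59m")
# status = json.load(urllib.request.urlopen(req))  # e.g. status["state"]
```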