Questions tagged [aws-databricks]

For questions about the usage of the Databricks Lakehouse Platform on the AWS cloud.

Databricks Lakehouse Platform on AWS

The Databricks Lakehouse Platform accelerates innovation across data science, data engineering, business analytics, and data warehousing, integrated with your AWS infrastructure.

Reference: https://databricks.com/aws

190 questions
2
votes
1 answer

Specify a database name in Databricks SQL connection parameters

I am using Airflow 2.0.2 to connect with Databricks using the airflow-databricks-operator. The SQL operator doesn't let me specify the database where the query should be executed, so I have to prefix the table_name with database_name. I tried…
Pbd
  • 1,219
  • 1
  • 15
  • 32
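A minimal sketch of one workaround, assuming the databricks-sql-connector and a placeholder warehouse endpoint: issue a USE statement on the connection first so unqualified table names resolve against the desired database.

```python
# Sketch: work around the missing database parameter by issuing USE first.
# Hostname, HTTP path, and token below are placeholders.
from databricks import sql

conn = sql.connect(
    server_hostname="<workspace-host>",
    http_path="<warehouse-http-path>",
    access_token="<token>",
)
cur = conn.cursor()
cur.execute("USE my_database")           # set the default database
cur.execute("SELECT * FROM my_table")    # resolves to my_database.my_table
rows = cur.fetchall()
cur.close()
conn.close()
```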
2
votes
0 answers

How to set a custom path for Databricks MLflow artifacts on S3

I've created an empty experiment from the Databricks experiments console and given the path for my artifacts on S3, i.e. s3:///. When I run the scripts, the artifacts are stored at s3:////<32 char id>/artifacts/model-Elasticnet/model.pkl. I want…
shahidammer
  • 1,026
  • 2
  • 10
  • 24
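A hedged sketch of how the artifact root is set, using the MLflow Python API: artifact_location can only be chosen when the experiment is created, and MLflow always appends <run_id>/artifacts/... beneath that root; the suffix itself is not configurable. Names below are placeholders.

```python
# Sketch: set a custom S3 artifact root at experiment creation.
# MLflow still appends <run_id>/artifacts/<model-name>/ under this root.
import mlflow

experiment_id = mlflow.create_experiment(
    name="/Users/me@example.com/my-experiment",        # hypothetical workspace path
    artifact_location="s3://my-bucket/custom/prefix",  # custom artifact root
)

with mlflow.start_run(experiment_id=experiment_id):
    mlflow.log_param("alpha", 0.5)
```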
2
votes
1 answer

Data Lakes - S3 and Databricks

I understand Data Lake Zones in S3 and I am looking at establishing three zones: LANDING, STAGING, CURATED. If I were in an Azure environment, I would create the Data Lake and have multiple folders as the various zones. How would I do the equivalent in AWS…
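As a hedged illustration (the bucket name is a placeholder): S3 has no real directories, so the Azure "folders as zones" pattern translates to key prefixes, which can be pre-created from a Databricks notebook with dbutils.

```python
# Sketch: S3 "zones" are just key prefixes under one bucket.
# dbutils is available in Databricks notebooks.
bucket = "s3a://my-data-lake-bucket"   # placeholder bucket

for zone in ["landing", "staging", "curated"]:
    dbutils.fs.mkdirs(f"{bucket}/{zone}/")   # create the prefix marker
```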
2
votes
1 answer

Why would the dataframe.write.mode("overwrite").saveAsTable("table") command be dropping data?

%python
dataframe.count()  # output 1179

%python
dataframe.write.mode("overwrite").saveAsTable("tablename")

%sql
select count(*) from tablename  -- output 1069

What can I be doing wrong? (These are different cells in Databricks.) I want to…
proutray
  • 1,943
  • 3
  • 30
  • 48
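One common cause is lazy re-evaluation: if the dataframe's source is non-deterministic or changes between actions, count() and saveAsTable() can see different data. A hedged sketch that pins the data with cache() before both actions:

```python
# Sketch: pin the dataframe so count() and saveAsTable() see the same data.
# "dataframe" is the dataframe from the question.
dataframe = dataframe.cache()
print(dataframe.count())   # materializes the cache, e.g. 1179

dataframe.write.mode("overwrite").saveAsTable("tablename")
spark.sql("SELECT COUNT(*) FROM tablename").show()   # should now match
```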
2
votes
1 answer

What is the expected input date pattern for the date_format function in Databricks Spark SQL

I am trying to better understand the date_format function offered by Spark SQL. As per the official Databricks documentation (I am using Databricks), this function expects any date/string in a valid datetime format. Below is the link for the…
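A small sketch of the function's contract (example values are illustrative): the first argument must be a date, a timestamp, or a string Spark can cast to one (e.g. 'yyyy-MM-dd'), and the second is the output pattern.

```python
# Sketch: date_format takes a date/timestamp-castable input plus an
# output pattern, and returns a formatted string column.
from pyspark.sql import functions as F

df = spark.createDataFrame([("2021-03-15",)], ["d"])
df.select(
    F.date_format(F.col("d"), "dd/MM/yyyy").alias("formatted"),  # 15/03/2021
    F.date_format(F.col("d"), "E").alias("weekday"),             # Mon
).show()
```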
2
votes
0 answers

Databricks Spark throws java.io.NotSerializableException: com.amazonaws.services.s3.AmazonS3Client

Hi, I am trying to run the following code on Databricks, which is a 3-node Spark cluster. I retrieve the data from a Kinesis stream into a Spark dataframe and transform it to extract the payload JSON file name. In the code below, I am trying to download…
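The usual cause is that the S3 client is built on the driver and captured by a closure that Spark then tries to serialize. A hedged sketch of the standard fix, constructing the client inside the partition function (bucket and column names are hypothetical):

```python
# Sketch: build the S3 client on the executor, inside the partition
# function, so it is never serialized into the task closure.
import boto3

def download_payloads(rows):
    s3 = boto3.client("s3")               # constructed per partition, on the executor
    for row in rows:
        s3.download_file(
            "my-bucket",                  # placeholder bucket
            row.payload_key,              # hypothetical column holding the S3 key
            f"/tmp/{row.payload_key.split('/')[-1]}",
        )

# df: the dataframe built from the Kinesis stream in the question
df.foreachPartition(download_payloads)
```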
2
votes
1 answer

Not able to display charts in Databricks when using a loop (not at end of cell)

I'm using a Databricks notebook. For various reasons, I need to render charts individually (concat doesn't give me the results I want) and I can't put the chart object at the end of the cell. I want to render each chart and do some processing…
Mike Woodward
  • 211
  • 2
  • 10
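A hedged sketch, assuming Altair (the mention of concat suggests it): render each chart explicitly with displayHTML mid-cell instead of relying on the implicit display of the cell's last expression.

```python
# Sketch: render each Altair chart mid-cell via displayHTML,
# which is a Databricks notebook builtin.
import altair as alt
import pandas as pd

df = pd.DataFrame({"category": ["a", "a", "b", "b"],
                   "x": [1, 2, 1, 2], "y": [3, 1, 4, 2]})

for group, frame in df.groupby("category"):
    chart = alt.Chart(frame).mark_line().encode(x="x", y="y")
    displayHTML(chart.to_html())      # renders here, not at end of cell
    # ...per-chart processing can happen here...
```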
2
votes
1 answer

Installing C libraries needed for R spatial packages on Databricks clusters

Spatial packages in R often depend on C libraries for their numerical computation. This presents a problem when installing such R packages if the R engine is unable to install those libraries using default permissions. It…
Cyrus Mohammadian
  • 4,982
  • 6
  • 33
  • 62
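A hedged sketch of the usual approach: a cluster-scoped init script that apt-installs the system libraries before the R packages compile. The library names are the common dependencies of sf/rgdal, and the DBFS path is an assumption; attach the script to the cluster under its init-script settings.

```python
# Sketch: write an init script that installs the C libraries R spatial
# packages compile against. Path and library list are assumptions.
script = """#!/bin/bash
apt-get update
apt-get install -y libgdal-dev libgeos-dev libproj-dev libudunits2-dev
"""

dbutils.fs.put("dbfs:/databricks/init-scripts/install-geo-libs.sh",
               script, overwrite=True)
```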
2
votes
2 answers

Speeding up writing a heavily partitioned dataframe to S3 on Databricks

I'm running a notebook on Databricks which creates partitioned PySpark dataframes and uploads them to S3. The table in question has ~5,000 files and is ~5 GB in total size (it needs to be partitioned in this way to be effectively queried by…
fez
  • 1,726
  • 3
  • 21
  • 31
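Many small files often come from every task writing a file into every partition directory. A hedged sketch (column and path are placeholders) that repartitions on the partition column first, so each partition directory is written by a single task:

```python
# Sketch: one task per partition directory instead of one small file per
# task per partition. "df" is the dataframe from the question.
(df.repartition("event_date")              # hypothetical partition column
   .write
   .mode("overwrite")
   .partitionBy("event_date")
   .parquet("s3a://my-bucket/my_table/"))  # placeholder path
```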
1
vote
1 answer

Move managed DLT table from one schema to another schema in Databricks

I have a DLT table in schema A which is being loaded by a DLT pipeline. I want to move the table from schema A to schema B and repoint my existing DLT pipeline to the table in schema B. I also need to avoid a full reload in the DLT pipeline on the table in schema…
Athi
  • 347
  • 4
  • 12
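A heavily hedged sketch of one option, assuming Delta tables: DEEP CLONE the table into schema B and then repoint the pipeline. Whether DLT then avoids a full reload is not guaranteed and should be validated on a non-production pipeline first.

```python
# Heavily hedged: DEEP CLONE copies the table's data and metadata into
# schema B; schema and table names are placeholders.
spark.sql("CREATE TABLE schema_b.my_table DEEP CLONE schema_a.my_table")
```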
1
vote
0 answers

Using private Python packages with Databricks model serving

I am attempting to host a Python MLflow model using Databricks model serving. While the serving endpoint functions correctly without private Python packages, I am encountering difficulties when attempting to include them. Context: Without Private…
Eric
  • 795
  • 5
  • 21
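One hedged pattern: log the model with the private wheel listed in pip_requirements so the serving container installs it when the endpoint image is built. The wheel path and model wrapper below are hypothetical.

```python
# Sketch: include a private wheel in the model's pip requirements.
# Wheel path and MyModel are hypothetical.
import mlflow
import mlflow.pyfunc

class MyModel(mlflow.pyfunc.PythonModel):   # minimal pyfunc wrapper
    def predict(self, context, model_input):
        return model_input                  # placeholder logic

mlflow.pyfunc.log_model(
    artifact_path="model",
    python_model=MyModel(),
    pip_requirements=[
        "/dbfs/packages/my_private_pkg-0.1-py3-none-any.whl",  # hypothetical wheel
        "pandas",
    ],
)
```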
1
vote
2 answers

How to dynamically change variables in a Databricks notebook based on which environment it was deployed to?

I want to move data from an S3 bucket to Databricks. On both platforms I have separate environments for DEV, QA, and PROD. I use a Databricks notebook which I deploy to Databricks using Terraform. Within the notebook there are some hardcoded variables,…
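A hedged sketch of one common pattern: have the deployment pass the environment name as a notebook widget and resolve per-environment settings from a mapping (names below are hypothetical).

```python
# Sketch: read the environment name from a widget, which the deployment
# can set as a notebook parameter; fall back to "dev" interactively.
dbutils.widgets.text("env", "dev")
env = dbutils.widgets.get("env")

CONFIG = {                                  # hypothetical per-env settings
    "dev":  {"bucket": "s3a://my-data-dev"},
    "qa":   {"bucket": "s3a://my-data-qa"},
    "prod": {"bucket": "s3a://my-data-prod"},
}

bucket = CONFIG[env]["bucket"]
```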
1
vote
2 answers

pyspark filtering column values using endswith

Hi, I'm trying to filter some values of a column in a table using the function "endswith". The table looks like…
MMV
  • 164
  • 10
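For reference, a minimal sketch of Column.endswith, which returns a boolean column usable directly in filter() (the column name is hypothetical):

```python
# Sketch: filter rows whose column value ends with a given suffix.
from pyspark.sql import functions as F

df = spark.createDataFrame(
    [("a.json",), ("b.csv",), ("c.json",)], ["filename"])

df.filter(F.col("filename").endswith(".json")).show()   # keep matches
df.filter(~F.col("filename").endswith(".json")).show()  # drop matches
```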
1
vote
1 answer

Error: cannot read mws workspaces: RESOURCE_DOES_NOT_EXIST: workspace 96783599 does not exist

When I run terraform apply, my workspace gets created but I get the following error. I have looked for "workspace 96783599" but was unable to find any resource with that number. Error: cannot read mws workspaces:…
1
vote
1 answer

Show table with multiple conditions in Databricks

I want to find tables in my Databricks database that meet more than one condition. MySQL allows 'where' clauses to include multiple conditions, as this post explains. To use multiple conditions in Databricks, I can use the following syntax, but…
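A hedged sketch of one approach: since SHOW TABLES returns a dataframe, arbitrary combinations of conditions can be ordinary filter() expressions instead of a single LIKE pattern (the database name is a placeholder).

```python
# Sketch: filter the SHOW TABLES result with multiple conditions.
from pyspark.sql import functions as F

tables = spark.sql("SHOW TABLES IN my_database")   # columns include tableName
tables.filter(
    F.col("tableName").startswith("fact_")
    & F.col("tableName").endswith("_v2")
).show()
```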