Questions tagged [aws-databricks]

For questions about using the Databricks Lakehouse Platform on the AWS cloud.

Databricks Lakehouse Platform on AWS

A lakehouse platform for accelerating innovation across data science, data engineering, business analytics, and data warehousing, integrated with your AWS infrastructure.

Reference: https://databricks.com/aws

190 questions
0 votes • 1 answer

Do we have any Spark libraries to connect from Databricks to OpenSearch?

While using the Elasticsearch library "org.elasticsearch:elasticsearch-spark-30_2.12:7.13.3", which works fine when the target is Elasticsearch 7.10, with OpenSearch 2.3 as the target it gives an issue like a mapping parser exception. Basically while…
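
A minimal write sketch, assuming the elasticsearch-spark connector from the question is attached to the cluster; host, credentials, and index name are placeholders. The OpenSearch project also maintains a forked connector (opensearch-hadoop) aimed at OpenSearch 2.x targets, which may be worth trying if the mapping parser exception persists.

```python
# Sketch: write a small DataFrame through the elasticsearch-spark connector.
# Host, port, credentials, and index name are placeholders.
df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "label"])

(df.write
   .format("org.elasticsearch.spark.sql")
   .option("es.nodes", "my-opensearch-domain.example.com")  # placeholder endpoint
   .option("es.port", "443")
   .option("es.nodes.wan.only", "true")                     # common for managed endpoints
   .option("es.net.ssl", "true")
   .option("es.net.http.auth.user", "user")                 # placeholder credentials
   .option("es.net.http.auth.pass", "password")
   .mode("append")
   .save("demo-index"))
```
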
0 votes • 0 answers

Can I find the remaining 85-90 columns which were not provided to me, in Databricks?

I have a table in Databricks which has close to 100 columns. I was given some 10-15 columns; can I find the remaining 85-90 columns which were not provided to me? For example, table 'A' has columns named (a,b,c,d,e,f,g,h,....z), and I was given with…
Salman • 3 • 2
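
A minimal sketch for the question above: read the table's full column list from its schema and subtract the columns you were given. Table and column names are placeholders, with a toy temp view standing in for the real table.

```python
# Toy stand-in for table A; in practice spark.table("A") would read your real table.
spark.range(1).selectExpr("id as a", "id as b", "id as c", "id as d", "id as e") \
     .createOrReplaceTempView("A")

given_cols = {"a", "b", "c"}                 # the 10-15 columns you were given (placeholder)
all_cols = set(spark.table("A").columns)     # every column the table actually has
remaining = sorted(all_cols - given_cols)    # the columns not provided to you
print(remaining)                             # ['d', 'e']
```
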
0 votes • 1 answer

Configure Amazon maximum percentage of OnDemand price (spot instances)

I'm playing a little with spot instances, and for example, in Databricks, I can ask for a spot instance with a minimum of % savings over On-Demand instances. My question is, if I set 90% off the On-Demand instance and the current price is 50%, I…
Alejandro • 519 • 1 • 6 • 32
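
For reference, a sketch of where this knob sits when a cluster is created through the Databricks Clusters API: aws_attributes.spot_bid_price_percent caps the spot bid as a percentage of the on-demand price. The values below are illustrative, not a complete request.

```python
import json

# Illustrative fragment of a Clusters API create payload (not a complete request).
cluster_spec = {
    "cluster_name": "spot-test",                 # placeholder
    "spark_version": "11.3.x-scala2.12",         # placeholder runtime
    "node_type_id": "r5a.4xlarge",
    "num_workers": 2,
    "aws_attributes": {
        "availability": "SPOT_WITH_FALLBACK",    # fall back to on-demand if spot is unavailable
        "first_on_demand": 1,                    # keep the driver on an on-demand instance
        "spot_bid_price_percent": 100,           # bid at most 100% of the on-demand price
    },
}
print(json.dumps(cluster_spec, indent=2))
```
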
0 votes • 1 answer

Send emails from Azure Databricks

I would like to send emails from Azure Databricks. I tried to follow this: https://docs.databricks.com/_static/notebooks/kb/notebooks/send-email-aws.html But when I execute this: send_email(from_addr, to_addrs, subject, html,…
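
The linked notebook builds a MIME message and hands it to an SMTP relay; a compact, hedged version is sketched below with placeholder host and credentials (Amazon SES, SendGrid, or any authenticated SMTP server would slot in the same way).

```python
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def send_email(from_addr, to_addrs, subject, html,
               smtp_host="email-smtp.us-east-1.amazonaws.com",  # placeholder relay
               smtp_port=587, user="smtp-user", password="smtp-password"):
    """Minimal sketch: send an HTML email through an authenticated SMTP relay."""
    msg = MIMEMultipart("alternative")
    msg["From"], msg["To"], msg["Subject"] = from_addr, ", ".join(to_addrs), subject
    msg.attach(MIMEText(html, "html"))
    with smtplib.SMTP(smtp_host, smtp_port) as server:
        server.starttls()                        # most relays require TLS before login
        server.login(user, password)
        server.sendmail(from_addr, to_addrs, msg.as_string())

# send_email("me@example.com", ["you@example.com"], "Test", "<b>Hello from Databricks</b>")
```
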
0 votes • 1 answer

Previous month query - Databricks

I am trying to find a function where I can extract the results of the last month only (for example, if I launch the query in November, I want to display only the results of October). Here is the result: I don't know if I have to enter the function in my…
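
One way to express "previous calendar month" in Databricks SQL is to bound the date column by the first day of last month and the first day of the current month; a sketch with placeholder table and column names (and a toy temp view) follows.

```python
from datetime import date

# Toy stand-in for the real table; "my_table" and "event_date" are placeholders.
spark.createDataFrame([(date(2022, 10, 5), 1), (date(2022, 11, 2), 2)],
                      ["event_date", "value"]).createOrReplaceTempView("my_table")

# Rows from the previous calendar month only (run in November -> October rows).
last_month = spark.sql("""
    SELECT *
    FROM my_table
    WHERE event_date >= add_months(date_trunc('MONTH', current_date()), -1)
      AND event_date <  date_trunc('MONTH', current_date())
""")
last_month.show()
```
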
0 votes • 0 answers

Cook's distance in PySpark

I wanted to use Cook's distance to remove the outliers from my dataset for regression, but I am not able to find any method to do so in PySpark. I know how we can do it in Python using the get_influence() method. Is there any similar method in PySpark?
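
Spark ML has no built-in Cook's distance; a common workaround is to pull the data (or a sample) to the driver and reuse statsmodels' influence measures, as sketched below with toy data. For data that cannot fit on the driver, the leverage term would need a distributed formulation.

```python
import statsmodels.api as sm

# Toy data; in practice df would be your existing Spark DataFrame.
df = spark.createDataFrame(
    [(1.0, 1.0, 2.1), (2.0, 1.5, 3.9), (3.0, 2.0, 6.2), (4.0, 2.5, 7.8), (10.0, 9.0, 55.0)],
    ["x1", "x2", "y"])

pdf = df.toPandas()                              # assumes the data (or a sample) fits on the driver
ols = sm.OLS(pdf["y"], sm.add_constant(pdf[["x1", "x2"]])).fit()
cooks_d, _ = ols.get_influence().cooks_distance  # per-row Cook's distance

threshold = 4 / len(pdf)                         # a common rule-of-thumb cutoff
clean_df = spark.createDataFrame(pdf[cooks_d < threshold])  # back to Spark without the outliers
```
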
0 votes • 1 answer

Terraform + Databricks error ENDPOINT_NOT_FOUND: Unsupported path:

I am wondering if someone has already encountered this error, which I am getting when trying to create OBO tokens for Databricks service principals. When setting up the databricks_permissions I get: Error: ENDPOINT_NOT_FOUND: Unsupported path:…
0 votes • 0 answers

Spark Memory Management Calculation

I am new to Spark applications. I am using an r5a.4xlarge AWS cluster with a minimum of 1 worker and a maximum of 16 workers. This instance has 128 GB of memory and 16 cores. I have set spark.executor.cores to 5. As per the memory management calculation, memory/ executor…
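
A worked version of the usual back-of-the-envelope sizing for one r5a.4xlarge worker (16 vCPUs, 128 GB) with spark.executor.cores = 5, using the common ~10% memory-overhead heuristic; the exact amounts the Databricks runtime reserves for its own services will differ.

```python
# Heuristic executor sizing for one r5a.4xlarge worker; not the exact values
# the Databricks runtime reserves internally.
node_cores, node_memory_gb = 16, 128
cores_per_executor = 5                                     # spark.executor.cores

usable_cores = node_cores - 1                              # leave ~1 core for OS/daemons
executors_per_node = usable_cores // cores_per_executor    # -> 3

usable_memory_gb = node_memory_gb - 8                      # rough OS/services reserve
memory_per_executor_gb = usable_memory_gb / executors_per_node        # -> 40
overhead_gb = max(0.10 * memory_per_executor_gb, 0.384)    # spark.executor.memoryOverhead heuristic
executor_memory_gb = memory_per_executor_gb - overhead_gb  # -> ~36 for spark.executor.memory

print(executors_per_node, round(executor_memory_gb, 1))    # 3 36.0
```
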
0 votes • 0 answers

How to successfully execute a stored procedure in Databricks versions higher than 7.3

Databricks will soon be dropping support for their 7.3 LTS runtime. Unfortunately, not all the functionality (that we require) appears to be easy to replicate in later runtimes. The main sticking point that we've run across so far is forming SQL…
0 votes • 0 answers

java.lang.ClassNotFoundException: org.graphframes.GraphFramePythonAPI Error in Databricks

I am getting this error on the Community Edition of Databricks when trying to make a graph with the GraphFrame() function: java.lang.ClassNotFoundException: org.graphframes.GraphFramePythonAPI. I have tried a few…
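
That ClassNotFoundException usually means the GraphFrames JAR is not attached to the cluster; the Python package alone is not enough. Assuming the graphframes Maven library (coordinates matching your Spark/Scala version) is installed, a minimal GraphFrame like the one below should build.

```python
from graphframes import GraphFrame

# Minimal sketch; assumes the graphframes Maven library is installed on the cluster
# in addition to the Python package.
vertices = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
edges = spark.createDataFrame([("a", "b", "follows")], ["src", "dst", "relationship"])

g = GraphFrame(vertices, edges)
g.inDegrees.show()
```
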
0 votes • 1 answer

Error cluster launch: Security Daemon Registration

I have created a workspace in AWS Databricks with private link. When we launch a cluster we get the following error: Security Daemon Registration Exception: Failed to set up the spark container due to an error when registering the container to…
0 votes • 2 answers

Left Joining after Case Statement SQL

I have two tables, A and B. In table A there's one column with a full name called EmployeeName, and in table B there's also one column with the name OrigFullName. The thing is, the column EmployeeName doesn't follow a standard; sometimes there's…
sohrenan • 11 • 2
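
Without knowing the exact inconsistencies, a common pattern is to normalize both name columns inside the join condition (or in a CTE) and left join on the normalized value; the sketch below uses a hypothetical upper-case/trim normalization and toy temp views for A and B.

```python
# Toy stand-ins for tables A and B.
spark.createDataFrame([(" john smith ",)], ["EmployeeName"]).createOrReplaceTempView("A")
spark.createDataFrame([("JOHN SMITH",)], ["OrigFullName"]).createOrReplaceTempView("B")

# Hypothetical normalization: upper-case and trim both sides before the left join.
joined = spark.sql("""
    SELECT a.*, b.OrigFullName
    FROM A AS a
    LEFT JOIN B AS b
      ON upper(trim(a.EmployeeName)) = upper(trim(b.OrigFullName))
""")
joined.show()
```
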
0 votes • 0 answers

PySpark DataFrame - Discretize the selected numerical columns and then apply the groupby and crosstab functions

I have a dataframe which has 100+ numerical columns. I want to discretize some columns from it and then apply the groupby and crosstab functions on these discretized columns. Currently, I am using a loop to iterate over all selected numerical…
ASD • 25 • 6
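
A sketch that avoids the per-column loop by using QuantileDiscretizer's multi-column support (Spark 3.0+), then aggregating on the bucketed columns with groupBy and crosstab; column names and toy data are placeholders.

```python
from pyspark.ml.feature import QuantileDiscretizer

# Toy data; in practice df would be your 100+-column DataFrame.
num_cols = ["col1", "col2"]
df = spark.createDataFrame(
    [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0), (5.0, 50.0)], num_cols)

bucket_cols = [c + "_bucket" for c in num_cols]
discretizer = QuantileDiscretizer(numBuckets=3, inputCols=num_cols, outputCols=bucket_cols)
bucketed = discretizer.fit(df).transform(df)

bucketed.groupBy("col1_bucket").count().show()           # distribution of one bucketed column
bucketed.crosstab("col1_bucket", "col2_bucket").show()   # contingency table of two bucketed columns
```
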
0 votes • 0 answers

Databricks with CloudWatch metrics without the InstanceId dimension

I have jobs running on job clusters, and I want to send metrics to CloudWatch. I set up the CloudWatch agent following this guide, but the issue is that I can't create useful metrics dashboards and alarms because I always have the InstanceId dimension, and InstanceId is…
CoyoteKG • 45 • 5
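
One lever in the CloudWatch agent configuration is aggregation_dimensions, which makes the agent also publish rollups without the InstanceId dimension so dashboards and alarms can target the cluster-wide series; a sketch of the relevant metrics fragment is below, written as a Python dict for brevity (values are illustrative).

```python
import json

# Illustrative fragment of amazon-cloudwatch-agent.json: aggregation_dimensions
# asks the agent to also emit metrics rolled up without per-instance dimensions.
agent_config = {
    "metrics": {
        "namespace": "DatabricksJobs",                           # placeholder namespace
        "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
        "aggregation_dimensions": [[]],                          # [] = roll up across all dimensions
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        },
    }
}
print(json.dumps(agent_config, indent=2))
```
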
0 votes • 2 answers

Count function on Databricks provides different output every time I run the code

I am new to Databricks and working with PySpark dataframes. In my code, I have joined the two dataframes using the join function, and then I use the count function to get the count of the new dataframe. Then I sort the dataframe using the orderBy function and…
ASD • 25 • 6
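
If nothing upstream is non-deterministic (sampling, current timestamps, source files changing between runs), repeated counts of the same join should agree; a common first check is to cache the joined result so every action reads the same materialized data, sketched below with toy DataFrames.

```python
# Toy DataFrames; df1/df2 stand in for the two frames being joined.
df1 = spark.createDataFrame([(1, "a"), (2, "b"), (3, "c")], ["id", "v1"])
df2 = spark.createDataFrame([(1, "x"), (2, "y")], ["id", "v2"])

joined = df1.join(df2, on="id", how="inner").cache()   # pin the join result in memory

first_count = joined.count()      # materializes and caches the join
second_count = joined.count()     # should match first_count on cached data
print(first_count, second_count)  # 2 2
```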