Questions tagged [aws-databricks]

For questions about the usage of Databricks Lakehouse Platform on AWS cloud.

Databricks Lakehouse Platform on AWS

The Databricks Lakehouse Platform accelerates innovation across data science, data engineering, business analytics, and data warehousing, integrated with your AWS infrastructure.

Reference: https://databricks.com/aws

190 questions
3 votes · 1 answer

Load files in order with Databricks autoloader

I'm trying to write a Python pipeline in Databricks to take CDC data from a Postgres database, dumped by DMS into S3 as Parquet files, and ingest it. The file names are numerically ascending unique IDs based on datetime (i.e. 20220630-215325970.csv). Right now…
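The approach hinted at in this question relies on the file-name scheme itself: zero-padded datetime-based names sort lexicographically in the same order as chronologically, so sorting the listing recovers arrival order. A minimal pure-Python sketch (the file names below are made up to match the pattern in the question):

```python
# Zero-padded datetime-stamped names (YYYYMMDD-HHMMSSmmm) sort
# lexicographically in the same order as chronologically, so a plain
# sort on the listing recovers arrival order.
names = [
    "20220630-215325970.csv",
    "20220629-090011123.csv",
    "20220630-081500001.csv",
]

ordered = sorted(names)  # lexicographic sort == chronological order here
print(ordered)           # oldest file first
```

Note that Auto Loader itself does not guarantee processing files in arrival order, so pipelines that need ordering typically carry the file name into the data (e.g. via Spark's `input_file_name()`) and sequence downstream.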
3 votes · 1 answer

Azure Databricks Architecture - Communication between Control plane and data plane and authentications

I am trying to understand the Azure Databricks architecture based on this link. I understand the purpose of the control plane and the data plane in the Azure Databricks architecture, but I couldn't understand the following questions. How…
3 votes · 1 answer

StreamingQuery Delta Tables within Databricks - Describe History

I have a Delta table which I am reading as a StreamingQuery. Looking through the Delta table history, using DESCRIBE HISTORY, I am seeing that 99% of the operationMetrics state that numTargetRowsUpdates is 0, with most operations being inserts.…
3 votes · 2 answers

Databricks Notebook 8.3 (Apache Spark 3.1.1, Scala 2.12) | pyspark | Parquet write exception | Multiple failures in stage materialization

This is production code that was running fine until last week. Then this Parquet write error showed up and has never been resolved. While writing to AWS S3 in Parquet format, I tried several dataframe.repartition values - 300, 500, 2400, 6000. But no luck.…
3 votes · 1 answer

How to configure a custom Spark Plugin in Databricks?

How do I properly configure a Spark plugin and the JAR containing the Spark Plugin class in Databricks? I created the following Spark 3 plugin class in Scala, CustomExecSparkPlugin.scala: package example import org.apache.spark.api.plugin.{SparkPlugin,…
3 votes · 3 answers

AWS S3 to Databricks mount is not working

I have mounted 'mybucket' using mount commands, and I am able to list all the objects using the command %fs ls /mnt/mybucket/. However, I have folders nested inside folders in 'mybucket', and I want to run the command below, but it is not…
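In a notebook, `%fs ls` lists only a single level, so enumerating objects inside nested folders generally means recursing over directory entries. A pure-Python analogue of that recursion (dbutils is not available outside Databricks, so this sketch uses the local filesystem):

```python
import os

def list_recursive(path):
    """Recursively collect all file paths under `path`,
    mirroring what a recursive dbutils.fs.ls walk would do."""
    files = []
    for entry in os.scandir(path):
        if entry.is_dir():
            files.extend(list_recursive(entry.path))
        else:
            files.append(entry.path)
    return files
```

On Databricks, the same pattern applies with `dbutils.fs.ls(path)` and each entry's `isDir()` flag in place of `os.scandir`.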
3 votes · 0 answers

Databricks: Difference between dbfs:/ vs file:/

I am trying to understand the way Databricks stores files, and I am a bit unsure of the difference between dbfs:/ and file:/ (see image below). From what I have been able to deduce from here, file:/ seems to be the area where external files…
Neal
3 votes · 3 answers

Can't Access /dbfs/FileStore using shell commands in databricks runtime version 7

In Databricks Runtime version 6.6 I am able to successfully run a shell command like the following: %sh ls /dbfs/FileStore/tables However, in Runtime version 7, this no longer works. Is there any way to directly access /dbfs/FileStore in runtime…
2 votes · 1 answer

Can we execute a single task in isolation from a multi-task Databricks job

Can we execute a single task in isolation from a multi-task Databricks job?
soumya-kole
2 votes · 1 answer

Cross Job Dependencies in Databricks Workflow

I am trying to create a data pipeline in Databricks using the Workflows UI. I have a significant number of tasks, which I want to split across multiple jobs with dependencies defined across them. But it seems like in Databricks there cannot be cross…
Abhishek
2 votes · 1 answer

Using code_path in mlflow.pyfunc models on Databricks

We are using Databricks over AWS infra, registering models on MLflow. We write our in-project imports as from src.(module location) import (objects). Following examples online, I expected that when I use mlflow.pyfunc.log_model(...,…
perfects
2 votes · 1 answer

Databricks: how to exit the entire 'job' in the notebook orchestration workflow?

Say I have a simple notebook orchestration: Notebook A -> Notebook B. Notebook A finishes first, then triggers Notebook B. I am wondering if there is an out-of-the-box method to allow Notebook A to terminate the entire job (without running Notebook…
QPeiran
2 votes · 1 answer

Where does the Databricks cluster run when I create a cluster through the UI in Databricks?

I am new to Databricks, and I am confused after creating a cluster. Databricks asked me to connect an AWS account before creating a workspace, and I did. Then I created a cluster. Now I want to know where the cluster runs. Is…
2 votes · 3 answers

AWS Databricks pricing - should we also pay for EC2 instances separately, in addition to DBU costs?

I am trying to do some cost comparison between AWS Glue and Databricks hosted in an AWS environment. For the comparison, I have chosen m4.xlarge, which is the equivalent of 1 DPU in AWS Glue (4 vCPUs/16 GB memory). Assuming I have a PySpark job that's…
Yuva
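On AWS, Databricks billing is indeed two-part: the EC2 instance cost is paid to AWS and the DBU cost to Databricks, so a comparison has to sum both. A back-of-the-envelope sketch, with illustrative rates that are assumptions (check the current AWS EC2 and Databricks pricing pages for real numbers):

```python
# Hypothetical hourly rates for illustration only -- not current prices.
ec2_rate = 0.20       # $/hour per m4.xlarge, paid to AWS
dbu_per_hour = 0.75   # DBUs consumed per m4.xlarge node-hour (assumed)
dbu_price = 0.15      # $/DBU for the chosen Databricks tier/workload (assumed)

def hourly_cost(nodes):
    """Total $/hour for a cluster of `nodes` instances:
    EC2 cost (to AWS) + DBU cost (to Databricks)."""
    return nodes * (ec2_rate + dbu_per_hour * dbu_price)

print(round(hourly_cost(4), 4))
```

The same structure works for the Glue side of the comparison by replacing the two-part rate with a single $/DPU-hour figure.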
2 votes · 2 answers

AWS Databricks cluster start failure

I am currently unable to spin up any clusters in our Databricks AWS environment. When I attempt to start an on-demand cluster, it remains in "pending" for 20+ minutes (on relatively small clusters which usually take 2-3 minutes to start…
wylie