
Using an example I found online, the code below throws an error because it cannot read from the S3 bucket. The problem is that I have to pass in the AWS credentials, which are held in the variable S3_dir along with the bucket path. I am unable to get this to work.

%sql
DROP TABLE IF EXISTS mydb.bigtable;
CREATE TABLE mydb.bigtable
(
 id BIGINT,
  string1 STRING,
  numbers BIGINT,
) USING delta
LOCATION '/user/hive/warehouse/mydb'
SELECT * from delta.f"{S3_dir}";
  • What is the specific problem? Is it the desire to pass in the directory name, or are you having problems accessing data from S3? If you hard-coded that directory name, does it work? Also, where are you running this command -- it would appear to be from a Databricks Notebook? – John Rotenstein Feb 24 '23 at 01:35

1 Answer


In order to use data from external storage (such as AWS S3 or Azure ADLS), you need to mount the S3 bucket to Databricks. Here is the step-by-step procedure:

Step 1: Create AWS Access Key and Secret Key for Databricks

  • Log in to your AWS account
  • Under your profile name (top right corner), select Security credentials
  • Scroll down to Access keys -> create an access key and download the generated CSV file

Note: This file contains your access key and secret key

Step 2: Log in to Databricks and run the following code

Create a notebook called “mount_s3_dbfs”

import urllib.parse

# 1. Add the necessary information (replace the placeholders with your values)
ACCESS_KEY = "<access_key>"
SECRET_KEY = "<secret_key>"
S3_BUCKET = "<s3bucket_name>"

# 2. URL-encode your secret key so it can be embedded in the URL
ENCODED_SECRET_KEY = urllib.parse.quote(string=SECRET_KEY, safe="")

# 3. Define the source URL
URL = "s3a://{}:{}@{}".format(ACCESS_KEY, ENCODED_SECRET_KEY, S3_BUCKET)

# 4. Mount your S3 bucket on DBFS
dbutils.fs.mount(URL, f"/mnt/{S3_BUCKET}")

# 5. List the mounted folder's contents
display(dbutils.fs.ls(f"/mnt/{S3_BUCKET}/"))
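If the bucket is already mounted and you need to refresh the credentials, unmount it first before running the mount command again (a small optional step using dbutils.fs.unmount):

# Optional: remove an existing mount point before remounting
dbutils.fs.unmount(f"/mnt/{S3_BUCKET}")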

Note: Providing your credentials directly in the notebook is not recommended; instead, upload the credentials CSV to DBFS and read it with Spark's CSV reader.
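For example, if you upload the downloaded credentials CSV to DBFS, you can read the keys with Spark instead of hard-coding them. A minimal sketch, assuming a hypothetical DBFS path and the default column names of the IAM-generated CSV ("Access key ID", "Secret access key"):

# Read the credentials CSV previously uploaded to DBFS (path is hypothetical)
aws_keys_df = spark.read.csv(
    "dbfs:/FileStore/tables/aws_credentials.csv",
    header=True
)

# Extract the key pair from the first (and only) row
row = aws_keys_df.select("Access key ID", "Secret access key").first()
ACCESS_KEY = row["Access key ID"]
SECRET_KEY = row["Secret access key"]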

Step 3: Use Spark to read and write files from AWS

%sql

DROP TABLE IF EXISTS mydb.bigtable;
CREATE TABLE mydb.bigtable
(
  id BIGINT,
  string1 STRING,
  numbers BIGINT
) USING delta
LOCATION '/user/hive/warehouse/mydb'
AS SELECT * FROM delta.`path/to/s3file`;

Note: Use backticks (`) to quote the S3 file location in the SELECT statement.
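To come back to the original question of passing the location from a variable such as S3_dir, one option is to build the statement in Python and run it with spark.sql instead of a %sql cell. A minimal sketch, assuming S3_dir holds the mounted (or s3a://) path to the Delta data:

# S3_dir is assumed to point at the mounted data, e.g. the bucket mounted in Step 2
S3_dir = f"/mnt/{S3_BUCKET}/path/to/s3file"  # hypothetical location

spark.sql("DROP TABLE IF EXISTS mydb.bigtable")
spark.sql(f"""
  CREATE TABLE mydb.bigtable
  USING delta
  LOCATION '/user/hive/warehouse/mydb'
  AS SELECT * FROM delta.`{S3_dir}`
""")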

arudsekaberne