Python code to unzip a zipped file on an S3 server in Databricks

Question

The code is meant to unzip a zipped file stored on an S3 server. It runs in Databricks with Python 3 and pandas==0.19.0.

zip_ref = zipfile.ZipFile(path,mode='r')

The line above throws the following error:

FileNotFoundError: [Errno 2] No such file or directory: path

Please let me know why this line throws an error even though the path is correct. Alternatively, is there a way to read the contents of the zip file without extracting it?

Check what's in 'path'; it should look like `'s3://bucketname/filename.zip'`. Don't forget the extension. – Shijith, Apr 10 '19 at 13:23
Hi, the path is correct. I tried saving a file to that path and it worked successfully. – Sukanya, Apr 11 '19 at 06:55

score 0 · Answer 1 · answered Apr 11 '19 at 08:16

You can use

with zipfile.ZipFile("/dbfs/folder/file.zip", "r") as zip_ref:
    zip_ref.extractall("targetdir")

or keep the code from the question, but avoid using ':' in the path string.
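
For context, `zipfile.ZipFile` expects a path on the local filesystem (or a file-like object), so a string such as `'s3://bucketname/filename.zip'` is looked up as a local file name, which is consistent with the `FileNotFoundError` in the question. The sketch below only illustrates that behaviour. `extract_zip` is a hypothetical helper, and a temporary directory stands in for a `/dbfs/...` location, on the assumption that the archive is reachable through a local path on the cluster.

```python
import os
import tempfile
import zipfile


def extract_zip(zip_path, target_dir):
    """Extract a zip archive located at a local path; return its member names."""
    if not os.path.isfile(zip_path):
        # zipfile would raise FileNotFoundError too; this message is just more explicit.
        raise FileNotFoundError(f"{zip_path} is not a file on the local filesystem")
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(target_dir)
        return zip_ref.namelist()


# Self-contained demo: a temporary directory stands in for /dbfs/folder (assumption).
workdir = tempfile.mkdtemp()
demo_zip = os.path.join(workdir, "file.zip")
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("hello.txt", "hello from the archive")

members = extract_zip(demo_zip, os.path.join(workdir, "targetdir"))
print(members)
```

Calling `extract_zip('s3://bucketname/filename.zip', 'targetdir')` on a machine where that string is not a local file raises `FileNotFoundError`, just as the original line does.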

Shijith

Hi, I tried removing ':' as well, but no luck. – Sukanya Apr 11 '19 at 09:04

score 0 · Answer 2 · answered Dec 13 '21 at 01:38

The code below downloads the zip file from S3 to the local Databricks filesystem, unzips it there, and loads the extracted shapefile with geopandas.

import os
import re
import zipfile

import boto3
import geopandas

### Declare the variables
s3client = boto3.client('s3')  # s3 client (Boto3 is the AWS SDK for Python)
s3resources = boto3.resource('s3')  # s3 resource
filetype = '.zip'  # filetype such as zip, csv, json
source_url = 's3://bucketname/'  # s3 url with bucket name
bucketname = 'bucketname'  # bucket name
zipfile_name = 'local_file' + filetype  # local file name (with file type) in Databricks
filename = 'zipfilename' + filetype  # object key or filename with extension
shapefile_name = 'shapafilename.shp'  # name of the file to read from the extracted archive
shapefile_path = os.path.abspath(zipfile_name)  # absolute local path of the downloaded zip
os_CurDir_file = os.path.join(os.curdir, 'shapefiles')  # local directory to extract into
### Download the file from s3 to the local Databricks filesystem
s3resources.Bucket(bucketname).download_file(filename, zipfile_name)
### Unzip the file locally
with zipfile.ZipFile(shapefile_path, 'r') as zip_ref:
    zip_ref.extractall(os_CurDir_file)
### Import the shapefile using geopandas
plot_locations_df = geopandas.read_file(os.path.join(os_CurDir_file, shapefile_name))
### Convert each geometry to its WKT string and strip any surrounding double quotes
plot_locations_df['geometry'] = plot_locations_df.geometry.apply(lambda x: x.wkt).apply(lambda x: re.sub('"(.*)"', '\\1', x))
display(plot_locations_df.head(5))  # display() is provided by Databricks notebooks
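
On the second part of the question (reading the contents of the zip without extracting it): `zipfile.ZipFile` also accepts a file-like object, so one possible approach is to hold the archive bytes in memory and read members directly. This is a minimal sketch, not a tested Databricks recipe. `list_zip_members` and `read_zip_member` are hypothetical helper names, and the boto3 call in the comment, with its bucket and key, is an assumption about how the bytes might be fetched.

```python
import io
import zipfile


def list_zip_members(zip_bytes):
    """Return the names of all members in an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes), "r") as zf:
        return zf.namelist()


def read_zip_member(zip_bytes, member_name):
    """Return the raw bytes of one member without writing anything to disk."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes), "r") as zf:
        return zf.read(member_name)


# With S3, the bytes might be obtained like this (assumption, not executed here):
#   zip_bytes = boto3.client("s3").get_object(
#       Bucket="bucketname", Key="zipfilename.zip")["Body"].read()

# Self-contained demo: build a small archive in memory instead of fetching one.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data.csv", "id,value\n1,a\n")
zip_bytes = buffer.getvalue()

print(list_zip_members(zip_bytes))
print(read_zip_member(zip_bytes, "data.csv").decode("utf-8"))
```

This keeps the whole archive in memory, so it suits small to medium files; for large archives, downloading to local disk first, as in the code above, is likely the safer choice.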
