
This code is meant to unzip a zipped file stored on an S3 server. It runs in Databricks with Python 3 and pandas==0.19.0.

zip_ref = zipfile.ZipFile(path, mode='r')

The line above throws the following error:

FileNotFoundError: [Errno 2] No such file or directory: path

Please let me know why this line throws an error even though the path is correct. Alternatively, is there a way to read the contents of the zip file without extracting it?

Sukanya

2 Answers


You can use:

with zipfile.ZipFile("/dbfs/folder/file.zip", "r") as zip_ref:
    zip_ref.extractall("targetdir")  

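Alternatively, keep the same code as in the question, but avoid using ':' in the path string (for example, use a local path of the form /dbfs/... rather than a URI-style path).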
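To illustrate the point about ':' in the path: Python's zipfile only opens local file paths, so a URI-style string is not found even when the file exists. The following is a minimal sketch, assuming the usual Databricks setup where DBFS is mounted on the driver at /dbfs. The helper names `to_local_dbfs_path` and `check_readable` are hypothetical and not part of any library.

```python
import os


def to_local_dbfs_path(path):
    """Map a 'dbfs:/...' URI to the '/dbfs/...' form that local file APIs expect.

    Assumption: DBFS is exposed on the driver under /dbfs.
    Paths that do not start with 'dbfs:/' are returned unchanged.
    """
    prefix = "dbfs:/"
    if path.startswith(prefix):
        return "/dbfs/" + path[len(prefix):].lstrip("/")
    return path


def check_readable(path):
    """Return the local path and whether it exists, to diagnose FileNotFoundError."""
    local = to_local_dbfs_path(path)
    return local, os.path.exists(local)
```

Calling `check_readable(path)` before `zipfile.ZipFile(...)` shows which path is actually being looked up and whether it exists on the driver.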

Shijith
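Below is the code. It downloads the zip file from S3 to the local Databricks file system, extracts it, and reads the shapefile with geopandas: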

import os
import re
import zipfile

import boto3
import geopandas

### Declare the variables
s3client = boto3.client('s3')  # s3 client (Boto3 is the AWS SDK for python)
s3resources = boto3.resource('s3')  # s3 resource
filetype = '.zip'  # filetype such as zip, csv, json
source_url = 's3://bucketname/'  # s3 url with bucket name
bucketname = 'bucketname'  # bucket name
zipfile_name = 'local_file' + filetype  # local file name (with extension) in Databricks
filename = 'zipfilename' + filetype  # object key or filename with extension
shapefile_name = 'shapafilename.shp'  # name of the file to read from the extracted archive
shapefile_path = os.path.abspath(zipfile_name)  # absolute local path of the downloaded zip
os_CurDir_file = os.path.join(os.curdir, 'shapefiles')  # extraction folder; os.curdir + 'shapefiles' would give '.shapefiles'
### downloading the files from s3 to the local databricks
s3resources.Bucket(bucketname).download_file(filename, zipfile_name)   
### unzip the file in the local DB
with zipfile.ZipFile(shapefile_path, 'r') as zip_ref:
    zip_ref.extractall(os_CurDir_file)   
### import shapefile using geopandas
plot_locations_df = geopandas.read_file(
                          os.path.join(
                          os_CurDir_file, 
                          shapefile_name))
plot_locations_df['geometry'] = plot_locations_df.geometry.apply(lambda x: x.wkt).apply(lambda x: re.sub('"(.*)"', '\\1', x)) ### convert struct to string
display(plot_locations_df.head(5))
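The question also asks whether the archive can be read without extracting it. A minimal sketch using only the standard library: the zip is held in memory as bytes and individual members are read directly. The function name `list_and_read` is hypothetical. With boto3, the bytes could come from `s3client.get_object(Bucket=bucketname, Key=filename)['Body'].read()`; here a small archive is built in memory so the example runs on its own.

```python
import io
import zipfile


def list_and_read(zip_bytes, member=None):
    """List the members of an in-memory zip and read one of them without extracting to disk.

    If `member` is None, the first entry in the archive is read.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes), "r") as zf:
        names = zf.namelist()
        target = member if member is not None else names[0]
        with zf.open(target) as fh:
            data = fh.read()
    return names, data


# Build a tiny archive in memory as stand-in data (assumption: real bytes would come from S3).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hi")

names, data = list_and_read(buf.getvalue())
print(names, data)
```

Note that formats made of several sibling files, such as a shapefile with its .shp, .shx and .dbf parts, may still need to be extracted before a reader can open them.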