I have a large number of .zip files in a folder. They are DAISY files. I need to:
- open each ZIP archive,
- open up the only XML file in the archive,
- extract only the numbers from the same element in each file, for example
<meta name="dc:Identifier" content="12345-eng-epb"/>
- and then re-archive and rename the ZIP archive as that extracted number, for example
12345.zip
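The extraction step in the list above could be sketched roughly as follows. This is only an illustration using the standard library: the helper name `extract_identifier_number` and the sample document are hypothetical, and it assumes the identifier sits in the `content` attribute of a `meta` element, as in the example line, and that the wanted number is the leading run of digits.

```python
import re
import xml.etree.ElementTree as ET


def extract_identifier_number(xml_text):
    """Return the leading digits of the dc:Identifier meta content, or None.

    Hypothetical helper: assumes <meta name="dc:Identifier" content="..."/>.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Drop a '{namespace}' prefix, if present, so 'meta' matches
        # whether or not the document declares a default namespace.
        local_name = elem.tag.rsplit('}', 1)[-1]
        if local_name == 'meta' and elem.get('name', '').lower() == 'dc:identifier':
            match = re.match(r'\d+', elem.get('content', ''))
            if match:
                return match.group(0)
    return None


# Hypothetical sample document built around the meta element shown above.
sample = '<book><head><meta name="dc:Identifier" content="12345-eng-epb"/></head></book>'
print(extract_identifier_number(sample))  # 12345
```

Matching only the leading digits, rather than every digit in the string, avoids picking up stray digits from a suffix if one ever contains them.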
This is my code:
import os
import zipfile
import xml.etree.ElementTree as ET

# Enter the path to the folder containing the zip files
folder_path = r'C:\Users\XXXX\XXXXXXXXX\XXXXXX_Work\bookshare_DAISY\rename_test_XXXXX'

# Loop through each file in the folder
for file_name in os.listdir(folder_path):
    if file_name.endswith('.zip'):
        # Get the full path to the zip file
        zip_file_path = os.path.join(folder_path, file_name)
        # Extract the UID from the XML file in the zip file
        with zipfile.ZipFile(zip_file_path, 'r') as zip_file:
            for inner_file_name in zip_file.namelist():
                if inner_file_name.endswith('.xml'):
                    with zip_file.open(inner_file_name) as inner_file:
                        tree = ET.parse(inner_file)
                        root = tree.getroot()
                        uid = root.find('.//dc:identifier', root.nsmap).text
                        # Remove any non-digit characters from the UID
                        uid_numbers = ''.join(filter(str.isdigit, uid))
                        # Generate the new zip file name and path
                        new_zip_file_name = f'{uid_numbers}.zip'
                        new_zip_file_path = os.path.join(folder_path, new_zip_file_name)
                        # Rename the zip file
                        os.rename(zip_file_path, new_zip_file_path)
print('Done!')
I keep running into this AttributeError:
'xml.etree.ElementTree.Element' object has no attribute 'nsmap'
I'm not entirely sure what it means, or whether I'm even on the right track for this code to run properly.
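For context on the message itself: `nsmap` is an attribute of elements in the third-party lxml library, and elements from the standard library's `xml.etree.ElementTree` do not have it. With the standard library, the second argument to `find` is a prefix-to-URI dictionary that the caller supplies. A minimal sketch, assuming a hypothetical document that really contains a `dc:identifier` element bound to the Dublin Core namespace (the `SAMPLE` text and the `NAMESPACES` name are illustrative only):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real file may declare different prefixes or URIs.
SAMPLE = (
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:identifier>12345-eng-epb</dc:identifier>'
    '</metadata>'
)

root = ET.fromstring(SAMPLE)

# Standard-library elements carry no namespace map, hence the AttributeError.
print(hasattr(root, 'nsmap'))

# Supply the prefix-to-URI mapping explicitly instead of root.nsmap.
NAMESPACES = {'dc': 'http://purl.org/dc/elements/1.1/'}
element = root.find('.//dc:identifier', NAMESPACES)

# find() returns None when nothing matches, so guard before reading .text.
uid = element.text if element is not None else ''
print(uid)
```

Note that in the `meta` line quoted in the question the value lives in a `content` attribute rather than in an element named `dc:identifier`, so a lookup like this may find nothing in such a file. Tag and attribute matching is also case-sensitive.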