I have a large number of .zip files in a folder. They are DAISY files. I need to:

  • open each ZIP archive,
  • open up the only XML file in the archive,
  • extract only the numbers from the same element, for example <meta name="dc:Identifier" content="12345-eng-epb"/>,
  • and then re-archive and rename the ZIP archive as that extracted number, for example 12345.zip.

This is my code:

import os
import zipfile
import xml.etree.ElementTree as ET

# Enter the path to the folder containing the zip files
folder_path = r'C:\Users\XXXX\XXXXXXXXX\XXXXXX_Work\bookshare_DAISY\rename_test_XXXXX'

# Loop through each file in the folder
for file_name in os.listdir(folder_path):
    if file_name.endswith('.zip'):
        # Get the full path to the zip file
        zip_file_path = os.path.join(folder_path, file_name)

        # Extract the UID from the XML file in the zip file
        with zipfile.ZipFile(zip_file_path, 'r') as zip_file:
            for inner_file_name in zip_file.namelist():
                if inner_file_name.endswith('.xml'):
                    with zip_file.open(inner_file_name) as inner_file:
                        tree = ET.parse(inner_file)
                        root = tree.getroot()
                        uid = root.find('.//dc:identifier', root.nsmap).text

        # Remove any non-digit characters from the UID
        uid_numbers = ''.join(filter(str.isdigit, uid))

        # Generate the new zip file name and path
        new_zip_file_name = f'{uid_numbers}.zip'
        new_zip_file_path = os.path.join(folder_path, new_zip_file_name)

        # Rename the zip file
        os.rename(zip_file_path, new_zip_file_path)

print('Done!')

I keep running into this error: AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'nsmap'. I'm not sure what it means, or whether I'm even on the right track for this code to run properly.

Michael M.
  • What are you expecting the `nsmap` attribute to be? Did you copy this code from an example somewhere? – John Gordon Feb 22 '23 at 23:23
  • There's only one place in the posted code where you reference `nsmap`, which is `uid = root.find('.//dc:identifier', root.nsmap).text`. The error message is telling you that `root` has type `xml.etree.ElementTree.Element`, and that it has no attribute named `nsmap`. – Tom Karzes Feb 22 '23 at 23:25
  • https://docs.python.org/3/library/xml.etree.elementtree.html#parsing-xml-with-namespaces – LMC Feb 22 '23 at 23:33
  • `nsmap` does not exist in ElementTree. lxml, however, provides this property on elements: https://lxml.de/apidoc/lxml.etree.html#lxml.etree._Element.nsmap – mzjn Feb 23 '23 at 12:33

2 Answers

The ElementTree implementation in Python's standard library has no such attribute. Consider switching to lxml, which is probably the library used by whatever source you got the .nsmap reference from.

To aid with this task of switching over, lxml even provides a handy page on the compatibility of lxml.etree with other ElementTree implementations. After making sure lxml is installed, it should be as simple as changing your import xml.etree.ElementTree as ET line to import lxml.etree as ET, and the API should be backward-compatible.

From the compatibility page I linked above, we can see that in addition to supporting the .nsmap attribute,

lxml.etree offers a lot more functionality, such as XPath, XSLT, Relax NG, and XML Schema support, which (c)ElementTree does not offer.

and is therefore generally preferable to the built-in xml.etree for anything remotely sophisticated.
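If installing lxml is not an option, one possible workaround is to stay with the standard library and pass the prefix mapping by hand instead of reading it from .nsmap. The sketch below is only an illustration: the sample document is hypothetical, and it assumes the dc prefix is bound to the Dublin Core namespace URI, which may not match your files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real DAISY package file may be structured differently.
SAMPLE = """<package xmlns:dc="http://purl.org/dc/elements/1.1/">
  <metadata>
    <dc:identifier>12345-eng-epb</dc:identifier>
  </metadata>
</package>"""

# Prefix-to-URI mapping supplied by hand, because stdlib elements have no .nsmap.
# Assumption: the files bind "dc" to the Dublin Core namespace.
NAMESPACES = {"dc": "http://purl.org/dc/elements/1.1/"}

root = ET.fromstring(SAMPLE)
element = root.find(".//dc:identifier", NAMESPACES)

# find() returns None when nothing matches, so guard before reading .text.
uid = element.text if element is not None else ""
uid_numbers = "".join(filter(str.isdigit, uid))
print(uid_numbers)
```

The None check matters in a batch job: without it, a single archive that lacks the element would stop the whole loop with an AttributeError.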

L0tad

I would parse with the built-in xml.etree.ElementTree, then copy each archive under its new name into a new folder. The identifier can be found regardless of its namespace with root.find(".//{*}identifier").text (the {*} wildcard requires Python 3.8 or later):

import os
import zipfile
import shutil
import xml.etree.ElementTree as ET

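# Source folder and output folder; the output folder is created if missing
src_dir = r".\archive"
dst_dir = os.path.join(src_dir, "changed")
os.makedirs(dst_dir, exist_ok=True)

# List the ZIP files, skipping the output folder so copies are not re-read
zip_files_list = []
for folder, dirs, files in os.walk(src_dir):
    if os.path.abspath(folder) == os.path.abspath(dst_dir):
        continue
    for file in files:
        if file.endswith(".zip"):
            zip_files_list.append(os.path.join(folder, file))
print(zip_files_list)

# Parse the XML file inside each ZIP, then copy the ZIP under its new name
for zip_file in zip_files_list:
    with zipfile.ZipFile(zip_file, mode="r") as f:
        for member in f.namelist():
            # Only XML members can be parsed; skip everything else
            if not member.endswith(".xml"):
                continue
            with f.open(member, mode="r") as thefile:
                root = ET.parse(thefile).getroot()
            # "{*}" matches the element in any namespace
            element = root.find(".//{*}identifier")
            if element is None:
                continue
            file_id = element.text
            print(file_id)
            shutil.copy(zip_file, os.path.join(dst_dir, f"{file_id}.zip"))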
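
The sample line in the question stores the identifier in the content attribute of a meta element rather than in the text of an identifier element, so a text lookup may come back empty for such files. A minimal sketch of that variant follows. The function name identifier_digits and both sample documents are hypothetical, and it assumes the number you want is the leading run of digits in the attribute value.

```python
import re
import xml.etree.ElementTree as ET


def identifier_digits(xml_text):
    """Return the leading digits of the dc:Identifier meta content, or None."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Drop a "{namespace}" prefix, if any, to compare the local tag name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name != "meta":
            continue
        # Compare case-insensitively, since the sample spells it "dc:Identifier".
        if element.get("name", "").lower() == "dc:identifier":
            match = re.match(r"\d+", element.get("content", ""))
            return match.group(0) if match else None
    return None


# Hypothetical documents modelled on the line quoted in the question.
PLAIN = '<package><metadata><meta name="dc:Identifier" content="12345-eng-epb"/></metadata></package>'
NAMESPACED = '<package xmlns="http://example.com/ns"><meta name="dc:Identifier" content="12345-eng-epb"/></package>'

print(identifier_digits(PLAIN))
print(identifier_digits(NAMESPACED))
```

Taking only the leading digits differs from filtering every digit out of the string: an identifier with digits in a later segment would otherwise have those digits appended to the number.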
Hermann12