0

briefly: I'm doing a project, where I need to separate XML files in different folders by a value found inside its tag.

The specific case: I'm new to using Python. This specific script has in mind to look for a tag in an XML file (the file in question is the "NF-E 4.00" Brazilian electronic invoice), and this tag informs the type of payment. I want to separate the files contained in a folder and divide them into two folders: cash payments and the rest.I will send the link to the documentation about the tag:

Google.

This documentation is in Portuguese; however, when using the Google translator provided by the Google Chrome browser, I saw that it is easy to understand in English.

What I really want is for the code to start identifying the value inside the tag and stop passing all the files to the card folder.

my code:

"""
This script reads XML files in a folder and moves the files to different
folders based on the value of the <tPag> tag.
Files with the value 01 in <tPag> are moved to the 'money' folder
and files with other values are moved to the 'cards' folder.
The path of the input and output folders can be modified by changing the
constants INPUT_FOLDER, OUTPUT_FOLDER_TPAG_1, and OUTPUT_FOLDER_TPAG_NOT_1.
"""
import os
import shutil
import xml.etree.ElementTree as ET

# folder where the input XML files are located
INPUT_FOLDER = r"C:\Exemple\input\folder"

# folder where files with value 01 in the <tPag> tag will be saved
OUTPUT_FOLDER_TPAG_1 = r"C:\Exemple\output\money\folder"

# folder where files with values other than 01 in the <tPag> tag will be saved
OUTPUT_FOLDER_TPAG_NOT_1 = r"C:\Exemple\output\card\folder"

# Iterate through all files in the input folder
for filename in os.listdir(INPUT_FOLDER):
    # Check if the file is an XML file
    if filename.endswith(".xml"):
        # Create the full path of the file
        xml_file = os.path.join(INPUT_FOLDER, filename)
        try:
            # Parse the XML file
            tree = ET.parse(xml_file)
            root = tree.getroot()
            # Check if there is a <tPag> tag with a value equal to 01
            TPAG_1_FOUND = False
            for child in root.findall("TNFe/infNFe/pag/detPag/tPag"):
                # Get the value of the tag
                tpag_value = child.text
                # Check if the value of the tag is 1
                if tpag_value == "01":
                    TPAG_1_FOUND = True
                    break
            # Move the file to the corresponding folder
            if TPAG_1_FOUND:
                output_file = os.path.join(OUTPUT_FOLDER_TPAG_1, filename)
                shutil.move(xml_file, output_file)
            else:
                output_file = os.path.join(OUTPUT_FOLDER_TPAG_NOT_1, filename)
                shutil.move(xml_file, output_file)
        except ET.ParseError as e:
            # If there is an error while parsing the file, display an error message
            print(f"Error parsing file {xml_file}")
# Display a completion message when the task is finished
print("Task completed!")

The terminal just show: Task Completed! And then it passes all the files to the card folder, no files go to the money folder.

I tried to change the path of the tag. I created an exception that did not find the tPAg value in any invoice (despite being present in all). When I remove the exception and the error message, it simply marks the constant TPAG_1_FOUND as else and throws the XML file to the card folder.

This is a example of a XML file that i'm working with:

    <?xml version="1.0"?>
<nfeProc xmlns="http://www.portalfiscal.inf.br/nfe" versao="4.00">
    <NFe xmlns="http://www.portalfiscal.inf.br/nfe">
        <infNFe versao="4.00" Id="example">
            <ide>
                <cUF>example</cUF>
                <cNF>example</cNF>
                <natOp>example</natOp>
                <mod>example</mod>
                <serie>example</serie>
                <nNF>example</nNF>
                <dhEmi>example</dhEmi>
                <tpNF>example</tpNF>
                <idDest>example</idDest>
                <cMunFG>example</cMunFG>
                <tpImp>example</tpImp>
                <tpEmis>example</tpEmis>
                <cDV>example</cDV>
                <tpAmb>example</tpAmb>
                <finNFe>example</finNFe>
                <indFinal>example</indFinal>
                <indPres>example</indPres>
                <procEmi>example</procEmi>
                <verProc>example</verProc>
            </ide>
            <emit>
                <CNPJ>example</CNPJ>
                <xNome>example</xNome>
                <xFant>example</xFant>
                <enderEmit>
                    <xLgr>example</xLgr>
                    <nro>example</nro>
                    <xCpl>example</xCpl>
                    <xBairro>example</xBairro>
                    <cMun>example</cMun>
                    <xMun>example</xMun>
                    <UF>example</UF>
                    <CEP>example</CEP>
                    <cPais>example</cPais>
                    <xPais>example</xPais>
                    <fone>example</fone>
                </enderEmit>
                <IE>example</IE>
                <CRT>example</CRT>
            </emit>
            <det nItem="1">
                <prod>
                    <cProd>example</cProd>
                    <cEAN>example</cEAN>
                    <xProd>example</xProd>
                    <NCM>example</NCM>
                    <CFOP>example</CFOP>
                    <uCom>example</uCom>
                    <qCom>0example</qCom>
                    <vUnCom>example</vUnCom>
                    <vProd>example</vProd>
                    <cEANTrib>example</cEANTrib>
                    <uTrib>example</uTrib>
                    <qTrib>example</qTrib>
                    <vUnTrib>example</vUnTrib>
                    <indTot>example</indTot>
                </prod>
                <imposto>
                    <ICMS>
                        <ICMSSN102>
                            <orig></orig>
                            <CSOSN>example</CSOSN>
                        </ICMSSN102>
                    </ICMS>
                    <PIS>
                        <PISAliq>
                            <CST></CST>
                            <vBC>example</vBC>
                            <pPIS>example</pPIS>
                            <vPIS>example</vPIS>
                        </PISAliq>
                    </PIS>
                    <COFINS>
                        <COFINSAliq>
                            <CST>example</CST>
                            <vBC>example</vBC>
                            <pCOFINS>example</pCOFINS>
                            <vCOFINS>example</vCOFINS>
                        </COFINSAliq>
                    </COFINS>
                </imposto>
            </det>
            <total>
                <ICMSTot>
                    <vBC>example</vBC>
                    <vICMS>example</vICMS>
                    <vICMSDeson>example</vICMSDeson>
                    <vFCPUFDest>example</vFCPUFDest>
                    <vICMSUFDest>example</vICMSUFDest>
                    <vICMSUFRemet>example</vICMSUFRemet>
                    <vFCP>example</vFCP>
                    <vBCST>example</vBCST>
                    <vST>example</vST>
                    <vFCPST>example</vFCPST>
                    <vFCPSTRet>example</vFCPSTRet>
                    <vProd>5.80</vProd>
                    <vFrete>example</vFrete>
                    <vSeg>example</vSeg>
                    <vDesc>example</vDesc>
                    <vII>example</vII>
                    <vIPI>example</vIPI>
                    <vIPIDevol>example</vIPIDevol>
                    <vPIS>example</vPIS>
                    <vCOFINS>example</vCOFINS>
                    <vOutro>example</vOutro>
                    <vNF>5.80</vNF>
                </ICMSTot>
            </total>
            <transp>
                <modFrete>example</modFrete>
            </transp>
            <pag>
                <detPag>
                    <indPag>0</indPag>
                    <tPag>01</tPag>
                    <vPag>5.80</vPag>
                </detPag>
            </pag>
            <infAdic>
                <infCpl>example</infCpl>
            </infAdic>
            <infRespTec>
                <CNPJ>example</CNPJ>
                <xContato>example</xContato>
                <email>example</email>
                <fone>example</fone>
            </infRespTec>
        </infNFe>
        <infNFeSupl>
            <qrCode>example</qrCode>
            <urlChave>example</urlChave>
        </infNFeSupl>
        <Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
            <SignedInfo>
                <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
                <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/>
                <Reference URI="example">
                    <Transforms>
                        <Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/>
                        <Transform Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
                    </Transforms>
                    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                    <DigestValue>example</DigestValue>
                </Reference>
            </SignedInfo>
            <SignatureValue>example</SignatureValue>
            <KeyInfo>
                <X509Data>
                    <X509Certificate>example</X509Certificate>
                </X509Data>
            </KeyInfo>
        </Signature>
    </NFe>
    <protNFe versao="4.00">
        <infProt Id="NFe00">
            <tpAmb>example</tpAmb>
            <verAplic>example</verAplic>
            <chNFe>example</chNFe>
            <dhRecbto>example</dhRecbto>
            <nProt>example</nProt>
            <digVal>example</digVal>
            <cStat>example</cStat>
            <xMotivo>example</xMotivo>
        </infProt>
    </protNFe>
</nfeProc>
  • Make sure the element is found, test this outside the code `root.findall("TNFe/infNFe/pag/detPag/tPag")`. ElementTree uses relative xpaths I believe so it might be `root.findall("./TNFe/infNFe/pag/detPag/tPag")` or `root.findall(".//TNFe/infNFe/pag/detPag/tPag")` – LMC Feb 27 '23 at 18:28
  • Thanks for the tip, I wasn't able to find the desired element, I'll keep trying to find the correct path to the tag. I tried this with one of the sames XML that I'am using: `import xml.etree.ElementTree as ET` `# create the XML root object` `tree = ET.parse('test.xml') root = tree.getroot()` `# find all occurrences of the 'tPag' tag` `tpags = root.findall(".//TNFe/infNFe/pag/detPag/tPag")` `# check if any tags were found` `if len(tpags) > 0: print(f"{len(tpags)} tags were found.") ` `else: print("No tags were found.")` – thiago gentil Feb 27 '23 at 19:13
  • and always receive `No tags were found.` in the terminal – thiago gentil Feb 27 '23 at 19:16
  • what are the root elements of the XML? Are there any namespaces? Better post a minimal example of the file. – LMC Feb 27 '23 at 19:41
  • I added just now the xml file, I think the right way is : `tpags = root.findall("nfeProc/NFe/infNFe/pag/tPag")` but still not working with a original NF-e XML, I made one with just 6 elements to test out and worked great. – thiago gentil Feb 27 '23 at 20:54
  • Handling namespaces in python https://stackoverflow.com/a/36778034/2834978 – LMC Feb 27 '23 at 21:17

2 Answers2

0

As suggested by @LMC in the comment, you need to specify namespace when searching for the tPag element. For details about namespace, check the documentation.

A possible solution to your script is shown below, with change to just one line. Notice that I also use a XPath syntax to simplify the search for all tPag element. However, this syntax might NOT work if there are multiple tPag elements in the file and you have to rely on a specific one.

"""
This script reads XML files in a folder and moves the files to different
folders based on the value of the <tPag> tag.
Files with the value 01 in <tPag> are moved to the 'money' folder
and files with other values are moved to the 'cards' folder.
The path of the input and output folders can be modified by changing the
constants INPUT_FOLDER, OUTPUT_FOLDER_TPAG_1, and OUTPUT_FOLDER_TPAG_NOT_1.
"""
import os
import shutil
import xml.etree.ElementTree as ET

# folder where the input XML files are located
INPUT_FOLDER = r"."

# folder where files with value 01 in the <tPag> tag will be saved
OUTPUT_FOLDER_TPAG_1 = r"./pages"

# folder where files with values other than 01 in the <tPag> tag will be saved
OUTPUT_FOLDER_TPAG_NOT_1 = r"."

# Iterate through all files in the input folder
for filename in os.listdir(INPUT_FOLDER):
    # Check if the file is an XML file
    if filename.endswith(".xml"):
        # Create the full path of the file
        xml_file = os.path.join(INPUT_FOLDER, filename)
        try:
            # Parse the XML file
            tree = ET.parse(xml_file)
            root = tree.getroot()
            # Check if there is a <tPag> tag with a value equal to 01
            TPAG_1_FOUND = False
            for child in root.findall(".//{http://www.portalfiscal.inf.br/nfe}tPag"):  # <-- add namespace and use XPath syntax
                # Get the value of the tag
                tpag_value = child.text
                # Check if the value of the tag is 1
                if tpag_value == "01":
                    TPAG_1_FOUND = True
                    break
            # Move the file to the corresponding folder
            if TPAG_1_FOUND:
                output_file = os.path.join(OUTPUT_FOLDER_TPAG_1, filename)
                shutil.move(xml_file, output_file)
            else:
                output_file = os.path.join(OUTPUT_FOLDER_TPAG_NOT_1, filename)
                shutil.move(xml_file, output_file)
        except ET.ParseError as e:
            # If there is an error while parsing the file, display an error message
            print(f"Error parsing file {xml_file}")
# Display a completion message when the task is finished
print("Task completed!")
Fanchen Bao
  • 3,310
  • 1
  • 21
  • 34
0

I needed to rewrite the code after reading the documentation and specifying the namescapeURI and I managed to solve the problem.

"""
Script that organizes XML files based on the payment method used.

Files are moved to specific folders depending on the payment method specified in the XML file.

"""

import shutil
import os
import xml.etree.ElementTree as ET

# Specifies the directory where the XML files are located
DIRECTORY = r"C:\Users\example\input\folder"

# Loop through the files in the directory
for filename in os.listdir(DIRECTORY):
   if filename.endswith('.xml'):  # Check if it is an XML file
       # Get the full path of the file
       file_path = os.path.join(DIRECTORY, filename)

       # Read the XML file
       tree = ET.parse(file_path)
       root = tree.getroot()

       # Get the name of the file
       file_name = 'NFe' + \
           root.find(
               ".//{http://www.portalfiscal.inf.br/nfe}nNF").text + '.xml'

       # Check the value of the tPag element to determine the destination folder
       if root.find(".//{http://www.portalfiscal.inf.br/nfe}tPag").text == '01':
           DESTINATION = r"C:\Users\example\folder\money"
       else:
           DESTINATION = r"C:\Users\example\folder\outher"

       # Move the file to the correct destination folder
       shutil.move(file_path, os.path.join(DESTINATION, file_name))