I would like to remove some characters when I convert my XML to a dict:

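import json
import xmltodict

def xml_to_json():  # hypothetical wrapper name; the snippet ends with `return data`
    with open('test.xml') as xml_file:
        data = xmltodict.parse(xml_file.read())
    with open('test2.json', 'wt', encoding='utf-8', errors='ignore') as f:
        json.dump(data, f, indent=4, sort_keys=True)
    return data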

The problem is that I have many JSON files. Some of them look like this:

{
    "pcrs:test A": {
        "pcrs:nature": "03",
        "pcrs:producteur": "SIEML"
    }
}

And some look like this (without `pcrs:`):

{
    "test B": {
        "nature": "03",
        "producteur": "SIEML"
    }
}

How can I make every file like the first example come out without the `pcrs:` prefix, as in the second example?

asked by martin, edited by tdelaney

1 Answer


`pcrs:` is a namespace prefix. Since you didn't include sample XML, I've made up my own:

<?xml version="1.0" encoding="UTF-8"?>
<root_elem xmlns:pcrs="http://the/pcrs/url">
<pcrs:subelem/>
</root_elem>

xmltodict lets you manage namespaces by mapping each namespace URL to a different representation. Most notably, mapping a URL to None removes the prefix completely. See Namespace Support in the xmltodict documentation.
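To illustrate what such a URL-to-prefix mapping does, here is a minimal stdlib-only sketch. It uses `xml.etree.ElementTree` instead of xmltodict (a swapped-in technique, not xmltodict's own code): ElementTree reports namespaced names as `{uri}local`, and the hypothetical helper `map_name` looks the URI up in a mapping, substituting a short prefix or, for None, dropping the namespace entirely. The sample XML is the made-up one above.

```python
import xml.etree.ElementTree as ET

# The made-up sample XML from above, as bytes so the XML declaration is honoured.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<root_elem xmlns:pcrs="http://the/pcrs/url">
<pcrs:subelem/>
</root_elem>"""

# Hypothetical mapping: namespace URI -> replacement prefix (None drops it).
NAMESPACES = {"http://the/pcrs/url": None}


def map_name(name, namespaces):
    """Rewrite an ElementTree '{uri}local' name using the URI mapping."""
    if not name.startswith("{"):
        return name  # name has no namespace
    uri, local = name[1:].split("}", 1)
    if uri not in namespaces:
        return name  # unmapped namespaces are left untouched
    prefix = namespaces[uri]
    return local if prefix is None else f"{prefix}:{local}"


root = ET.fromstring(SAMPLE)
# root.iter() yields the root element first, then its descendants.
print([map_name(el.tag, NAMESPACES) for el in root.iter()])
```

Mapping the URL to a string such as `"p"` instead of None would produce `p:subelem`, which is the "different representation" case.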

In your case, you can do:

import xmltodict

with open('test.xml') as xml_file:
    data = xmltodict.parse(xml_file.read(), process_namespaces=True,
                           namespaces={"http://the/pcrs/url": None})

replacing http://the/pcrs/url with the real namespace URL (the value of the `xmlns:pcrs` attribute in your file).

answered by tdelaney
  • After posting, I got to thinking that you may be dealing with namespace-qualified attributes, and I'm not sure whether `xmltodict` handles them. This post suggests it's a problem: https://stackoverflow.com/questions/26726728/remove-namespace-with-xmltodict-in-python – tdelaney Jun 13 '20 at 19:51
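If prefixes survive (for example on namespace-qualified attributes, or in JSON files that were already converted), one fallback is to post-process the dict after parsing. This is a minimal sketch, not part of xmltodict: the hypothetical helper `strip_prefixes` walks nested dicts and lists, removes everything up to the last `:` in each key, keeps xmltodict's `@` attribute marker, and drops `xmlns` declaration attributes. It assumes no two keys in the same dict collapse to the same name after stripping; if they do, the later one wins.

```python
import json


def strip_prefixes(obj):
    """Recursively drop 'prefix:' from dict keys, keeping a leading '@'."""
    if isinstance(obj, dict):
        cleaned = {}
        for key, value in obj.items():
            if key == "@xmlns" or key.startswith("@xmlns:"):
                continue  # namespace declarations are not data; drop them
            marker = "@" if key.startswith("@") else ""
            local = key[len(marker):].rsplit(":", 1)[-1]
            cleaned[marker + local] = strip_prefixes(value)
        return cleaned
    if isinstance(obj, list):
        return [strip_prefixes(item) for item in obj]
    return obj  # strings, numbers and None are returned unchanged


# The first example from the question.
data = {"pcrs:test A": {"pcrs:nature": "03", "pcrs:producteur": "SIEML"}}
print(json.dumps(strip_prefixes(data), indent=4))
```

Only keys are rewritten; values containing colons are left alone. For the existing files, you could `json.load` each one, pass the result through `strip_prefixes`, and `json.dump` it back.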