1

I am trying to generate xml file from pandas dataframe using pd.to_xml() method.

I have data in SQL-Server and trying to read from it.

id first_name last_name email gender ip_address
1 Erika Pickless epickless0@twitter.com female 133.146.172.220
2 Victor Stodd vstodd1@foxnews.com male 111.125.132.0
3 Becky Berrisford bberrisford2@shutterfly.com female 200.62.247.237
RawData = pd.read_sql_query('''select * from RFP.dbo.MOCK_DATA;''', conn)


RawData.to_xml('output.xml', attr_cols=['id', 'first_name', 'last_name', 'email', 
'gender', 'ip_address'],namespaces={"soap": "http://example1.com", "wsse": 
"http://docs.example2.org/wss/"}, prefix="soap")

I am getteing the following output

<?xml version='1.0' encoding='utf-8'?>
<soap:data xmlns:soap="http://example1.com" xmlns:wsse="http://docs.example2.org/wss/">
  <soap:row soap:index="0" soap:id="1" soap:first_name="Erika" soap:last_name="Pickless" soap:email="epickless0@twitter.com" soap:gender="Female" soap:ip_address="133.146.172.220"/>
  <soap:row soap:index="1" soap:id="2" soap:first_name="Victoir" soap:last_name="Stodd" soap:email="vstodd1@foxnews.com" soap:gender="Male" soap:ip_address="111.125.132.0"/>
  <soap:row soap:index="2" soap:id="3" soap:first_name="Becky" soap:last_name="Berrisford" soap:email="bberrisford2@shutterfly.com" soap:gender="Female" soap:ip_address="200.62.247.237"/>
</soap:data>

The above output is almost correct but, I want little different output: Issue: The prefix "soap" is attached to every attribute in the above output.

But I want following output:

<?xml version='1.0' encoding='utf-8'?>
<soap:data xmlns:soap="http://example1.com" xmlns:wsse="http://docs.example2.org/wss/">
  <soap:row index="0" id="1" first_name="Erika" last_name="Pickless" email="epickless0@twitter.com" gender="Female" ip_address="133.146.172.220"/>
  <soap:row index="1" id="2" first_name="Victoir" last_name="Stodd" email="vstodd1@foxnews.com" gender="Male" ip_address="111.125.132.0"/>
  <soap:row index="2" id="3" first_name="Becky" last_name="Berrisford" email="bberrisford2@shutterfly.com" gender="Female" ip_address="200.62.247.237"/>    
</soap:data>
dp808139
  • 122
  • 2
  • 12
  • In the To.XML you are missing the default namespace : namespaces = {"": "https://example.com"} where you have an empty string – jdweng Apr 14 '23 at 12:19
  • @jdweng if i add default namespace then it will give output like but it will not give expected output – dp808139 Apr 14 '23 at 13:01
  • @DhruvRajkotiya, can you share a minimal example of `RawData` ? – Timeless Apr 14 '23 at 13:02
  • An xml can have multiple namespaces and the element names you want prefix soap and attributes you want default. Right now everything is set to soap. – jdweng Apr 14 '23 at 13:12
  • @jdweng how to specify which element belongs to the soap namespace and which element belongs to the default namespace in the above Python code. – dp808139 Apr 14 '23 at 13:23
  • @Timeless I added example of RawData – dp808139 Apr 14 '23 at 13:23
  • When you create an element or attribute you also have to specify the name space. Otherwise you inherit the parent namespace. See : https://bytes.com/topic/python/answers/390119-xml-namespaces?force_isolation=true – jdweng Apr 14 '23 at 13:51

1 Answers1

2

Since XML is an open-ended design standard and Pandas cannot support all possible output specifications with default arguments, you need a customized solution.

Therefore consider running XSLT, the special-purpose language designed to transform XML files, which is supported in DataFrame.to_xml() using the default lxml parser and stylesheet argument. Below XSLT will run after your raw output to remove namespace prefixes from all attributes:

XSLT (save as .xsl script, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- IDENTITY TRANSFORM -->
    <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
    </xsl:template>

    <!-- REMOVE NAMESPACE PREFIXES FROM ALL ATTRIBUTES -->
    <xsl:template match="@*">
      <xsl:attribute name="{local-name()}">
        <xsl:value-of select="."/>
      </xsl:attribute>
    </xsl:template>
</xsl:stylesheet>

Online Demo

Python

RawData.to_xml(
    "output.xml", 
    attr_cols=["id", "first_name", "last_name", "email", "gender", "ip_address"],
    namespaces={"soap": "http://example1.com", "wsse": "http://docs.example2.org/wss/"}, 
    prefix="soap",
    stylesheet="style.xsl"
)
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • How did you design the style.xls file I want more customization to my XML file. I want to learn about it. – dp808139 Apr 16 '23 at 08:27
  • Read up on XSLT, (https://stackoverflow.com/tags/xslt/info), a special purpose language used in the industry beyond Python! Keep in mind pandas via lxml only supports XSLT 1.0, not 2.0 or latest 3.0. – Parfait Apr 16 '23 at 21:51