I am trying to convert a large XML file to JSON in PHP. The problem is that the CDATA content is missing from the JSON output.

This is my code:

<?php
$xml = simplexml_load_file('belovedskincare.xml', 'SimpleXMLElement', LIBXML_NOCDATA);
$jsondata = json_encode($xml);
echo $jsondata;
?>

XML code:

<!--  This is a WordPress eXtended RSS file generated by WordPress as an export of your site.  -->
<!--  It contains information about your site's posts, pages, comments, categories, and other content.  -->
<!--  You may use this file to transfer that content from one site to another.  -->
<!--  This file is not intended to serve as a complete backup of your site.  -->
<!--  To import this information into a WordPress site follow these steps:  -->
<!--  1. Log in to that site as an administrator.  -->
<!--  2. Go to Tools: Import in the WordPress admin panel.  -->
<!--  3. Install the "WordPress" importer from the list.  -->
<!--  4. Activate & Run Importer.  -->
<!--  5. Upload this file using the form provided on that page.  -->
<!--  6. You will first be asked to map the authors in this export file to users  -->
<!--     on the site. For each author, you may choose to map to an  -->
<!--     existing user on the site or to create a new user.  -->
<!--  7. WordPress will then import each of the posts, pages, comments, categories, etc.  -->
<!--     contained in this file into your site.  -->
<!--  generator="WordPress/5.6.5" created="2021-10-02 09:17"  -->
<rss xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:wp="http://wordpress.org/export/1.2/" version="2.0">
<channel>
<title>Beloved Skincare</title>
<link>https://belovedskincare.com.my</link>
<description>Food for your Skin</description>
<pubDate>Sat, 02 Oct 2021 09:17:04 +0000</pubDate>
<language>en-US</language>
<wp:wxr_version>1.2</wp:wxr_version>
<wp:base_site_url>https://belovedskincare.com.my</wp:base_site_url>
<wp:base_blog_url>https://belovedskincare.com.my</wp:base_blog_url>
<wp:author>
<wp:author_id>1</wp:author_id>
<wp:author_login>
<![CDATA[ admin ]]>
</wp:author_login>
<wp:author_email>
<![CDATA[ udaybtec@gmail.com ]]>
</wp:author_email>
<wp:author_display_name>
<![CDATA[ admin ]]>
</wp:author_display_name>
<wp:author_first_name>
<![CDATA[ Udaya ]]>
</wp:author_first_name>
<wp:author_last_name>
<![CDATA[ Kumar ]]>
</wp:author_last_name>
</wp:author>
<wp:author>
<wp:author_id>113</wp:author_id>
<wp:author_login>
<![CDATA[ studiogulf ]]>
</wp:author_login>
<wp:author_email>
<![CDATA[ uday@studiogulf.com ]]>
</wp:author_email>
<wp:author_display_name>
<![CDATA[ StudioGulf ]]>
</wp:author_display_name>
<wp:author_first_name>
<![CDATA[ Studio ]]>
</wp:author_first_name>
<wp:author_last_name>
<![CDATA[ Gulf ]]>
</wp:author_last_name>
</wp:author>
<generator>https://wordpress.org/?v=5.6.5</generator>
<image>
<url>https://belovedskincare.com.my/wp-content/uploads/2018/06/cropped-Logo-32x32.png</url>
<title>Beloved Skincare</title>
<link>https://belovedskincare.com.my</link>
<width>32</width>
<height>32</height>
</image>
<item>
<title>Coffee & Honey Face Scrub</title>
<link>https://belovedskincare.com.my/product/coffee-honey-face-scrub/</link>
<pubDate>Sun, 31 Dec 2017 17:01:43 +0000</pubDate>
<dc:creator>
<![CDATA[ admin ]]>
</dc:creator>
<guid isPermaLink="false">http://localhost/xstore/product/import-placeholder-for-496/</guid>
<description/>

output:

"channel": {
"title": "Beloved Skincare",
"link": "https://belovedskincare.com.my",
"description": "Food for your Skin",
"pubDate": "Sat, 02 Oct 2021 09:17:04 +0000",
"language": "en-US",
"generator": "https://wordpress.org/?v=5.6.5",
"image": {
"url": "https://belovedskincare.com.my/wp-content/uploads/2018/06/cropped-Logo-32x32.png",
"title": "Beloved Skincare",
"link": "https://belovedskincare.com.my",
"width": "32",
"height": "32"
},
"item": [
{
"title": "Coffee & Honey Face Scrub",
"link": "https://belovedskincare.com.my/product/coffee-honey-face-scrub/",

But the expected output is:

"channel": {
"title": "Beloved Skincare",
"link": "https://belovedskincare.com.my",
"description": "Food for your Skin",
"pubDate": "Sat, 02 Oct 2021 09:17:04 +0000",
"language": "en-US",
"wp:wxr_version": "1.2",
"wp:base_site_url": "https://belovedskincare.com.my",
"wp:base_blog_url": "https://belovedskincare.com.my",
"wp:author": [
{
"wp:author_id": "1",
"wp:author_login": {
"#cdata": "admin"
},
"wp:author_email": {
"#cdata": "udaybtec@gmail.com"
},
"wp:author_display_name": {
"#cdata": "admin"
},
"wp:author_first_name": {
"#cdata": "Udaya"
},
"wp:author_last_name": {
"#cdata": "Kumar"
}
},
{
"wp:author_id": "113",
"wp:author_login": {
"#cdata": "studiogulf"
},
"wp:author_email": {
"#cdata": "uday@studiogulf.com"
},
"wp:author_display_name": {
"#cdata": "StudioGulf"
},
"wp:author_first_name": {
"#cdata": "Studio"
},
"wp:author_last_name": {
"#cdata": "Gulf"
}
}
],
"generator": "https://wordpress.org/?v=5.6.5",
"image": {
"url": "https://belovedskincare.com.my/wp-content/uploads/2018/06/cropped-Logo-32x32.png",
"title": "Beloved Skincare",
"link": "https://belovedskincare.com.my",
"width": "32",
"height": "32"
},
"item": [
{
"title": "Coffee & Honey Face Scrub",
"link": "https://belovedskincare.com.my/product/coffee-honey-face-scrub/",

As you can see, the "channel" object in my output contains far less data than the expected output: the `wp:` elements, including their CDATA values, are missing. Since the file is large, I have included only a small part of the output. I hope that is enough. If you need more details, please ask in a comment and I will add them if possible.
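A minimal sketch of what seems to be happening, assuming the missing keys are exactly the ones with a namespace prefix such as `wp:`. `json_encode()` on a `SimpleXMLElement` only sees children that are in no namespace, so prefixed elements have to be requested explicitly with `children()`. The inline XML is a tiny stand-in for the real export, not the actual file.

```php
<?php
// Tiny stand-in for the WXR export; the real file is much larger.
$source = <<<'XML'
<rss xmlns:wp="http://wordpress.org/export/1.2/" version="2.0">
  <channel>
    <title>Beloved Skincare</title>
    <wp:wxr_version>1.2</wp:wxr_version>
    <wp:author>
      <wp:author_login><![CDATA[admin]]></wp:author_login>
    </wp:author>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($source, 'SimpleXMLElement', LIBXML_NOCDATA);

// Encoding the element directly only covers children in no namespace,
// so the wp: elements do not show up here.
$plain = json_decode(json_encode($xml), true);

// Children under the "wp" prefix must be asked for explicitly.
$wp = $xml->channel->children('wp', true);
$version = (string) $wp->wxr_version;
$login = (string) $wp->author->children('wp', true)->author_login;

echo json_encode($plain['channel']), "\n";
echo $version, ' ', $login, "\n";
```

So the CDATA itself may not be the issue: `LIBXML_NOCDATA` turns it into ordinary text, but that text sits inside namespaced elements that the direct `json_encode()` call never visits.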

Senthur Kumaran
  • Well you did specify `LIBXML_NOCDATA` in the options for loading the XML... – ADyson Oct 06 '21 at 08:20
  • This is the only format I found to convert the XML file to JSON with cdata. But it's not working. Is there any other way? @ADyson – Senthur Kumaran Oct 06 '21 at 08:33
  • Not working how, specifically? Show the XML you're trying to convert, the output you got from `$jsondata`, and the result you expected instead. As per the [manual](https://www.php.net/manual/en/libxml.constants.php), LIBXML_NOCDATA will merge the CDATA into text nodes. – ADyson Oct 06 '21 at 08:35
  • Does the file size have something to do with that? My XML file is large. @ADyson – Senthur Kumaran Oct 06 '21 at 08:39
  • Define "large", exactly. And unless you got an error or some unexpected output then probably not, no. But you'd need to turn on error reporting and/or debug it and/or show us the input data, the output you got and the output you expected, as I requested. I can't answer anything based on guesswork and vague descriptions. – ADyson Oct 06 '21 at 08:41
  • P.S. If it's a large file then reduce it down to a sample of the data. All we need for Stack Overflow is a [mre] - enough code and data to demonstrate the problem. And as an extra bonus, if you find the problem goes away when you reduce the size of the data, then you'll know that might be a factor. – ADyson Oct 06 '21 at 08:43
  • I have edited the question. please look into it. @ADyson – Senthur Kumaran Oct 06 '21 at 09:02
  • You still need to provide the relevant parts of the source XML as well, as I requested. – ADyson Oct 06 '21 at 09:16
  • I have added the XML code as well. See if this is enough. @ADyson – Senthur Kumaran Oct 06 '21 at 09:36
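Following on from the comments about `LIBXML_NOCDATA`, here is a rough sketch of one way to get output shaped like the expected JSON, with prefixed keys such as `wp:author` and a `#cdata` marker. It swaps SimpleXML for the DOM extension, because DOM keeps CDATA sections as distinct nodes. `elementToArray()` is a hypothetical helper written for this illustration, the `@` and `#text` key conventions are assumptions, and the inline XML is a cut-down stand-in for the real file.

```php
<?php
/**
 * Convert a DOM element to a string (text-only element) or an array.
 * Hypothetical helper: keeps prefixed names like "wp:author" and wraps
 * CDATA content in a "#cdata" key. Attributes get an "@" prefix.
 */
function elementToArray(DOMElement $element)
{
    $result = [];
    $seen = [];        // how many times each child name has appeared
    $text = '';
    $hasCdata = false;

    foreach ($element->attributes as $attribute) {
        $result['@' . $attribute->nodeName] = $attribute->nodeValue;
    }

    foreach ($element->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $name = $child->nodeName; // includes the prefix, e.g. "wp:author"
            $value = elementToArray($child);
            if (!isset($seen[$name])) {
                $result[$name] = $value;
                $seen[$name] = 1;
            } elseif ($seen[$name] === 1) {
                // Second occurrence: turn the single value into a list.
                $result[$name] = [$result[$name], $value];
                $seen[$name] = 2;
            } else {
                $result[$name][] = $value;
            }
        } elseif ($child instanceof DOMCdataSection) {
            // Checked before DOMText because DOMCdataSection extends it.
            $hasCdata = true;
            $text .= $child->data;
        } elseif ($child instanceof DOMText) {
            $text .= $child->data;
        }
    }

    $text = trim($text);
    if ($text !== '') {
        if ($hasCdata) {
            $result['#cdata'] = $text;
        } elseif ($result === []) {
            return $text; // plain text-only element
        } else {
            $result['#text'] = $text;
        }
    }

    return $result === [] ? '' : $result;
}

// Cut-down stand-in for the export; for the real file use $dom->load($path).
$source = <<<'XML'
<rss xmlns:wp="http://wordpress.org/export/1.2/" version="2.0">
<channel>
<title>Beloved Skincare</title>
<wp:wxr_version>1.2</wp:wxr_version>
<wp:author>
<wp:author_id>1</wp:author_id>
<wp:author_login><![CDATA[ admin ]]></wp:author_login>
</wp:author>
<wp:author>
<wp:author_id>113</wp:author_id>
<wp:author_login><![CDATA[ studiogulf ]]></wp:author_login>
</wp:author>
</channel>
</rss>
XML;

$dom = new DOMDocument();
$dom->loadXML($source);
$channel = $dom->getElementsByTagName('channel')->item(0);
$data = ['channel' => elementToArray($channel)];

echo json_encode($data, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), "\n";
```

Note that DOM loads the whole document into memory, so for a very large export this may need a higher memory limit or a streaming approach instead.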
