I have seen lots of tutorials here on Stack Overflow, but I cannot work out what I am missing, so I need some help.

I have an XML file that is hosted online, and I am trying to parse it. It looks like this:

<products>
    <product>
        <id>13389</id>
        <name><![CDATA[ product name ]]></name>
        <category id="14"><![CDATA[ Shoes > test1 ]]></category>
        <price>41.30</price>
    </product>
</products>

So far, I am reading the XML and parsing it like this:

$reader = new XMLReader();
$reader->open($product_xml_link);
while ($reader->read()) {
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'product') {
        $product = new SimpleXMLElement($reader->readOuterXml());
        $pid = $product->id;
        $name = $product->name;
        $name = strtolower($name);
        $link = $product->link;
        // element names are case-sensitive, so this must match <price>
        $price = $product->price;
        ...
        ...
    }
} //end while loop

As you can see, there is an id attribute on the category tag. This is the value I would like to grab and use in my code.

I did something like this:

echo "product= " . (string)$product->category->getAttribute('id');

The error I am getting is: Call to undefined method SimpleXMLElement::getAttribute()

I need this id so that I can test it before inserting into the DB, like this:

if ($id == 600) {
    // insert into DB
}
mzjn

2 Answers

There are several things here. First, $product = new SimpleXMLElement($reader->readOuterXml()); means that you serialize the node to a string and then parse it again as a separate XML document. XMLReader::expand() instead returns a DOM node directly, and DOM nodes can be imported into SimpleXML with simplexml_import_dom().

For attributes, use array syntax, as in $product->category['id']:

$reader = new XMLReader();
$reader->open($product_xml_link);

// a document to expand to
$document = new DOMDocument();

// find the first product node
while ($reader->read() && $reader->localName !== 'product') {
  continue;
}

while ($reader->localName === 'product') {
  $product = simplexml_import_dom($reader->expand($document));
  $data = [
    'id' => (string)$product->id,
    'name' => (string)$product->name,
    'category_id' => (string)$product->category['id'],
    // ...
  ];
  var_dump($data);
  // move to the next product sibling
  $reader->next('product');
}
$reader->close();

Output:

array(3) {
  ["id"]=>
  string(5) "13389"
  ["name"]=>
  string(14) " product name "
  ["category_id"]=>
  string(2) "14"
}
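
For the check described in the question, the attribute can be cast to an integer and compared directly. This is only a minimal sketch: it assumes the sample XML from the question is held in a string ($xml and $matches are illustrative names), and it compares against 14 because that is the only category id in the sample, whereas the question tests for 600.

```php
<?php
// Sample data from the question; real code would call
// $reader->open($product_xml_link) on the remote file instead.
$xml = <<<'XML'
<products>
    <product>
        <id>13389</id>
        <name><![CDATA[ product name ]]></name>
        <category id="14"><![CDATA[ Shoes > test1 ]]></category>
        <price>41.30</price>
    </product>
</products>
XML;

$reader = new XMLReader();
$reader->XML($xml);
$document = new DOMDocument();
$matches = []; // hypothetical stand-in for the DB insert

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'product') {
        $product = simplexml_import_dom($reader->expand($document));
        // Array syntax reads the attribute; the cast turns the
        // SimpleXMLElement into a plain integer for comparison.
        $categoryId = (int) $product->category['id'];
        if ($categoryId === 14) {
            // insert into DB here
            $matches[] = (string) $product->id;
        }
    }
}
$reader->close();
var_dump($matches);
```

Note the comparison operator: if ($id = 600) assigns 600 and is always true, so == or === is needed for a real test.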

Of course, you can also use the DOM directly and fetch the detail data using XPath expressions:

$reader = new XMLReader();
$reader->open($product_xml_link);

// prepare a document to expand to
$document = new DOMDocument();
// and an xpath instance to use
$xpath = new DOMXpath($document);

// find the first product node
while ($reader->read() && $reader->localName !== 'product') {
  continue;
}

while ($reader->localName === 'product') {
  $product = $reader->expand($document);
  $data = [
    'id' => $xpath->evaluate('string(id)', $product),
    'name' => $xpath->evaluate('string(name)', $product),
    'category_id' => $xpath->evaluate('string(category/@id)', $product),
    // ...
  ];
  var_dump($data);
  // move to the next product sibling
  $reader->next('product');
}
$reader->close();
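
The getAttribute() call from the question does exist, but on DOMElement rather than on SimpleXMLElement, so it can be used on the expanded DOM node. A minimal sketch of that variant, assuming a shortened version of the question's sample XML held in a string ($xml and $categoryIds are illustrative names):

```php
<?php
$xml = '<products><product><id>13389</id>'
     . '<category id="14"><![CDATA[ Shoes > test1 ]]></category>'
     . '</product></products>';

$reader = new XMLReader();
$reader->XML($xml);
$document = new DOMDocument();
$categoryIds = [];

// find the first product node
while ($reader->read() && $reader->localName !== 'product') {
    continue;
}

while ($reader->localName === 'product') {
    // for an element node, expand() yields a DOMElement
    $product = $reader->expand($document);
    $category = $product->getElementsByTagName('category')->item(0);
    // guard against products without a category or without the attribute
    if ($category !== null && $category->hasAttribute('id')) {
        $categoryIds[] = $category->getAttribute('id');
    }
    // move to the next product sibling
    $reader->next('product');
}
$reader->close();
var_dump($categoryIds);
```
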
ThW
  • Hello, thanks for your answer. Is there an easier way, without any arrays? With my code as it is, is it possible to grab the id without $document = new DOMDocument(); // and an xpath instance to use $xpath = new DOMXpath($document); or anything else? – Kiriakos Grhgoriadhs Jul 19 '17 at 14:09
  • The array is only a way to collect the data that is read. Use variables, call functions, ... and well, you can use my FluentDOM library. It extends XMLReader/DOM and abstracts some of that away: https://github.com/FluentDOM/FluentDOM/blob/master/examples/XMLReader/sitemap.php :-) – ThW Jul 19 '17 at 14:18
  • I kept my solution as it is, with some of the changes you suggested: $document = ... $xpath = . . . I removed the line $product = new SimpleXMLElement($reader->readOuterXml()); and collected all my data in an array as you mentioned. Now the XML parsing seems a little slower. The XML has 5,500 products (which is not very many). Before the changes it was a bit faster, I believe. Any suggestions? – Kiriakos Grhgoriadhs Jul 20 '17 at 11:39
  • Hello again, I finally got the id attribute of every category. Now there is another problem. I have already posted another question, https://stackoverflow.com/questions/45213764/check-if-xml-element-tag-empty, but I will ask it here as well: how can I check whether an element is empty when there is CDATA inside it? – Kiriakos Grhgoriadhs Jul 21 '17 at 09:40

You want to loop over all the products and extract the text content of the child elements id, name, link, and price? That can be done like this:

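// Calling loadHTML() statically throws an Error on PHP 8, so load the XML into an instance.
$document = new DOMDocument();
$document->loadXML($xml);
foreach ($document->getElementsByTagName("product") as $product) {
    $vars = array('id', 'name', 'link', 'price');
    foreach ($vars as $v) {
        // item(0) is null when the element is missing (e.g. no <link>)
        $node = $product->getElementsByTagName($v)->item(0);
        ${$v} = $node ? $node->textContent : null;
    }
    unset($v, $vars, $node);
    // now you have $id, $name, $link, $price as raw text, and $product is the DOMNode for the <product> tag.
}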

If you only want to process id 600, add if($id!=600){continue;} after the unset(). To save some CPU in that case, also insert a break; at the end of the foreach loop, so that it stops looping once it has found id 600.

Edit: fixed a code breaking typo, the code won't work without the typo fix

edit: if you want to use XPath to find the correct element, load the XML into a DOMDocument instance (here $document) and then use $product=(new DOMXpath($document))->query('//product/id[text()=\'600\']')->item(0)->parentNode;

edit: fixed another code-breaking typo (items(0) -> item(0) )
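
Since the question is about the id attribute on category rather than the id element, the XPath approach can also match on the attribute. The following is only a sketch under assumptions: $xml and $ids are illustrative names, and the second product (id 13390, category 600) is invented for the example so that the query has something to match.

```php
<?php
$xml = '<products>'
     . '<product><id>13389</id><category id="14">Shoes</category></product>'
     // invented second product, present only to demonstrate a match
     . '<product><id>13390</id><category id="600">Bags</category></product>'
     . '</products>';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

$ids = [];
// select every product whose category carries id="600"
foreach ($xpath->query('//product[category/@id = "600"]') as $product) {
    // evaluate relative to the matched product node
    $ids[] = $xpath->evaluate('string(id)', $product);
}
var_dump($ids);
```

Unlike the XMLReader loop in the other answer, this loads the whole document into memory, which may matter for large feeds.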

hanshenrik