
I've used SO for many years and have always found an answer, but this time I have got myself well and truly lost.

I have an XML file in which I would like to split the text of the Compatbility element into well-formed XML:

`<product>
<item>
<partno>abc123</partno>
<Compatbility>model1: 110C, 115C, 117C. model2: 1835C, 1840C. model3: 210C, 215C, 3240C.</Compatbility>
</item>
</product>`

In Compatbility, the word "model" changes with each item entry, although the ":" after the model name is always there, as is the "." after each model group.

Should I use SimpleXML, DOM, or XPath to get the following result?

`<product>
<item>
<partno>abc123</partno>
<Compatbility>
<model>model1: 110C, 115C, 117C.</model>
<model>model2: 1835C, 1840C.</model> 
<model>model3: 210C, 215C, 3240C.</model>
</Compatbility>
</item>
</product>`

Thanks

2 Answers


With SimpleXML, you can run a regular expression match against the text value of an element.

You can then remove all inner text and add the parsed result as new child elements.

This can be done with any of the tools you mentioned: DOMDocument or SimpleXMLElement, each with or without XPath.
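For comparison, the same two steps (match the text, then replace it with child elements) might look like this with DOMDocument and DOMXPath. This is only a sketch: it reuses the regular expression from the SimpleXML example and the XPath expression `//item/Compatbility` is an assumption based on the sample document in the question.

```php
<?php
// Sketch only: DOMDocument + DOMXPath variant of the same split.
$doc = new DOMDocument();
$doc->loadXML(
    '<product><item><partno>abc123</partno>'
    . '<Compatbility>model1: 110C, 115C, 117C. model2: 1835C, 1840C. model3: 210C, 215C, 3240C.</Compatbility>'
    . '</item></product>'
);

$xpath = new DOMXPath($doc);

// Assumed location of the elements, taken from the sample XML.
foreach ($xpath->query('//item/Compatbility') as $node) {
    // Collect each "modelN: ... ." group from the element's text.
    preg_match_all('~(model\d?: [^.]+\.) ?~u', $node->textContent, $matches);

    // Remove the existing text child(ren) of the element.
    while ($node->firstChild) {
        $node->removeChild($node->firstChild);
    }

    // Add one <model> element per parsed group.
    foreach ($matches[1] as $group) {
        $model = $doc->createElement('model');
        $model->appendChild($doc->createTextNode($group));
        $node->appendChild($model);
    }
}

// Serialize from the root element (this omits the XML declaration).
$result = $doc->saveXML($doc->documentElement);
echo $result, "\n";
```

Using `createTextNode()` rather than passing the value to `createElement()` lets the DOM take care of escaping characters such as `&` in the text.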

Here is a commented example in SimpleXML (online demo):

<?php
/**
 * @link http://stackoverflow.com/q/24304095/367456
 * @link https://eval.in/164934
 */
$buffer = <<<XML
<product>
<item>
<partno>abc123</partno>
<Compatbility>model1: 110C, 115C, 117C. model2: 1835C, 1840C. model3: 210C, 215C, 3240C.</Compatbility>
</item>
</product>
XML;

# load the xml string
$xml = simplexml_load_string($buffer);

# obtain the element in question
$compatbility = $xml->item->Compatbility;

# parse its inner text value for the models with a regex
$pattern = '~(model\\d?: [^.]+\\.) ?~u';
$result  = preg_match_all($pattern, $compatbility, $matches);

# remove the text (a so-called SimpleXML self-reference)
$compatbility->{0} = '';

# add the parsed models as new model elements
foreach ($matches[1] as $model) {
    $compatbility->model[] = $model;
}

# output the xml
$xml->asXML('php://output');

The output it gives is:

<?xml version="1.0"?>
<product>
<item>
<partno>abc123</partno>
<Compatbility><model>model1: 110C, 115C, 117C.</model><model>model2: 1835C, 1840C.</model><model>model3: 210C, 215C, 3240C.</model></Compatbility>
</item>
</product>
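The pattern above matches the literal word "model" followed by an optional digit. Since the question says that word changes from item to item, a more permissive pattern may be needed. The following is a minimal sketch: `splitGroups()` is a hypothetical helper, the labels in the sample string are made up, and it assumes labels contain no ":" or "." and values contain no ".".

```php
<?php
/**
 * Hypothetical helper: split "label: a, b. other label: c." into its groups.
 * Assumes labels contain no ':' or '.', and values contain no '.'.
 */
function splitGroups(string $text): array
{
    // A label (not starting with whitespace), then ": ", then everything up to the next period.
    preg_match_all('~([^:.\s][^:.]*: [^.]+\.)~u', $text, $matches);

    return $matches[1];
}

// Made-up sample input with labels other than "modelN".
$groups = splitGroups('alpha: 110C, 115C. beta two: 1835C, 1840C.');
print_r($groups);
```

Each returned string can then be assigned with `$compatbility->model[] = $group;` exactly as in the loop above.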
hakre
  • Thank you for taking the time to explain the intricacies of what the OP needs to know. That is a big shortcoming of my answer: it just spawned another question because the OP didn't know what's going on under the hood. Plus, this answer is much better. +1 – user1978142 Jun 20 '14 at 00:34

First, of course, you need to convert the XML into something you can manipulate (arrays). Then do the usual parsing (using explode). Finally, you will need to build a new XML document. Consider this example:

$xml_string = '<product><item><partno>abc123</partno><Compatbility>model1: 110C, 115C, 117C. model2: 1835C, 1840C. model3: 210C, 215C, 3240C.</Compatbility></item></product>';
$original_xml = simplexml_load_string($xml_string);
$data = json_decode(json_encode($original_xml), true);
$compatbility = $data['item']['Compatbility']; // get all compatibility values
// explode values
$compatbility = array_filter(array_map('trim', explode('.', $compatbility)));

$new_xml = new SimpleXMLElement('<product/>'); // initialize new xml
// add necessary values
$new_xml->addChild('item')->addChild('partno', $data['item']['partno']);
$new_xml->item->addChild('Compatbility');
// loop the values and add them as children
foreach($compatbility as $value) {
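    // NOTE: this strips the leading "modelN:" label; explode('.') above has already dropped the trailing period
    $value = trim(preg_replace('/(\w+):/', '', $value));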
    $new_xml->item->Compatbility->addChild('model', $value);
}
echo $new_xml->asXML(); // output as xml
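If the label and the trailing period should be kept, as in the output the question asks for, one option is to split after each period instead of on it. This is a hedged sketch rather than part of the answer above: the lookbehind-based `preg_split()` pattern is my own choice, and the XML skeleton is hard-coded from the sample for brevity.

```php
<?php
$text = 'model1: 110C, 115C, 117C. model2: 1835C, 1840C. model3: 210C, 215C, 3240C.';

// Split at every position preceded by a period, consuming any whitespace that follows.
// PREG_SPLIT_NO_EMPTY drops the empty piece after the final period.
$groups = preg_split('~(?<=\.)\s*~', $text, -1, PREG_SPLIT_NO_EMPTY);

// Skeleton hard-coded from the sample document (an assumption for this sketch).
$xml = new SimpleXMLElement(
    '<product><item><partno>abc123</partno><Compatbility/></item></product>'
);

// Add each group, label and period included, as a <model> child.
foreach ($groups as $group) {
    $xml->item->Compatbility->addChild('model', $group);
}

$output = $xml->asXML();
echo $output;
```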
user1978142