
I've made a simple tool that lets you fill in an input field with a URL for an XML file. It's supposed to show all the nodes so the user can match them with database fields, which I have working for an XML file that has 2 "primary" nodes. Example of the XML file:

<foods>
    <food>
        <name>ravioli</name>
        <recipe>food.com/ravioli</recipe>
        <time>10 minutes</time>
    </food>
    <food>
        <name>ravioli</name>
        <recipe>food.com/ravioli</recipe>
        <time>10 minutes</time>
    </food>
</foods>

This returns me a list that says

name recipe time

The problem is when someone wants to use an XML file that doesn't have 2 "primary" nodes. For example it's missing the <food> node. In this case it wouldn't be able to show the result because my PHP code is expecting 2 instead of 1 primary.

My code is as follows:

// Fetch the XML from the URL
if (!$xml = simplexml_load_file($_GET['url'])) {
    // The XML file could not be reached
    echo 'Error loading XML. Please check the URL.';
} else {
    // Parse through the XML and fetch the nodes
    $child = $xml->children();
    foreach($child->children() as $key => $value) {
        echo $key."<br>";
    }
}

Is there a way to get the nodes I want from any XML file, regardless of the number of parent nodes?

user1433479
  • The wonderful XML2Array function on php docs might help you with reliably getting an array out of xml string. Once you have an array, your iteration should be easy. http://codepad.viper-7.com/1UlVGV – Prasanth Mar 10 '14 at 11:59
  • you can use DOMDocument, and can search the XML tags using getElementsByTagName method http://www.php.net/manual/en/domdocument.getelementsbytagname.php – PravinS Mar 10 '14 at 12:00
  • @PravinS The problem is that I don't know the tag name. – user1433479 Mar 10 '14 at 12:02
  • ok, so you can use the XML2Array function suggested by Prasanth – PravinS Mar 10 '14 at 12:03
  • I found out that I only have to read XML feeds from a few sources, which means I can find out the XML structure, make the user select their affiliate and use a switch case to change between parsing methods. – user1433479 Mar 10 '14 at 13:43
  • I got XML2Array working for the example, but the XML I have to read is fetched through simplexml_load_file, which returns it as a SimpleXMLElement Object. Should I use a different method to fetch the XML? – user1433479 Mar 10 '14 at 15:39

1 Answer


You can query data from an XML DOM using XPath. In PHP it is available through the DOMXPath::evaluate() method. The second argument is the context node, so your expressions can be relative to another node. Converting the XML to a list of records (for a database, CSV, ...) requires several steps. Starting with some bootstrap:

$xml = <<<'XML'
<foods>
    <food>
        <name>ravioli 1</name>
        <recipe>food.com/ravioli-1</recipe>
        <time unit="minutes">10</time>
    </food>
    <food>
        <name>ravioli 2</name>
        <recipe>food.com/ravioli-2</recipe>
        <time unit="minutes">11</time>
    </food>
</foods>
XML;

$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
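
If the feed has already been loaded with simplexml_load_file(), as in the question, it should not be necessary to fetch it a second time: PHP's dom_import_simplexml() returns the DOM node behind a SimpleXMLElement. A minimal sketch, where the helper name xpathFromSimpleXml() is made up for illustration and the inline XML stands in for the remote file:

```php
<?php
/**
 * Hypothetical helper (not part of the original answer): build a DOMXPath
 * for the document behind a SimpleXMLElement.
 */
function xpathFromSimpleXml(SimpleXMLElement $element): DOMXPath
{
    // dom_import_simplexml() exposes the same underlying node as a DOMElement
    $node = dom_import_simplexml($element);
    // XPath expressions are evaluated against the owning document
    return new DOMXPath($node->ownerDocument);
}

// Inline XML used as a stand-in for simplexml_load_file($url)
$sxml = simplexml_load_string('<foods><food><name>ravioli 1</name></food></foods>');
$xpath = xpathFromSimpleXml($sxml);
echo $xpath->evaluate('string(/foods/food/name)'), "\n";
```

The resulting $xpath object can then be used the same way as the one created from DOMDocument above.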

First we need to define which XML element defines the record, then which elements define the fields.

So let's build lists of possible record paths and field paths:

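$paths = [];
$leafs = [];
// Visit every element and attribute node in the document
foreach ($xpath->evaluate('//*|//@*') as $node) {
  // Possible record path: the node has attributes or child elements
  $isPath = $xpath->evaluate('count(@*|*) > 0', $node);
  // Possible field path: the node has no child elements
  $isLeaf = !($xpath->evaluate('count(*) > 0', $node));
  // Build the location path from the names of all ancestor elements
  $path = '';
  foreach ($xpath->evaluate('ancestor::*', $node) as $parent) {
    $path .= '/'.$parent->nodeName;
  }
  // Append the node itself, prefixing attributes with "@"
  $path .= '/'.($node instanceof DOMAttr ? '@' : '').$node->nodeName;
  // Paths are used as array keys so duplicates collapse into one entry
  if ($isLeaf) {
    $leafs[$path] = TRUE;
  }
  if ($isPath) {
    $paths[$path] = TRUE;
  }
}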
$paths = array_keys($paths);
$leafs = array_keys($leafs);
var_dump($paths, $leafs);

Output:

array(3) {
  [0] =>
  string(6) "/foods"
  [1] =>
  string(11) "/foods/food"
  [2] =>
  string(16) "/foods/food/time"
}
array(4) {
  [0] =>
  string(16) "/foods/food/name"
  [1] =>
  string(18) "/foods/food/recipe"
  [2] =>
  string(16) "/foods/food/time"
  [3] =>
  string(22) "/foods/food/time/@unit"
}
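
To help the user choose among the record paths listed above, it might be useful to show how many nodes each path matches, since a path that matches several nodes is a likely record element. This is only a sketch: countPathMatches() is a hypothetical helper, and it assumes the paths contain no namespace prefixes.

```php
<?php
/**
 * Hypothetical helper: count how many nodes each candidate path matches.
 * Assumes the paths were generated from the same document and use no
 * namespace prefixes.
 */
function countPathMatches(DOMXPath $xpath, array $paths): array
{
    $counts = [];
    foreach ($paths as $path) {
        // XPath 1.0 count() yields a number (float in PHP), so cast it
        $counts[$path] = (int) $xpath->evaluate('count(' . $path . ')');
    }
    return $counts;
}

$dom = new DOMDocument();
$dom->loadXML(
    '<foods>'
    . '<food><time unit="minutes">10</time></food>'
    . '<food><time unit="minutes">11</time></food>'
    . '</foods>'
);
$xpath = new DOMXPath($dom);

$counts = countPathMatches($xpath, ['/foods', '/foods/food', '/foods/food/time']);
print_r($counts);
```

Here /foods matches once while /foods/food matches twice, which hints that /foods/food is the repeating record.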

Next show the possible record paths to the user. The user needs to select one. Knowing the record path, build a list of the possible field paths from the leafs array:

$path = '/foods/food';

$fieldLeafs = [];
$pathLength = strlen($path) + 1;
foreach ($leafs as $leaf) {
  if (0 === strpos($leaf, $path.'/')) {
    $fieldLeafs[] = substr($leaf, $pathLength);
  }
}
var_dump($fieldLeafs);

Output:

array(4) {
  [0] =>
  string(4) "name"
  [1] =>
  string(6) "recipe"
  [2] =>
  string(4) "time"
  [3] =>
  string(10) "time/@unit"
}

Show a dialog that lets the user select a path for each field. The result is a definition like this:

$fieldDefinition = [
  'title' => 'name',
  'url' => 'recipe',
  'needed_time' => 'time',
  'time_unit' => 'time/@unit'
];
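
Since the selections come from a form, it may be worth checking them against the offered $fieldLeafs before they are evaluated as XPath expressions. A minimal sketch of such a whitelist check; filterFieldDefinition() and the $submitted data are hypothetical and only illustrate the idea:

```php
<?php
/**
 * Hypothetical helper: keep only mappings whose expression is one of the
 * field paths that were offered to the user.
 */
function filterFieldDefinition(array $submitted, array $fieldLeafs): array
{
    $valid = [];
    foreach ($submitted as $field => $expression) {
        // Strict comparison: the value must be exactly one of the offered paths
        if (in_array($expression, $fieldLeafs, true)) {
            $valid[$field] = $expression;
        }
    }
    return $valid;
}

$fieldLeafs = ['name', 'recipe', 'time', 'time/@unit'];

// Made-up form input; the last entry was never offered and gets dropped
$submitted = [
    'title' => 'name',
    'time_unit' => 'time/@unit',
    'url' => '//secret | recipe',
];

$fieldDefinition = filterFieldDefinition($submitted, $fieldLeafs);
var_dump($fieldDefinition);
```

Dropping unknown expressions silently is one option; rejecting the whole submission with an error message would work as well.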

Now use the path and the mapping to build up the records array:

$result = [];
foreach ($xpath->evaluate($path) as $node) {
  $record = [];
  foreach ($fieldDefinition as $field => $expression) {
    $record[$field] = $xpath->evaluate(
      'string('.$expression.')',
      $node
    );
  }
  $result[] = $record;
}
var_dump($result);

Output:

array(2) {
  [0] =>
  array(4) {
    'title' =>
    string(9) "ravioli 1"
    'url' =>
    string(18) "food.com/ravioli-1"
    'needed_time' =>
    string(2) "10"
    'time_unit' =>
    string(7) "minutes"
  }
  [1] =>
  array(4) {
    'title' =>
    string(9) "ravioli 2"
    'url' =>
    string(18) "food.com/ravioli-2"
    'needed_time' =>
    string(2) "11"
    'time_unit' =>
    string(7) "minutes"
  }
}

The full example can be found at: https://eval.in/118012

The XML in the example is never converted to a generic array. Doing so would mean losing information and storing the data twice, so don't. Extract the structure information from the XML and let the user define the mapping. Then use XPath to extract the data and store it directly in the result format.

ThW
  • But what if I don't know what's in the XML file? What do I use instead of '//recipe'. – user1433479 Mar 10 '14 at 15:17
  • You need at least some logic to identify the content, like if a text node starts with "food.com/". Xpath is a powerful tool to match a defined structure, but you have to define that structure. You can not read html as atom/rss for example, but you can define rules to map parts of an html file to atom. Please describe in more detail what you would like to do. – ThW Mar 10 '14 at 21:35
  • I need to fill in the URL for an XML file and then get a list of all the fields in that XML file, regardless of the XML file's structure. Right now I'm running a json_encode on the XML, then a json_decode to get it as an array. – user1433479 Mar 11 '14 at 07:30
  • And then? What are you going to do with that array? It has less information than the XML had. – ThW Mar 11 '14 at 07:35
  • I just need to get a list of all the nodes from the XML, like in the example I would want to have a list that says "name, recipe, time". – user1433479 Mar 11 '14 at 07:38
  • To be more clear, I wanna upload the content of those fields to a database, after the user has matched the nodes to the database fields. This is essentially what I want to make, but it has to work for different XML makeups. – user1433479 Mar 11 '14 at 07:41
  • I answered something along these lines yesterday. It might help you, too: http://stackoverflow.com/a/22312281/2265374 – ThW Mar 11 '14 at 08:04
  • This seems to work, but can I still access the values in these keys? And one of the XML files has a node like . Would it be possible to get that one as well? – user1433479 Mar 11 '14 at 08:17
  • Of course, it is just logic, but I would not do it in the first step. More along the following lines: first, provide the leafs to the user. The user selects one path as the `list` element. All nodes that start with that path are provided to the user to assign a field for it. Now you can use the first path to iterate the XML and the field paths to read a value for each field. The result would be an array of records. – ThW Mar 11 '14 at 08:31
  • How would I get the different name values (name="campaignID") instead of only @name? Right now it only gets the name of the property within the node, but I need the value of the property. – user1433479 Mar 11 '14 at 08:37
  • I rewrote my answer to show an example for the structure read/mapping definition concept from my last comment. – ThW Mar 11 '14 at 09:33
  • This doesn't seem to work for my XML files. It only works if I use the one in the example. Right now the user can't select a path because it's not showing any. How would I get a pathlist? – user1433479 Mar 11 '14 at 09:40
  • I don't know your XML files. It works fine with all kinds of well-formed XML I throw at it. Namespaces would need additional logic however (not to show the paths, but to use them). – ThW Mar 11 '14 at 10:11
  • I've got it working now up until the `$fieldDefinition` part. I don't quite understand this. Does the user need to select the path separately for every field? How would I store this information? – user1433479 Mar 11 '14 at 13:54
  • Yes, you said it yourself: "..., after the user has matched the nodes to the database fields...". Creating the $fieldDefinition array is that step. You can start with a dialog with dropdowns for each field. Let the user select the leaf expression for each database field. Storing that definition is possible: in the $_SESSION, serialized, as XML or as JSON. It depends on your application. – ThW Mar 11 '14 at 17:02
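
As a rough illustration of the JSON option mentioned in the last comment, the record path and the field mapping could be stored together and restored when the import runs. The array keys record_path and fields are made up for this sketch, and JSON_THROW_ON_ERROR assumes PHP 7.3 or newer:

```php
<?php
// Hypothetical mapping as produced by the selection dialog
$mapping = [
    'record_path' => '/foods/food',
    'fields' => [
        'title' => 'name',
        'url' => 'recipe',
        'needed_time' => 'time',
        'time_unit' => 'time/@unit',
    ],
];

// Serialize for storage (a database column, a file, or $_SESSION)
$json = json_encode($mapping, JSON_THROW_ON_ERROR);

// Restore as associative arrays when the import runs
$restored = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

// These two values feed the record loop from the answer
$path = $restored['record_path'];
$fieldDefinition = $restored['fields'];

echo $json, "\n";
```

Keeping one stored mapping per feed source would let a known feed be re-imported without asking the user to map the fields again.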