0

I'm parsing a 500MB XML file for ebay's full range of "GetCategorySpecifics" and storing in the local database.

The node "MinValues" is inserting incorrectly for some rows, but not all. If this node doesn't exist in the structure, then the value should be 0. I've placed checks in the code to ensure if its missing then set to 0, but the problem remains.

XML structure:

<Recommendations>
    <CategoryID>80</CategoryID>
        <NameRecommendation>
             <Name>Size</Name>
             <ValidationRules>
                     <MaxValues>1</MaxValues>
                     <MinValues>1</MinValues>
                     <SelectionMode>FreeText</SelectionMode>
             </ValidationRules>
             <ValueRecommendation>
                 <Value>Large</Value>
             </ValueRecommendation>
        </NameRecommendation>
</Recommendations>

PHP:

// Define XMLreader
    $xml = new XMLReader;
    $xml->open($xml_file,"UTF-8",LIBXML_ERR_ERROR);
    $xml->read();

// Loop through file
    while ($xml->read()) {

        // check this isn't an ending node
        if ($xml->nodeType != XMLReader::END_ELEMENT) {

                // Its a new category - reset variables and define new ID 
                if ($xml->name == 'CategoryID') {

                    $MinValues = 0; $MaxValues = 1; $SelectionMode = '';
                    $xml->read();

                    // Validate new category ID - if invalid move to next 
                    if ($xml->name == '#text' && $xml->hasValue) {
                       $CategoryID = trim($xml->value);  
                       if (!is_numeric($CategoryID) || empty($CategoryID) || $CategoryID < 1) {
                          $xml->next('Recommendations');
                       }             
                    }         
                    else {
                        $xml->next('Recommendations');
                    }   

                }


                // It's the Name tag - define Name variable 
                if ($xml->name == 'Name') {

                    $xml->read();
                    if ($xml->name == '#text' && $xml->hasValue) {
                        $Name = mysql_real_escape_string($xml->value);       
                    }        

                }

                // It's the MaxValues tag - define MaxValues variable 
                if ($xml->name == 'MaxValues') {

                    $xml->read();
                    if ($xml->name == '#text' && $xml->hasValue) {
                        $MaxValues = mysql_real_escape_string($xml->value); 
                        if (!is_numeric($MaxValues) || empty($MaxnValues) || $MaxValues < 1) {    
                            $MaxValues = 1; 
                         }
                     }     

                 }

        // It's the MinValues tag - define MinValues variable 
        if ($xml->name == 'MinValues') {
            $xml->read();
            if ($xml->name == '#text' && $xml->hasValue) {
                $MinValues = mysql_real_escape_string($xml->value); 
                if (!is_numeric($MinValues) || empty($MinValues) || $MinValues < 1) {    
                    $MinValues = 0; 
                }
            }                        
        }


        // It's the SelectionMode tag - Insert new entry row into DB
        if ($xml->name == 'SelectionMode') {
            $xml->read();
            if (($xml->name == '#text') && $xml->hasValue) {
                $SelectionMode = mysql_real_escape_string($xml->value);  
                mysql_query("INSERT INTO entry (entry_id,CategoryID,Name,MaxValues,MinValues,SelectionMode) VALUES ('','$CategoryID','$Name','$MaxValues','$MinValues','$SelectionMode')");                 
                $entry_id = mysql_insert_id();                   
            }                
        }     

        // It's the Value tag - Insert new values row into DB
        if ($xml->name == 'Value') {
            $xml->read();
            if (($xml->name == '#text') && $xml->hasValue) {          
                $Value = mysql_real_escape_string($xml->value);
                mysql_query("INSERT INTO values (value_id,entry_id,CategoryID,Value) VALUES ('','$entry_id','$CategoryID','$Value')");
            }
        }   
        }
    }
David D
  • 1,269
  • 17
  • 22
  • Sometimes it is easier to use `::next()` in combination with `::readString()` than just `::read()` and `->value` http://eval.in/private/8337d6efc086a3 - and there is also https://github.com/hakre/XMLReaderIterator – hakre May 12 '13 at 22:18

2 Answers2

0

You explained like

The node "MinValues" is inserting incorrectly for some rows, but not all. If this node doesn't exist in the structure, then the value should be 0. I've placed checks in the code to ensure if its missing then set to 0, but the problem remains.

And you also told that you have placed check for it, but what when you will not receive MinValues element from source?

So I would like to suggest that as you're reading your XML and when you found MaxValues and after it you found next element without MinValues then certainly put it in your writing process with `0' value.

Smile
  • 2,770
  • 4
  • 35
  • 57
  • If the case is known to you that when it bugs your insert query then had you printed the output of your insert query `"INSERT INTO entry (entry_id,CategoryID,Name,MaxValues,MinValues,SelectionMode) VALUES ('','$CategoryID','$Name','$MaxValues','$MinValues','$SelectionMode')"`? So that you will sure find the issue. – Smile May 10 '13 at 13:07
0

The answser:

It turns out NameRecommendation is repeatable, so I needed to reset MinValues on this node too. Amazing how easy things are when you know the answer.

David D
  • 1,269
  • 17
  • 22