I have a large XML document (~10 MB) that contains data. I need to parse this XML and update my database to match it. There are three cases:
- New data is present in the XML since last time - insert it into the DB
- An item has changed - update it in the DB
- An item has been removed from the XML since last time - mark it as disabled in the DB
If data for an item hasn't changed since the last time I checked the XML, take no action. Each item in the XML has its own ID so I can retrieve it from the database easily.
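For concreteness, here's roughly how I picture those cases mapping to SQL. This is just a sketch: the items table, its name/price columns, and the PDO handle $pdo are made up.

// Case 1: new item -> insert it
$insert = $pdo->prepare('INSERT INTO items (id, name, price) VALUES (?, ?, ?)');
$insert->execute([$item['id'], $item['name'], $item['price']]);

// Case 2: changed item -> update it in place
$update = $pdo->prepare('UPDATE items SET name = ?, price = ? WHERE id = ?');
$update->execute([$item['name'], $item['price'], $item['id']]);

// Case 3: item missing from the XML -> mark it disabled, keep the row
$disable = $pdo->prepare('UPDATE items SET disabled = 1 WHERE id = ?');
$disable->execute([$item['id']]);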
Iterating through the items one at a time like this is really slow, though, and I need the relational aspects of a database (the XML represents items that users can buy, so I can't generate new IDs for the items and just bulk load everything).
What would be the best way to implement this, given that looping through the XML and looking up each item individually is really slow?
This is done in PHP against a MySQL database.
== edit ==
The basic loop of this process is as follows:
while ($data = get_next_bit_of_data_from_xml())
{
    // this next line is the slow part
    if ($object = get_object_from_database($data['id']))
    {
        // update $object from $data
        // ...
        // persist
        $object->save();
    }
    else
    {
        // build a new object from $data and persist it
        // ...
    }
}
The reason the script is so slow is that I can't use bulk operations like LOAD DATA INFILE, because for each entry in the XML I need to check whether the object already exists.
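For what it's worth, prefetching every existing ID in one query would at least remove the read round trips, something like this (a sketch; the PDO handle $pdo and table name are again hypothetical):

// Fetch all existing IDs once instead of running one SELECT per XML entry
$existing = array_flip(
    $pdo->query('SELECT id FROM items')->fetchAll(PDO::FETCH_COLUMN)
);

while ($data = get_next_bit_of_data_from_xml()) {
    if (isset($existing[$data['id']])) {
        // known ID: update path
        unset($existing[$data['id']]);
    } else {
        // unknown ID: insert path
    }
}

// anything left in $existing was absent from the XML -> mark it disabled

But that still leaves one write per item, which is why I'm asking about bulk loading.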
So I'm really wondering if anyone can suggest another approach that will allow me to bulk load data while maintaining referential integrity with existing data.
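To make the question concrete, the kind of approach I have in mind is bulk-loading into a staging table and then reconciling with set-based SQL. A sketch, with all table and column names made up:

// 1. Stage the parsed XML rows in bulk (LOAD DATA INFILE from a CSV dump,
//    or large multi-row INSERTs)
$pdo->exec('CREATE TEMPORARY TABLE items_staging LIKE items');
$pdo->exec("LOAD DATA LOCAL INFILE '/tmp/items.csv'
            INTO TABLE items_staging
            FIELDS TERMINATED BY ','
            (id, name, price)");

// 2. Insert new items and update changed ones, keyed on the XML's own IDs
$pdo->exec('INSERT INTO items (id, name, price)
            SELECT id, name, price FROM items_staging
            ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price)');

// 3. Disable items that no longer appear in the XML
$pdo->exec('UPDATE items SET disabled = 1
            WHERE id NOT IN (SELECT id FROM items_staging)');

I'm not sure whether something like this is the right pattern for my schema, hence the question.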