7

I'm parsing a ton of data which I initially insert into a core data store.

At a later point, I am parsing the same XML, though some of it may have been updated. What I then do is check for an existing record with the same tag, and if one already exists, I update the record with the data.

However, while my initial parsing (about 11,000 records) takes 8 seconds or so, updating seems expensive and takes 144 seconds (these are Simulator runs, so it would be significantly longer on actual devices).

While the first time is fine (I'm showing a progress bar), the second is unacceptably long, and I would like to do something to improve the speed (even though it happens in the background on a separate thread).

Unfortunately it's not a matter of find-or-create as the data in the XML may have changed for individual records, so each could essentially need an update.

I've indexed the attributes, which sped up the initial parsing and the updating as well, but it's still slow (the numbers above are with indexing). What I have noticed is that the parsing/updating seems to slow down gradually. While initially fast, it gets slower and slower as more and more records are dealt with.

So finally my question is if anyone has any suggestions for how I could improve the speed at which I am updating my dataset. I am using MagicalRecord for fetching the record. Here's the code:

Record *record;
if (!isUpdate) {
    record = [NSEntityDescription insertNewObjectForEntityForName:@"Record" inManagedObjectContext:backgroundContext];
} else {
    NSPredicate *recordPredicate = [NSPredicate predicateWithFormat:@"SELF.tag == %@", [[node attributeForName:@"tag"] stringValue]];
    record = [Record findFirstWithPredicate:recordPredicate];
}
runmad
  • 14,846
  • 9
  • 99
  • 140
  • how often do you call save on the Context? – Jonathan Cichon May 01 '12 at 18:10
  • I call it 18 times in total. I have played around with this number and it seems to be the magic number for the overall speed. – runmad May 01 '12 at 18:16
  • If I'm reading this correctly, the first time you insert these records it is only taking about 8 seconds total. So is it naive to believe that if you delete the existing entity then insert a new one with the updated data that it'll be faster? – Tim Reddy May 01 '12 at 19:28
  • That would probably speed things up, however I'd be risking the user trying to access the data when it suddenly doesn't exist. – runmad May 01 '12 at 19:53
  • First, run an update with the Instruments, using the "Time Profiler" instrument. – Tom Harrington May 01 '12 at 20:04
  • Doing this is as expected, the fetching of each individual record is what takes a very long time (though why does it take longer and longer?) – runmad May 01 '12 at 21:26

4 Answers

3

Instead of doing tons of fetches, do one query for each entity type and store them in a dictionary by tag, then just check the dictionary if there's an object with that key. You should be able to set the propertiesToFetch to just include the tag, and it should reduce overhead.
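A rough sketch of this approach (assuming the question's setup: `backgroundContext`, a `Record` entity with a `tag` attribute, and a `nodes` array of parsed XML elements — all placeholder names). For simplicity this fetches full managed objects rather than using `propertiesToFetch`, which would return dictionaries and require a second lookup by object ID:

```objc
// One fetch up front instead of one fetch per XML node.
NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Record"];
NSError *error = nil;
NSArray *existingRecords = [backgroundContext executeFetchRequest:request error:&error];

// Index every existing record by its tag for constant-time lookup.
NSMutableDictionary *recordsByTag =
    [NSMutableDictionary dictionaryWithCapacity:[existingRecords count]];
for (Record *existing in existingRecords) {
    [recordsByTag setObject:existing forKey:existing.tag];
}

for (id node in nodes) {
    NSString *tag = [[node attributeForName:@"tag"] stringValue];
    Record *record = [recordsByTag objectForKey:tag];
    if (!record) {
        record = [NSEntityDescription insertNewObjectForEntityForName:@"Record"
                                               inManagedObjectContext:backgroundContext];
        [recordsByTag setObject:record forKey:tag]; // guard against duplicate tags in the feed
    }
    // ... copy the node's values onto record as before ...
}
```

This turns ~11,000 individual fetch requests into one, which is exactly the pattern Apple recommends for batch find-or-create; the per-record work becomes a dictionary lookup instead of a round trip to the persistent store.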

Senior
  • 2,259
  • 1
  • 20
  • 31
  • That would exactly be my approach, I don't understand why find-or-create wouldn't matter here, isn't it actually THE scenario where find-and-create applies? – codeclash May 01 '12 at 19:43
  • No, it isn't. Find-*or*-create matters if you need to insert a complete record that does not exist. In my case, I need to update the record if needed. So I'd both need to insert a new record if the XML feed has a new record, but for every single record already in the database, I'd need to update its properties. – runmad May 01 '12 at 19:51
  • Let's double check: Is it true that for each item in your fresh xml, you take the "Tag", do a fetch request over all items in the core data store to check if there already exists such an item, if yes, update that one, if not, insert? Isn't it then true that you basically do a fetch request for EACH item in your fresh XML? If so, you'd definitely want to do ONLY one fetch (to get an NSDictionary (key="Tag", value=Item) of ALL existing items in the persistent store, and lookup the NSDictionary for existing ones (and update them, doing saves in batches),and insert new ones only when not found? – codeclash May 01 '12 at 21:27
  • Yes, that is true. First run = insert records, all subsequent runs = update OR insert all records. – runmad May 02 '12 at 15:03
  • @senior neither do I. One would do one fetch (NSFetchRequest the Item and just it's Tag using propertiesToFetch, and populate an NSDictionary with key=Tag, value=Item with the result of the fetch. Then, while iterating through the parsed xml items, check if the current item exists in the dictionary (lookup for Tag), if yes, update the corresponding item, if not, do an insert, and (for safety) add the inserted item to the dictionary. I don't see any reason why this wouldn't work in runmad's case... – codeclash May 02 '12 at 22:16
  • Hey guys, your solution ended up working after I sorted out an underlying issue: threading/context, with help from MagicalRecord. Initially I figured this solution would work, but it turned out since I was doing the parsing in another thread, I was having issues with the context on the main thread actually saving the background context updates, despite the notification to update firing correctly. But `saveDataInBackgroundWithBlock` solved this. – runmad May 04 '12 at 17:16
1

One thing you could try would be using a template NSPredicate so that you aren't re-parsing the format string for every find/fetch that you are doing.

So before you enter your loop:

NSPredicate *template = [NSPredicate predicateWithFormat:@"SELF.tag == $RECORD_TAG"];

inside the loop:

Record *record;
if (!isUpdate) {
    record = [NSEntityDescription insertNewObjectForEntityForName:@"Record" inManagedObjectContext:backgroundContext];
} else {
    NSDictionary *variables = [NSDictionary dictionaryWithObject:[[node attributeForName:@"tag"] stringValue] forKey:@"RECORD_TAG"];
    record = [Record findFirstWithPredicate:[template predicateWithSubstitutionVariables:variables]];
}

See Apple's Predicate Programming Guide for more info.

auibrian
  • 261
  • 1
  • 2
  • Helped a tiny bit. I was actually doing this for ~98% of the objects (there are various types), and am now doing it with the remaining. Saved me about 2 seconds out of the 144 seconds so far :) – runmad May 01 '12 at 18:27
1

You could also try a combination of Senior's answer with hashing of the properties.

On insert, hash the properties and store that hash as a sort of checksum property of the Record.
On update, set the fetched properties to be tag and checksum and do one fetch of all the items. Then, as you iterate over your data set, if the checksum differs from the one that was fetched, you fetch that Record and update it.
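A sketch of the idea, reusing the question's names (`backgroundContext`, a `Record` entity). `ChecksumForValues` and the property names are hypothetical; any digest works as long as it is computed identically for an XML node and a stored record. Note that `NSString`'s `-hash` is not guaranteed stable across OS releases, so a real implementation should prefer a proper digest such as MD5 from CommonCrypto:

```objc
// Hypothetical helper: derives a digest string from the record's values.
NSString *checksum = ChecksumForValues(name, value, date);

// On insert, store it alongside the data:
record.checksum = checksum;

// On update, fetch only tag + checksum for all records in a single query.
NSFetchRequest *request = [NSFetchRequest fetchRequestWithEntityName:@"Record"];
request.resultType = NSDictionaryResultType;
request.propertiesToFetch = [NSArray arrayWithObjects:@"tag", @"checksum", nil];
NSArray *rows = [backgroundContext executeFetchRequest:request error:NULL];

NSMutableDictionary *checksumsByTag = [NSMutableDictionary dictionary];
for (NSDictionary *row in rows) {
    [checksumsByTag setObject:[row objectForKey:@"checksum"]
                       forKey:[row objectForKey:@"tag"]];
}

// While parsing, only pay for a full fetch when something actually changed:
// if (![newChecksum isEqualToString:[checksumsByTag objectForKey:tag]]) { ... }
```

The dictionary result set never materializes managed objects, so the upfront fetch stays cheap even for thousands of rows; full objects are only faulted in for records that genuinely changed.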

auibrian
  • 261
  • 1
  • 2
1

The initial answer to ALL performance issues is to run Instruments. Then, with that data, you can identify your problem areas. From there, you may have other, specific questions about certain aspects of improving performance.

We humans are notoriously bad at identifying performance bottlenecks. So, use Instruments first. It will certainly tell you where your time is being spent.

Jody Hagins
  • 27,943
  • 6
  • 58
  • 87