Is doing a fetch request in validateForInsert overly expensive?

Question

I recently did a refactor pass on my Core Data model, and I'm using the multi-tiered managed object context model from here: http://www.cocoanetics.com/2012/07/multi-context-coredata/.

I've been successful at isolating all of my core data parsing so that new managed objects are being parsed and inserted inside a child MOC on a background thread, and those changes are eventually batch saved to the parent/main MOC, then eventually written to the persistent store coordinator through its parent/writer MOC.

This has noticeably improved my UI responsiveness, as previously the large batch writes were being done on the parent/main MOC and locking up the UI thread.

I want to further improve our object insertion and validation. Each time the app opens up, and in a somewhat regular interval, there is a profile request during which tens or hundreds of objects are sent down with new values. I've chosen to simply create NSManagedObjects for all of these objects, insert them into the child MOC, and allow validation to take care of removing duplicates.

My question is whether performing an NSFetchRequest in every call to validateForInsert: for the NSManagedObject is expensive. I've seen several answers on StackOverflow that seem to use this pattern, for example: https://stackoverflow.com/a/2245699/2184893. I want to do it this way instead of validating before the entity is created because if two threads are concurrently creating the same object, both will get created, so the validation must happen at insert/merge time on the parent thread.

So, is using this method expensive? Is it common practice? Also, is there a difference in using validateForInsert and validate?

-(BOOL)validateUniqueField:(id *)ioValue error:(NSError * __autoreleasing *)outError{

    // The property being validated must not already exist

    NSFetchRequest *fetchRequest = [NSFetchRequest fetchRequestWithEntityName:NSStringFromClass([self class])];
    fetchRequest.predicate = [NSPredicate predicateWithFormat:@"uniqueField == %@", *ioValue];

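    // countForFetchRequest:error: returns an NSUInteger, or NSNotFound if the fetch fails
    NSUInteger count = [self.managedObjectContext countForFetchRequest:fetchRequest error:nil];
    if (count != NSNotFound && count > 0) {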
        if (outError != NULL) {
            NSString *errorString = NSLocalizedString(
                                                      @"Object must have unique value for property",
                                                      @"validation: nonunique property");
            NSDictionary *userInfoDict = @{ NSLocalizedDescriptionKey : errorString };
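            // An NSError needs a non-nil domain; use Core Data's generic validation error code
            *outError = [[NSError alloc] initWithDomain:NSCocoaErrorDomain
                                                   code:NSManagedObjectValidationError
                                               userInfo:userInfoDict];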
        }
        return NO;
    }
    return YES;
}

My use case for multiple threads potentially creating the same object: if I asynchronously request all users within an area, and two of these areas overlap and give me the same user object at about the same time, each thread tries to create the same user in its own context. findOrCreate wouldn't be able to detect that the object was already being created in a different thread/context. Currently I'm handling this by doing the check in validateForInsert.

Why don't you check for an existing object and update it? And why would you have multiple threads creating the same thing at the same time? If you're pulling data down from the web and mapping it into Core Data then there are frameworks that can help you, like RestKit. — Wain, Sep 20 '14 at 11:28
Performing fetches will alter the object graph, which you should not do from inside a validation method. This is a very, very bad idea. The preferred way to accomplish what you are trying to do is to implement the find-or-create pattern, and not during validation. — quellish, Sep 20 '14 at 22:30
Say, for example, I make various asynchronous calls that request data and process it. Those calls could potentially have results that end up becoming the same objects. If two calls happen to be processing at the same time, and each thread ends up creating the same object at about the same time before it is merged onto the main context, we would end up with two objects referring to the same actual object. Currently I catch this at the validation level and discard them. I can't think of a way that would allow two unmerged contexts to be aware of the same object being created in this situation. — mitrenegade, Sep 21 '14 at 00:32
I would also be open to the suggestion that objects should only be created on a single, queued background thread, if that is a valid idea. Say all of my asynchronous calls return and their objects get queued and preprocessed. I'm not sure if this is great, because I could potentially have many parallel requests and I don't want to essentially turn them into a serial request. — mitrenegade, Sep 21 '14 at 00:33

score 2 · Answer 1 · edited Jun 20 '20 at 09:12

Fetching inside a validation method?

Your question is clever in that it's hiding several questions!

So, is using this method expensive?

It's potentially very expensive, as you're incurring a fetch for at least each object you are validating as part of a save (validation is called automatically during a save).

Is it common practice?

I would really hope not! I've only seen this done once before, and it didn't turn out well (keep reading).

Also, is there a difference in using validateForInsert and validate?

I'm not sure what you mean here. A managed object has these validation methods: validateForInsert:, validateForUpdate:, and validateForDelete:. Each of these executes its own rules as well as calling validateValue:forKey:error: for individual properties, which in turn calls any implementations of the pattern validate<Key>:error:. validateForInsert:, for example, will execute any insertion validation rules defined in the managed object model before calling other validation methods (for example, marking a modeled property non-optional in the model editor is an insert validation rule). While validation is called automatically when the context is saved, you can call it at any time. This can be useful if you want to show the user errors that must be corrected for a save to complete.
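A minimal sketch of calling validation by hand, assuming a throwaway in-memory stack built in code. The entity name "Item" and both helper functions are made up for illustration; only the non-optional "uniqueField" attribute echoes the question.

```objective-c
#import <CoreData/CoreData.h>

// Builds a throwaway in-memory stack with one hypothetical entity, "Item",
// whose "uniqueField" string attribute is marked non-optional.
static NSManagedObjectContext *MakeDemoContext(void) {
    NSAttributeDescription *attribute = [[NSAttributeDescription alloc] init];
    attribute.name = @"uniqueField";
    attribute.attributeType = NSStringAttributeType;
    attribute.optional = NO;

    NSEntityDescription *entity = [[NSEntityDescription alloc] init];
    entity.name = @"Item";
    entity.properties = @[attribute];

    NSManagedObjectModel *model = [[NSManagedObjectModel alloc] init];
    model.entities = @[entity];

    NSPersistentStoreCoordinator *coordinator =
        [[NSPersistentStoreCoordinator alloc] initWithManagedObjectModel:model];
    [coordinator addPersistentStoreWithType:NSInMemoryStoreType
                              configuration:nil
                                        URL:nil
                                    options:nil
                                      error:NULL];

    NSManagedObjectContext *context = [[NSManagedObjectContext alloc]
        initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    context.persistentStoreCoordinator = coordinator;
    return context;
}

// Inserts a scratch Item, optionally sets uniqueField, and runs insert
// validation explicitly instead of waiting for a save to trigger it.
static BOOL ItemPassesInsertValidation(NSManagedObjectContext *context, NSString *value) {
    __block BOOL valid = NO;
    [context performBlockAndWait:^{
        NSManagedObject *item =
            [NSEntityDescription insertNewObjectForEntityForName:@"Item"
                                          inManagedObjectContext:context];
        if (value != nil) {
            [item setValue:value forKey:@"uniqueField"];
        }
        NSError *error = nil;
        valid = [item validateForInsert:&error];
        if (!valid) {
            NSLog(@"Insert validation failed: %@", error.localizedDescription);
        }
        // Discard the scratch object so nothing is left pending in the context.
        [context deleteObject:item];
    }];
    return valid;
}
```

With this setup, an Item whose uniqueField is left unset should fail insert validation because of the non-optional rule, while one with a value should pass. No fetch is involved in either case.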

That said, read on for a solution to the problem you seem to be trying to solve.

About fetching inside a validation method...

It's unwise to access the object graph inside a validation method. When you perform a fetch, you are changing the object graph in that context - objects get accessed, faults are fired, etc. Validation happens automatically during a save, and altering the in-memory object graph at that point - even if you are not changing property values directly - can have some dramatic and hard to predict side effects. It would not be happy fun time.

The correct solution to uniqueness: Find Or Create

You seem to be trying to ensure that managed objects are unique. Core Data provides no built-in mechanism for this, but there is a recommended pattern to implement: "find-or-create". This is done when accessing objects, rather than when validating or saving them.

Determine what makes this entity unique. This may be a single property value (in your case it appears to be a single property), or a combination of several (for example "firstName" and "lastName" together are what makes a "person" unique). Based on that uniqueness criteria, you query the context for an existing object match. If matches are found, return them, otherwise create an object with those values.

Here is an example based on the code in your question. This will use "uniqueField"'s value as the uniqueness criteria. Obviously, if you have multiple properties that together make your entity unique, this gets a little more complicated.

Example:

// I am using NSValue here, as your example doesn't indicate a type.
+ (void) findOrCreateWithUniqueValue:(NSValue *)value inManagedObjectContext:(NSManagedObjectContext *)managedObjectContext completion:(void (^)(NSArray *results, NSError *error))completion {

    [managedObjectContext performBlock:^{
        NSError             *error      = nil;
        NSEntityDescription *entity     = [NSEntityDescription entityForName:NSStringFromClass(self) inManagedObjectContext:managedObjectContext];
        NSFetchRequest *fetchRequest    = [[NSFetchRequest alloc] init];
        fetchRequest.entity = entity;
        fetchRequest.predicate = [NSPredicate predicateWithFormat:@"uniqueField == %@", value];

        // A nil result means the fetch failed; an empty array means no match.
        NSArray *results = [managedObjectContext executeFetchRequest:fetchRequest error:&error];
        if (results != nil && [results count] == 0){
            // No matches found, create a new object
            NSManagedObject *object = [NSEntityDescription insertNewObjectForEntityForName:[entity name] inManagedObjectContext:managedObjectContext];
            // Key-value coding, because NSManagedObject itself declares no uniqueField property.
            [object setValue:value forKey:@"uniqueField"];
            results = [NSArray arrayWithObject:object];
        }

        if (completion != nil) {
            // Called on the context's queue. On failure, results is nil and error is set.
            completion(results, error);
        }
    }];

}

This would become your primary method for getting objects. In the scenario you describe in your question, you're periodically getting data from some source that must be applied to managed objects. Using the above method, that process would look something like this:

// 'context' is the managed object context you are importing into.
[MyEntityClass findOrCreateWithUniqueValue:value inManagedObjectContext:context completion:^(NSArray *results, NSError *error){
    if ([results count] > 0){
        for (NSManagedObject *object in results){
            // Set your new values.
            [object setValue:newValue forKey:@"someValue"];
        }
    } else {
        // No results, check the error and handle here!
    }
}];

All of this can be done efficiently and with appropriate data integrity. You can use batch faulting in your fetch implementation if you are willing to take the memory hit. Once you have performed the above for all of your incoming data, the context can be saved, and the objects and their values will be pushed to the parent store efficiently.

This is the preferred way to implement uniqueness using Core Data. This is mentioned very briefly, and indirectly, in the Core Data Programming Guide.
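For the multi-property case mentioned earlier, the only part of find-or-create that changes is the predicate. A rough sketch, assuming the uniqueness criteria arrive as a dictionary of property names to values; the helper name `UniquenessPredicate` is invented here for illustration.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: builds "key1 == value1 AND key2 == value2 ..." from a
// dictionary of the property values that together identify one object.
static NSPredicate *UniquenessPredicate(NSDictionary<NSString *, id> *criteria) {
    NSMutableArray<NSPredicate *> *parts = [NSMutableArray array];
    [criteria enumerateKeysAndObjectsUsingBlock:^(NSString *key, id value, BOOL *stop) {
        // %K substitutes a key path, %@ substitutes a value.
        [parts addObject:[NSPredicate predicateWithFormat:@"%K == %@", key, value]];
    }];
    return [NSCompoundPredicate andPredicateWithSubpredicates:parts];
}
```

The resulting predicate can be assigned to fetchRequest.predicate in place of the single "uniqueField == %@" comparison, and the same dictionary can be used to populate a newly created object when no match is found.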

To expand on this... It's not unusual to have to do "bulk" find-or-create. In your scenario, you're getting a list of updates that need to be applied to your managed objects, creating new objects if they do not exist. Obviously, the example find-or-create method above can do this, but you can also do it much more efficiently.

Core Data has the concept of "batch faulting". Instead of faulting each object individually as it's accessed, if you know you're going to be using several objects, they can be faulted in all at once. This means fewer trips to the disk and better performance.

A bulk find or create method can take advantage of this. Be aware that since all of these objects will now have their faults "fired", this will use more memory - but not more than if you were calling the above single find-or-create on each.

Rather than repeating all of the previous method, I will paraphrase:

// 'values' is an array of your unique identifiers.
+ (void) findOrCreateWithUniqueValues:(NSArray *)values inManagedObjectContext:(NSManagedObjectContext *)managedObjectContext completion:(void (^)(NSArray *results, NSError *error))completion {
    ...
    // Effective use of IN will ensure a batch fault
    fetchRequest.predicate = [NSPredicate predicateWithFormat:@"SELF.uniqueField IN %@", values];
    // returnsObjectsAsFaults works inconsistently across versions.
    fetchRequest.returnsObjectsAsFaults = NO;
    ...
    NSArray *results = [managedObjectContext executeFetchRequest:fetchRequest error:&error];
    // (A nil 'results' means the fetch failed; handle the error before continuing.)
    // uniqueField values we initially wanted
    NSSet   *wanted = [NSSet setWithArray:values];
    // uniqueField values we got from the fetch
    NSSet   *got    = [NSSet setWithArray:[results valueForKeyPath:@"uniqueField"]];
    // uniqueField values we will need to create: the difference between wanted and got.
    // When nothing was fetched, got is empty and need equals wanted.
    NSMutableSet    *need   = [NSMutableSet setWithSet:wanted];
    [need minusSet:got];

    NSMutableSet *resultSet = [NSMutableSet setWithArray:results];
    // At this point, walk the values in need, insert new objects and set uniqueField values, add to resultSet
    ...
    // And then pass [resultSet allObjects] to the completion block.

}

Effective use of batch faulting can be a huge boost for any application that deals with many objects at a time. As always, profile with Instruments. Unfortunately, faulting behavior has varied significantly between different Core Data releases. In older releases, an additional fetch using managed object IDs was even more beneficial. Your mileage may vary.

note: core data now provides built-in uniqueness constraints — malhal, Jul 24 '16 at 19:40
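As a rough sketch of what that note refers to: on iOS 9 / macOS 10.11 and later, an entity can declare uniqueness constraints, either in the model editor or in code. The entity name "Item" and both helper names below are made up for illustration, and the merge policy shown is only one possible choice.

```objective-c
#import <CoreData/CoreData.h>

// Builds a hypothetical "Item" entity whose uniqueField attribute must be
// unique across all saved instances. This has to happen before the model is
// handed to a persistent store coordinator, after which it becomes immutable.
static NSEntityDescription *MakeConstrainedEntity(void) {
    NSAttributeDescription *attribute = [[NSAttributeDescription alloc] init];
    attribute.name = @"uniqueField";
    attribute.attributeType = NSStringAttributeType;

    NSEntityDescription *entity = [[NSEntityDescription alloc] init];
    entity.name = @"Item";
    entity.properties = @[attribute];
    // One constraint made of a single attribute. A composite constraint would
    // list several attribute names in the inner array.
    entity.uniquenessConstraints = @[ @[ @"uniqueField" ] ];
    return entity;
}

// With the default merge policy, a save that violates a constraint fails with
// an error. This policy resolves the conflict instead, keeping the in-memory
// values over the ones already in the store.
static void ApplyConstraintMergePolicy(NSManagedObjectContext *context) {
    context.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy;
}
```

Constraints are checked when the context is saved, so duplicates created in separate contexts are caught at that point rather than at insert time.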

score 0 · Answer 2 · answered Sep 20 '14 at 22:20

Individual calls to the DB are expensive compared to a single call that compares against a set of identifiers. Instead of comparing against a single value, you can compare against a group of values using the IN operator on a set or array. So bring down the whole batch, extract the ids (probably using -valueForKey:), and rewrite the above to accept an array of values.
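A small sketch of those two steps, assuming the downloaded batch has been parsed into an array of dictionaries keyed by "uniqueField". The helper names are invented for illustration.

```objective-c
#import <Foundation/Foundation.h>

// Pulls the identifier out of every incoming record in one call.
// 'payload' stands in for the parsed server response.
static NSArray *IdentifiersInPayload(NSArray<NSDictionary *> *payload) {
    // -valueForKey: on an array returns an array of each element's value.
    return [payload valueForKey:@"uniqueField"];
}

// Builds one predicate for the whole batch instead of one per identifier.
static NSPredicate *PredicateForIdentifiers(NSArray *identifiers) {
    return [NSPredicate predicateWithFormat:@"uniqueField IN %@", identifiers];
}
```

The predicate would go on a single NSFetchRequest for the entity, so the store is queried once per batch rather than once per object.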

score 0 · Answer 3 · edited Feb 08 '17 at 00:22

I think it's fine to do a fetch inside the validation methods, e.g. validateForInsert. In fact, it's the only way you can pass an error back to the context save if the required fetch fails. Just make sure you pass the error param into your fetch and return false if the fetch gave a nil result.

answered Jul 24 '16 at 19:46 by malhal · edited Feb 08 '17 at 00:22 by Caleb Kleveter