Why must relationships be optional when using Core Data with CloudKit?

Question

Below is one of the requirements for using Core Data with CloudKit, from Apple's doc:

All relationships must be optional. Due to operation size limitations, relationship changes may not be saved atomically.

Attempting to use a non-optional relationship with CloudKit results in the error:

Thread 1: Fatal error: Unresolved error Error Domain=NSCocoaErrorDomain Code=134060 "A Core Data error occurred." UserInfo={NSLocalizedFailureReason=CloudKit integration requires that all relationships be optional, the following are not: Some_Managed_Object: some_attribute}, ["NSLocalizedFailureReason": CloudKit integration requires that all relationships be optional, the following are not: Some_Managed_Object: some_attribute]

I wonder, doesn't that completely defeat the purpose of using relationships?

For example, suppose I have two entities: Account and Transfer. Since a transfer is always associated with a source account and a destination account, Transfer should have two non-optional relationships with Account. But due to the above requirement, these relationships have to be optional.

The doc gives an explanation: "(It's because) relationship changes may not be saved atomically". That seems to suggest that, during the sync between CloudKit and Core Data, a relationship may be incomplete and the incomplete relationship is exposed to app code. That seems a serious issue to me, because:

In my example above, the two relationships are non-optional by nature. Changing them to optional makes the model meaningless.
Even in cases where the relationships should be optional, an incomplete relationship is syntactically valid but may still cause unexpected inconsistencies.

So I wonder how this is supposed to work in real apps. It seems quite broken to me. Am I misunderstanding something? Could it be that using CloudKit to sync Core Data is only applicable to a small set of apps that use only optional relationships? (If so, I wonder how other Core Data apps sync their data among devices.)

On a related note: like many others, I tried hard to find details on the sync and conflict resolution algorithms used by CloudKit and Core Data. The only information I could find is:

https://developer.apple.com/forums/thread/121196

In an eventually consistent distributed system you can never "know" that you have existing data or devices in the cloud. Your application will simply "find out at some point" that this data exists and needs to be designed to handle that

https://mjtsai.com/blog/2019/06/04/syncing-core-data-with-cloudkit-and-nspersistentcloudkitcontainer/

Yup, Core Data CloudKit implements to-many relationships using CRDTs!

https://developer.apple.com/videos/play/wwdc2019/202/

Conflict resolution is implemented automatically by NSPersistentCloudKitContainer using a last writer wins merge policy.

While I roughly understand each of those pieces of information, they don't give a direct answer to 1) Are data changes synced between CloudKit and Core Data atomically or not? and, more importantly, 2) Is incomplete data exposed to app code during the sync?

My guess is 1) No and 2) Yes. But it's hard for me to understand how to write a real app if incomplete data changes are exposed to app code during the sync. Could it be that, to use CloudKit to sync Core Data, the model has to be designed to work fine with incomplete relationships?

I would greatly appreciate it if anyone could share how you understand it.

Dandy · Answer 1 · 2022-06-28T16:58:12.683

Could it be that, to use CloudKit to sync Core Data, the model has to be designed to work fine with incomplete relationships?

That is basically it: the model, and the code that works with the model, need to meet this criterion.

When CloudKit delivers changed records from a zone, an operation is not guaranteed to contain the complete object graph in a single “delivery” (see: recordZoneFetchResultBlock), so the Core Data team decided that partial datasets are of higher priority than atomic ones (as noted). I can’t speak for them, but I assume this direction was chosen for performance and complexity reasons.

Take a device which is a new client or hasn’t been connected in a while, requiring 1,000 records to be consumed: the delivery of that data may be broken up into 2 trips (fetch result block calls), the first containing 700 records (with its own partial transfer change token) and the second with the last 300 (and the up-to-date store change token). CloudKit makes no promises on complete or ordered delivery of what is needed to complete a graph in either of those trips (there are circumstances where sending the full graph in one trip might not even be possible), which would result in required relationships being unfulfilled during incremental saves (see this answer). Otherwise, Core Data would need to hold every single record from a cloud store in memory before committing anything to disk in order to properly maintain that integrity.

Unfortunately, this means your code needs to check that each relationship is valid before accessing it or doing work on it. If you need to guarantee a relationship client side because there’s no other way to decouple the object graph functionality, you might need to dive into the CloudKit framework and either build a query operation to confirm the relationship in CloudKit’s dataset or a fetch operation to handle importing that data atomically instead of relying on automatic behaviors.
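As a rough illustration of checking a relationship before working with it, here is a minimal sketch. It uses plain Swift classes as hypothetical stand-ins for managed objects (in a real model, Transfer and Account would be NSManagedObject subclasses with optional relationships), and CompleteTransfer is an invented wrapper name, not a Core Data or CloudKit API.

```swift
import Foundation

// Hypothetical stand-ins for managed objects, so the sketch runs on its own.
final class Account {
    let name: String
    init(name: String) { self.name = name }
}

final class Transfer {
    let amountInCents: Int
    var sourceAccount: Account?       // optional, as CloudKit sync requires
    var destinationAccount: Account?  // optional, as CloudKit sync requires

    init(amountInCents: Int, sourceAccount: Account?, destinationAccount: Account?) {
        self.amountInCents = amountInCents
        self.sourceAccount = sourceAccount
        self.destinationAccount = destinationAccount
    }
}

// A non-optional view of a transfer. It can only be built from a record
// whose relationships are both present, so downstream code never unwraps.
struct CompleteTransfer {
    let amountInCents: Int
    let source: Account
    let destination: Account

    init?(_ transfer: Transfer) {
        guard let source = transfer.sourceAccount,
              let destination = transfer.destinationAccount else {
            return nil  // relationship not (yet) available; skip this record
        }
        self.amountInCents = transfer.amountInCents
        self.source = source
        self.destination = destination
    }
}

let checking = Account(name: "Checking")
let savings = Account(name: "Savings")
let synced = Transfer(amountInCents: 10_000, sourceAccount: checking, destinationAccount: savings)
let partial = Transfer(amountInCents: 5_000, sourceAccount: checking, destinationAccount: nil)

// Only transfers with both accounts present survive the conversion.
let usable = [synced, partial].compactMap(CompleteTransfer.init)
print(usable.count)
```

The idea is to confine the optional handling to one failable initializer; the rest of the app can then work with the non-optional wrapper. This does not make the sync atomic, it only keeps incomplete records out of the code paths that assume a complete graph.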

Thanks. That's what I thought (though I don't know the details of CloudKit). I strongly doubt that there is a general and elegant way to check the integrity of the model in app code. I considered programming against the CloudKit API directly but didn't actually do it because I later decided not to use Core Data. – rayx, Jun 29 '22 at 03:00

score 1 · Answer 2 · answered Aug 10 '22 at 07:35

Well, Core Data is a relational database and CloudKit can be perceived as a NoSQL database. Apple is trying their best to bridge the gap. The complaints about relationships and constraints are easier to understand once you know the design considerations behind the NoSQL databases that are currently popular.

Simply put, the reason is distributed scalability and performance. Relationships are one of the key reasons why relational databases, which are not very "distributed" in nature, cannot be used in many cloud environments that need to handle a lot of data.

Phuah Yee Keat

I think what you said makes sense. What I don't understand is, if the two technologies are not supposed to work together, what's the point of mixing them? It seems to cause more confusion than help. Most apps that store data on the server side don't care about this. The remaining apps (most of them written by individual developers, I think) face a dilemma: use Apple's technology, which may or may not work, or invent their own solution. I chose the latter. – rayx Sep 17 '22 at 01:08

rayx · Answer 3 · 2021-02-28T02:51:56.850

The more I think about it, the more I believe:

Data changes are synced between CloudKit and Core Data in a non-atomic way.
The incomplete states during data sync are exposed to app code.
These behaviors are due to the way sync is performed and can hardly be worked around.

So CloudKit's built-in sync support for Core Data is only useful for a small set of simple apps that don't require data integrity.

For serious apps, one needs to think about implementing a custom approach by using CloudKit directly. But writing one's own sync algorithm isn't an easy task and is full of pitfalls.

hidden-username · Answer 4 · 2021-03-15T14:02:18.557

I have also struggled with this and have come up with some solutions.

Don't use relationships and keep your model shallow (not ideal or scalable)

For obvious reasons this is not ideal or scalable, but in one of my apps I store PKDrawing data directly on an Event entity, along with other drawing-related attributes, rather than using a relationship. This really is fighting the Core Data framework, though, and is bad design.

Check relationship exists during fetch.

This is probably the best solution for user-created data. Let's say you have a Sketch with a to-one Canvas. When fetching your Sketches to display in a List, only fetch Sketches with a non-nil canvas relationship.

Example of checking relationship
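A minimal sketch of such a fetch, assuming a hypothetical model with an entity named Sketch, an optional to-one relationship named canvas, and a date attribute named createdAt (these names are illustrative and not taken from a real project):

```swift
import CoreData

// Builds a fetch request that skips sketches whose canvas has not been
// imported yet. Entity and key names are assumptions about the model.
func completeSketchesRequest() -> NSFetchRequest<NSManagedObject> {
    let request = NSFetchRequest<NSManagedObject>(entityName: "Sketch")
    // Only match objects whose to-one relationship is set.
    request.predicate = NSPredicate(format: "canvas != nil")
    // A sort order is needed if the request backs an NSFetchedResultsController.
    request.sortDescriptors = [NSSortDescriptor(key: "createdAt", ascending: false)]
    return request
}
```

The request can be passed to a managed object context's fetch(_:) or used to back an NSFetchedResultsController, in which case a sketch should show up once its canvas relationship has been filled in.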

Provide default values

This works for things that aren't user created. For instance, in the example above, Canvas could also have a to-one relationship with PaperTemplate. PaperTemplate stores things like PaperStyle (grid, lined). Since this data can easily be recreated in the PaperSettingsView (through a picker), we can simply revert to a DefaultValue in awakeFromFetch if the relationship is nil. Note: I am not sure, but this might result in orphaned PaperTemplate entities.

Ultimately I think solution #2 is the best all-around solution. If we only fetch objects with non-nil relationships, we can ensure the model is correct. So you would only fetch Transfers with both a non-nil source account and a non-nil destination account. If this is done using an NSFetchedResultsController or a SwiftUI @FetchRequest, your view can stay synced as objects become "valid". While the saving might not be atomic, clients can decide how to consume changes and mimic atomicity by ignoring incomplete objects.

Edit:

While I think doing this is fighting against Core Data, you can store blobs using Transformable attributes or Codable structs that you encode/decode manually. Make sure to check "Allows External Storage".

So you could use:

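import CoreData

class Transfer: NSManagedObject {
    // Encoded Account values, stored as Binary Data attributes instead of relationships.
    @NSManaged var sourceAccountData: Data?
    @NSManaged var destinationAccountData: Data?
}

// Could use a class and NSSecureCoding instead if you wanted, but I like structs.
struct Account: Codable {
    // Account fields go here.
}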
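One way the manual encode/decode step might look, sketched with a plain class standing in for the managed object so the snippet runs on its own. The Account fields (id, name), the TransferRecord name, and the choice of JSON as the encoding are all assumptions made for illustration.

```swift
import Foundation

// Hypothetical account payload; the real fields depend on the app.
struct Account: Codable, Equatable {
    var id: UUID
    var name: String
}

// Stand-in for the Transfer managed object: only the blob is stored.
final class TransferRecord {
    var sourceAccountData: Data?

    // Decodes on read and encodes on write.
    // Reads yield nil if the blob is missing or cannot be decoded.
    var sourceAccount: Account? {
        get {
            guard let data = sourceAccountData else { return nil }
            return try? JSONDecoder().decode(Account.self, from: data)
        }
        set {
            sourceAccountData = newValue.flatMap { try? JSONEncoder().encode($0) }
        }
    }
}

let record = TransferRecord()
record.sourceAccount = Account(id: UUID(), name: "Checking")
print(record.sourceAccount?.name ?? "missing")
```

Because the account travels inside the same record as the transfer, it cannot arrive separately during sync. The trade-off is that the account is copied into every transfer rather than shared, so changes to an account are not reflected in existing blobs.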

Thanks for your information. However, your solutions #2 and #3 are not feasible for apps which require data integrity (e.g. financial apps). – rayx, Mar 13 '21 at 02:12
@rayx I agree that it sucks. #1 is actually the approach that I seem to use the most, but it does feel wrong. I find myself storing blobs when I need the data to exist. I'll add an edit with more details. – hidden-username, Mar 15 '21 at 13:51
Thanks. I see what you meant. You packed all entities involved in a relationship into a single entry (a blob). – rayx, Mar 15 '21 at 14:31
Hi, @hidden-username, I hope you have been doing fine. I appreciated your detailed suggestions above and I'd like to let you know what my final solution is. I gave up on Core Data. I use Codable to persist my data in a JSON file instead and use AirDrop to sync (send, actually) data between different devices (or even different users). That works well for me. – rayx, Jun 29 '22 at 02:51

score 0 · Answer 5 · answered Sep 18 '21 at 15:52

My two cents: solution two is the correct solution and should be done anyway for any robust app. You should always validate data! Imagine you download JSON data from a server; wouldn't you validate that the data is correct before importing it? It is the same thing with a slightly different twist.

John
