
I have done a lot of searching and experimenting and have been unable to find a workable resolution to this problem.

Environment/Tools

  • Visual Studio 2013
  • C#
  • Three tier web application:
    • Database tier: SQL Server 2012
    • Middle tier: Entity Framework 6.* using Database First, Web API 2.*
    • Presentation tier: MVC 5 w/Razor, Bootstrap, jQuery, etc.

Background

I am building a web application for a client that requires a strict three-tier architecture. Specifically, the presentation layer must perform all data access through a web service. The presentation layer cannot access a database directly. The application allows a small group of paid staff members to manage people, waiting lists, and the resources they are waiting for. Based on the requirements the data model/database design is entirely centered around the people (User table).

Problem

When the presentation layer requests something, say a Resource, it is related to at least one User, which in turn is related to some other table, say Roles, which are related to many more Users, which are related to many more Roles and other things. The point is that, when I query for just about anything, EF wants to bring in almost the entire database.

Normally this would be okay because of EF's default lazy-load behavior, but when serializing just about any object to JSON for returning to the presentation layer, the Newtonsoft.Json serializer hangs for a long time and then throws a stack overflow error.

What I Have Tried

Here is what I have attempted so far:

  1. Set Newtonsoft's JSON serializer ReferenceLoopHandling setting to Ignore. No luck. This is not a cyclic graph issue; it is just the sheer volume of data that gets brought in (there are over 20,000 Users). (A sketch of how this setting and the tracking experiments in item 2 are typically wired up follows this list.)

  2. Clear/reset unneeded collections and set reference properties to null. This showed some promise, but I could not get around Entity Framework's desire to track everything.

    • Just setting nav properties to null/clear causes those changes to be saved back to the database on the next .SaveChanges() (NOTE: This is an assumption here, but seemed pretty sound. If anyone knows different, please speak up).

    • Detaching the entities causes EF to automatically clear ALL collections and set ALL reference properties to null, whether I wanted it to or not.

    • Using .AsNoTracking() on everything threw some exception about not allowing non-tracked entities to have navigation properties (I don't recall the exact details).

  3. Use AutoMapper to make copies of the object graph, only including related objects I specify. This approach is basically working, but in the process of (I believe) performing the auto-mapping, all of the navigation properties are accessed, causing EF to query and resolve them. In one case this leads to almost 300,000 database calls during a single request to the web service.
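
For concreteness, here is roughly how items 1 and 2 are typically wired up. This is a minimal sketch only; config is the Web API HttpConfiguration, and MyEntities/Resource are placeholder names, not the actual model:

// Item 1: Web API JSON formatter setting (e.g. in WebApiConfig.Register).
// This suppresses cycles during serialization, but it does nothing about the
// sheer volume of data that lazy loading pulls in.
config.Formatters.JsonFormatter.SerializerSettings.ReferenceLoopHandling =
    Newtonsoft.Json.ReferenceLoopHandling.Ignore;

// Item 2: AsNoTracking keeps EF from change-tracking the returned entities.
using (var db = new MyEntities())   // placeholder context name
{
    var resource = db.Resources
        .AsNoTracking()
        .FirstOrDefault(r => r.ResourceId == id);
    // With lazy-load proxies still enabled, touching resource.Users later
    // would still fire additional queries against the database.
}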

What I am Looking For

In short, has anyone had to tackle this problem before and come up with a working and performant solution?

Lacking that, any pointers for at least where to look for how to handle this would be greatly appreciated.

Additional Note: It occurred to me as I wrote this that I could possibly combine the second and third items above. In other words, set/clear nav properties, then automap the graph to new objects, then detach everything so it won't get saved (or perhaps wrap it in a transaction and roll it back at the end). However, if there is a more elegant solution I would rather use that.

Thanks, Dave

Dave Parker
  • Don't return everything in one call. Does Google display 10,000,000 results at once? No, only 10 results per page. – EZI Jul 05 '14 at 20:14
  • @EZI In cases where I am displaying a list of items, I do just that. However, as I stated above, because of the nature of the database, just grabbing even a single object can result in trying to pull in almost the entire db. – Dave Parker Jul 06 '14 at 19:16

4 Answers


It is true that doing what you are asking for is very difficult and it's an architectural trap I see a lot of projects get stuck in.

Even if this problem were solvable, you'd basically end up with a data layer that just wraps the database and destroys performance, because you can't leverage SQL properly.

Instead, consider building your data access service so that it returns meaningful objects containing meaningful data; that is, only the data required to perform a specific task outlined in the requirements documentation. It is true that a post is related to an account, which has many achievements, etc., etc. But usually all I want is the text and the name of the poster, and I don't want it for one post but for each post in a page. So write data services and methods which do things that are relevant to your application.

To clarify, it's the difference between returning a Page object containing a list of Posts, each with only a poster name and a message, and returning entire EF objects containing large amounts of irrelevant data such as IDs and auditing fields like creation time.
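
A rough sketch of that difference, assuming hypothetical Post/Account entities and PostSummary/PageDto classes queried through a placeholder db context (none of these names come from the question's model):

// Hypothetical DTOs: only the fields the page actually needs.
public class PostSummary
{
    public string PosterName { get; set; }
    public string Message { get; set; }
}

public class PageDto
{
    public int PageNumber { get; set; }
    public List<PostSummary> Posts { get; set; }
}

// Project straight to the DTO inside the query; EF translates the Select into
// SQL that fetches only these columns, so no entity graph is materialized.
var page = new PageDto
{
    PageNumber = pageNumber,
    Posts = db.Posts
        .OrderByDescending(p => p.CreatedOn)
        .Skip((pageNumber - 1) * pageSize)
        .Take(pageSize)
        .Select(p => new PostSummary
        {
            PosterName = p.Account.Name,
            Message = p.Text
        })
        .ToList()
};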

Consider the Twitter API. If it were implemented as above, performance would be abysmal with the amount of traffic Twitter gets. And most of the information returned (costing CPU time, disk activity, DB connections as they're held open longer, network bandwidth) would be completely irrelevant to what developers want to do.

Instead, the API exposes what would be useful to a developer looking to make a Twitter app. Get me the posts by this user. Get me the bio for this user. This is probably implemented as very nicely tuned SQL queries for someone as big as Twitter, but for a smaller client, EF is great as long as you don't attempt to defeat its performance features.

This additionally makes testing much easier as the smaller, more relevant data objects are far easier to mock.

jameswilddev
  • Thanks for your response and I understand what you are saying. Indeed, I considered something like this. My concern is that the app is fairly large, so I will end up creating a myriad of special purpose APIs, potentially with a myriad of special classes to communicate the data over the wire. When all is said and done I may understand the use cases enough to condense the API down to a subset of common patterns, but getting there will be tedious and time consuming. My current solution is almost there. If I could just prevent EF from resolving the nav properties it would be fine. – Dave Parker Jul 05 '14 at 21:17
  • _cont'd (ran out of room)_ However, I concede that your approach may ultimately be the best path. I will see if anyone else has any other ideas before committing to that path. – Dave Parker Jul 05 '14 at 21:19
  • Ultimately the systems I've seen using this approach did work, but they performed very badly with few users and small datasets due to constant WCF calls and associated SQL calls, and had little to no test coverage. Good luck with your project. – jameswilddev Jul 05 '14 at 21:44

For three-tier applications, especially if you are going to expose your entities "raw" in services, I would recommend that you disable lazy loading and proxy generation in EF. Your alternative would be to use DTOs instead of entities, so that the web services return a model object tailored to the service instead of the entity (as suggested by jameswilddev).
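
For example, with an EF6 Database First context those two switches are per-instance settings (MyEntities and Resource are placeholder names):

// MyEntities is the generated (placeholder-named) DbContext.
using (var db = new MyEntities())
{
    // Navigation properties stay null unless you explicitly Include() them.
    db.Configuration.LazyLoadingEnabled = false;
    // Plain POCOs come back instead of dynamic proxies, so they serialize cleanly.
    db.Configuration.ProxyCreationEnabled = false;

    // Load exactly the related data this service call needs and nothing more.
    // Requires: using System.Data.Entity; for the lambda Include overload.
    var resource = db.Resources
        .Include(r => r.Users)
        .FirstOrDefault(r => r.ResourceId == id);
}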

Either way will work, and has a variety of trade-offs.

If you are using EF in a multi-tier environment, I would highly recommend Julia Lerman's DbContext book (I have no affiliation): http://www.amazon.com/Programming-Entity-Framework-Julia-Lerman-ebook/dp/B007ECU7IC

There is a chapter in the book dedicated to working with DbContext in multi-tier environments (you will see the same recommendations about Lazy Load and Proxy). It also talks about how to manage inserts and updates in a multi-tier environment.

JMarsch

I had a similar, stressful project: I needed to load a large amount of data, process it from different angles, and pass it to a complex dashboard of charts and tables. My optimizations were:

1. Instead of using EF to load the data, I called old-school stored procedures (and, for further optimization, grouped the results to keep the returned tables as small as possible, e.g. a single query returns a table from which several chart datasets can be extracted). See the sketch after this list.

2. More importantly, instead of Newtonsoft's JSON I used fastJSON, whose performance is worth mentioning (it is really fast, but not compatible with complex objects, e.g. view models that contain lists of other models nested several levels deep). It is better to read up on the pros and cons of fastJSON first:

https://www.codeproject.com/Articles/159450/fastJSON

3. The relational database design is the prime suspect in this problem. It may be worth keeping the tables that hold raw data to be processed (most likely for analytics) in a denormalized schema, which saves a lot when querying the data. Also be wary of using the model classes that the EF designer generates from the database for reading/selecting data, especially when you want to serialize them (sometimes I think the same schema should be split into two sets of otherwise identical classes/models, one for writing and one for reading, so that the write models keep the virtual collections that come from the foreign keys while the read models ignore them... I am not sure about this).

NOTE: For very, very large data it is better to go deeper and set up in-memory OLTP for the tables that contain facts or raw data; in that case the table acts more like a non-relational (NoSQL) table.

NOTE: In MS SQL Server, for example, you can take advantage of SQLCLR, which lets you write routines in C#, VB, etc. and call them from T-SQL; in other words, handle the data processing at the database level.

4. For interactive views that need to load data, consider which information should be processed on the server side and which can be handled on the client side (sometimes it is better to query data from the client side, but keep in mind that data on the client side can be accessed by the user); it is situation-dependent.

5. For views over large raw-data tables, datatables.min.js is a good idea, and everyone suggests using server-side paging for such tables.

6. For importing and exporting data from big files, I think OLE DB is the best choice.
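
To illustrate point 1, EF can still serve as the plumbing for a stored procedure call; the procedure name, parameters, and result class below are made-up, purely for illustration:

// Made-up result shape for one grouped query feeding several charts.
public class ChartRow
{
    public string Category { get; set; }
    public int Total { get; set; }
}

// Database.SqlQuery materializes rows without change tracking,
// so nothing drags in the rest of the object graph.
// Requires: using System.Data.SqlClient;
var rows = context.Database
    .SqlQuery<ChartRow>(
        "EXEC dbo.GetDashboardTotals @FromDate, @ToDate",
        new SqlParameter("@FromDate", fromDate),
        new SqlParameter("@ToDate", toDate))
    .ToList();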

However, I still doubt these are exact solutions. If anybody has practical solutions, please mention them ;)

Rouzbeh Zarandi

I have fiddled with a similar problem using EF model first, and found the following solution satisfying for "One to Many" relations:

  • Include "Foreign key properties" in the sub-entities and use this for later look-up.
  • Make the get/set accessors of any "Navigation Properties" (sub-collections) in your EF entity private.

Data relation example (excerpt)
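
In place of the diagram, here is a rough sketch of what the generated entities end up looking like once the navigation property's accessibility is set to Private in the model. FitnessClubChain/FitnessClub are taken from the query below; the remaining property names are assumptions:

public class FitnessClubChain
{
    public int ID { get; set; }
    public string Name { get; set; }

    // Private navigation property: EF still maintains the relationship,
    // but serializers never see or walk into this collection.
    private ICollection<FitnessClub> FitnessClubs { get; set; }
}

public class FitnessClub
{
    public int ID { get; set; }
    public string Name { get; set; }

    // Exposed foreign key property used for later look-ups.
    public int FitnessClubChainID { get; set; }
}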

This will give you an object that does not expose the sub-collections, so only the main properties get serialized. This workaround requires some restructuring of your LINQ queries: query your table of sub-items directly, with the foreign key property as your filter, like this:

var myFitnessClubs = context.FitnessClubs
    .Where(f => f.FitnessClubChainID == myFitnessClubChain.ID);

Note 1: You may of course choose to implement this solution only partly, affecting only the sub-collections that you definitely do not want to serialize.

Note 2: For "Many to Many" relations, at least one of the entities needs to have a public representation of the collection, since the relation cannot be retrieved using a single ID property.

Håkon Seljåsen
  • Bad idea. You sacrifice all freedom to use navigation properties in LINQ queries only to serve a relatively minor concern, serialization. The other answers offer better, and commonly recommended, approaches. – Gert Arnold Aug 23 '17 at 14:49
  • Yes, you sacrifice some of the EF freedoms, but you will still have freedom to retrieve all information based on the desired relations. For a simple web API, the extra freedom may not be a great loss. If it is, one could put the EF models in a separate assembly together with a public DataProvider and use the "internal" modifier instead. Then the queries can be done with full freedom inside the DataProvider, and the parent application and serializer will not be exposed to more tables than desired in each object. – Håkon Seljåsen Aug 24 '17 at 07:16