RE: CRUD operations. Is it pulling more data than is needed a bad thing?

Question

RE: CRUD operations... Is it pulling more data than is needed a bad thing?

Let me preface this with saying I really did search for this answer. On and off for some time now. I'm certain it's been asked/answered before but I can't seem to find it. Most articles seem to be geared towards how to perform basic CRUD operations. I'm really wanting to get deeper into best practices. Having said that, here's an example model I mocked up for example purposes.

public class Book
{
    public long Id { get; set; }
    public string Name { get; set; }
    public decimal AverageRating { get; set; }
    public decimal ArPoints { get; set; }
    public decimal BookLevel { get; set; }
    public string Isbn { get; set; }
    public DateTime CreatedAt { get; set; }
    public DateTime PublishedAt { get; set; }

    public Author Author { get; set; }
    public IEnumerable<Genre> Genres { get; set; }
}

I'm using ServiceStack's OrmLite, migrating string queries to object model binding wherever possible. It's a C# MVC.NET project, using Controller/Service/Repository layers with DI. My biggest problem is with Read and Update operations. Take Reads for example. Here are two methods (only wrote what I thought was germane) for example purposes.

public class BookRepository
{
    public Book Single(long id)
    {
        return _db.SelectById<Book>(id);
    }

    public IEnumerable<Book> List()
    {
        return _db.Select<Book>();
    }
}

Regardless of how this would need to change for the real world, the problem is simply that to much information is returned. Say if I were displaying a list of books to the user. Even if the List method were written so that it didn't pull nested methods (Author & Genres), it would have data for properties that were not used.

It seems like I could either learn to live with getting data I don't need or write a bunch of extra methods that changes what properties are pulled. Using the Single method, here's a few examples...

public Book SinglePublic(long id): Returns a few properties
public Book SingleSubscribed(long id): Returns most properties
public Book SingleAdmin(long id): Returns all properties

Having to write out methods like this for most tables doesn't seem very maintainable to me. But then, almost always getting unused information on most calls has to affect performance, right? I have to be missing something. Any help would be GREATLY appreciated. Feel free to just share a link, give me a PluralSight video to watch, recommend a book, whatever. I'm open to anything. Thank you.

Yes. Fetching more data than needed can drastic change for the worse the query plan of the database for retrieving the data, so an increase in latency, increase network bandwidth and potentially addition disk IO. — danblack, Aug 27 '20 at 23:00
I'm not familiar with OrmLite, but it looks like they do something similar to what was mentioned in the answer from Jonathan: https://stackoverflow.com/questions/37442401/ormlite-query-to-select-some-of-the-columns-from-each-of-2-joined-tables — Matt U, Aug 27 '20 at 23:46

score 2 · Accepted Answer · answered Aug 28 '20 at 09:15

As a general rule you should avoid pre-mature optimization and always start with the simplest & most productive solution first as avoiding complexity & large code-base sizes should be your first priority.

If you're only fetching a single row, you should definitely start by only using a single API and fetch the full Book entity, I'll personally also avoid the Repository abstraction which I view as an additional unnecessary abstraction, so I'd just be using OrmLite APIs directly in your Controller or Service, e.g:

Book book = db.SingleById<Book>(id);

You're definitely not going to notice the additional unused fields over the I/O cost of the RDBMS network call and the latency & bandwidth between your App and your RDBMS is much greater than additional info on the wire over the Internet. Having multiple APIs for the sake of reducing unused fields adds unnecessary complexity, increases code-base size / technical debt, reduces reusability, cacheability & refactorability of your code.

Times when to consider multiple DB calls for a single entity:

You've received feedback & given a task to improve the performance of a page/service
Your entity contains large blobbed text or binary fields like images

The first speaks to avoiding pre-mature optimization by first focusing on simplicity & productivity before optimizing to resolve known realizable performance issues. In that case first profile the code, then if it shows the issue is with the DB query you can optimize for only returning the data that's necessary for that API/page.

To improve performance I'd typically first evaluate whether caching is viable as it's typically the least effort / max value solution where you can easily cache APIs with a [CacheResponse] attribute which will cache the optimal API output for the specified duration or you can take advantage of caching primitives in HTTP to avoid needing to return any non-modified resources over the wire.

To avoid the second issue of having different queries without large blobbed data, I would extract it out into a different 1:1 row & only retrieve it when it's needed as large row sizes hurts overall performance in accessing that table.

Custom Results for Summary Data

So it's very rare that I'd have different APIs for accessing different fields of a single entity (more likely due to additional joins) but for returning multiple results of the same entity I would have a different optimized view with just the data required. This existing answer shows some ways to retrieve custom resultsets with OrmLite (See also Dynamic Result Sets in OrmLite docs).

I'll generally prefer to use a custom Typed POCO with just the fields I want the RDBMS to return, e.g. in a summary BookResult Entity:

var q = db.From<Book>()
  .Where(x => ...);
var results = db.Select<BookResult>(q);

This is all relative to the task at hand, e.g. the fewer results returned or fewer concurrent users accessing the Page/API the less likely you should be to use multiple optimized queries whereas for public APIs with 1000's of concurrent users of frequently accessed features I'd definitely be looking to profiling frequently & optimizing every query. Although these cases would typically be made clear from stakeholders who'd maintain "performance is a feature" as a primary objective & allocate time & resources accordingly.

Jonathan · Answer 2 · 2020-08-27T23:10:07.083

I can't speak to ORM Lite, but for Entity Framework the ORM will look ahead, and only return columns that are necessary to fulfill subsequent execution. If you couple this with view models, you are in a pretty good spot. So, for example, lets say you have a grid to display the titles of your books. You only need a subset of columns from the database to do so. You could create a view model like this:

public class BookListViewItem{
  public int Id {get;set;}
  public string Title {get; set;}

  public BookListView(Book book){
    Id = book.Id;
    Title = book.Title;
  }
}

And then, when you need it, fill it like this:

var viewModel = dbcontext.Books
                         .Where(i => i.whateverFilter)
                         .Select(i => new BookListViewItem(i))
                         .ToList();

That should limit the generated SQL to only request the id and title columns.

In Entity Framework, this is called 'projection'. See:

https://social.technet.microsoft.com/wiki/contents/articles/53881.entity-framework-core-3-projections.aspx

RE: CRUD operations. Is it pulling more data than is needed a bad thing?

2 Answers2

Custom Results for Summary Data