-4

I am working on C# API's which gets data from database. I inserted around 19500 records in database for testing performance. Here, sampledData contains 19500 records.

var sampledData = await _dataContext.ItemsData
                                    .Include(i => i.ProcedureItem)
                                    .Include(i => i.ProcedureItem.ProcedureItem)
                                    .Include(i => i.ProcedureItem.ProcedureItemAll)
                                    .Where(i => i.Procedure.Status == true &&
                                                i.isValid== true &&
                                                i.Procedure.ID== ID).ToListAsync();

// Foreach loop on 19500 records to filter data and store them info list
var filteredList = new List<ProcedureFilteredData>();

foreach(var s in sampledData )
{
    if (filteredList.Any(i => i.ProcedureItem == s.ProcedureItem.ProcedureItem.Name))
    {
        continue;
    }

    var pData = new ProcedureFilteredData
    {
        ProcedureItemAll = s.ProcedureItem.Name
    };

    filteredList.Add(pData);
}

In filteredList, I get 1404 records. It takes time in foreach loop to filter data. Is there any way to optimize performance?

Abhi Singh
  • 321
  • 3
  • 14
  • 1
    You seem to select a lot of columns from database, do you need them all? You use only two in this code. – Antonín Lejsek Jan 11 '21 at 06:53
  • 1
    Your code is inaccurate in several ways with several mistakes. Please spend time to validate your code before you paste a question – TheGeneral Jan 11 '21 at 06:53
  • 1
    My money is on the O(n) `.Any( )` over the `foreach` being slow. – ProgrammingLlama Jan 11 '21 at 07:08
  • Where does `s.` come from? – Rick James Jan 11 '21 at 07:31
  • 2
    Can you move the filtering into the SQL? – Rick James Jan 11 '21 at 07:32
  • 20K rows is no data at all. Instead of loading the data in memory, write a proper LINQ query that retrieves only the data you need. Even 1M rows is small data for a database if proper indexes are used – Panagiotis Kanavos Jan 12 '21 at 10:44
  • What is this loop trying to do? Retrieve the first *random* record by name? Without an `OrderBy` clause there's no implicit order. Or retrieve distinct product names? You don't need the loop and the `Includes`, just a `Select().Distinct()` in the LINQ query – Panagiotis Kanavos Jan 12 '21 at 10:54
  • All this code could be replaced with `await _dataContext.ItemsData.SelectMany(i=>i.ProcedureItem).Where(...).Select(p=>p.Name).Distinct().ToListAsync()`, or `(from item in _dataContext.ItemsData from pitem in item.ProcedureItem where ... select pitem.Name).Distinct()..` – Panagiotis Kanavos Jan 12 '21 at 10:59

2 Answers2

6

I suggest checking with a help of HashSet<T>, not List<T> since filteredList .Any(...) has quadratic (O(n**2)) time complexity while !knownNames.Add(...) is linear (O(n)):

HashSet<string> knownNames = new HashSet<string>();

var filteredList = new List<ProcedureFilteredData>();

foreach (var s in sampledData)
{
    if (!knownNames.Add(s.ProcedureItem.ProcedureItem.Name))
        continue;  

    var pData = new ProcedureFilteredData
    {
        ProcedureItemAll = s.ProcedureItem.Name
    };

    filteredList.Add(pData);
}
Dmitry Bychenko
  • 180,369
  • 20
  • 160
  • 215
  • Your complexity figures could be confusing. `Any` in itself is linear, but using it in a loop will cause the quadratic complexity. – JonasH Jan 11 '21 at 08:37
  • hello, I need optimization for Foreach loop. filteredList data is looped and there are certain condition which I put on filteredList to filter data. One of the condition I already mentioned in above. My loop run for 19500 times for 19500 records. – Abhi Singh Jan 11 '21 at 08:55
  • @AbhiSingh you need to post a good question, not hide the details behind non-existent privacy issues. 20K rows is no data at all. What are you trying to do? The loop looks like an attempt to retrieve the first record by name. You posted another question saying you want to join the results to another table, then deleted it due to privacy issues - that don't exist – Panagiotis Kanavos Jan 12 '21 at 10:48
  • @AbhiSingh if you want to retrieve distinct records, you can add a `Distinct` call to your LINQ query. If you want to retrieve the first item in a certain set, you can use eg `ROW_NUMBER() OVER(PARTITION BY)` in SQL. We have to know the problem to give an answer – Panagiotis Kanavos Jan 12 '21 at 10:50
  • @DmitryBychenko notice that the question's code in the end just retrieves unique product item names. A `Select().Distinct()` in the LINQ query would do the job. That's not what the OP really wants though, there was another similar question asking to join this list with another table. When people asked to see what the entities are, the question was deleted – Panagiotis Kanavos Jan 12 '21 at 10:55
-1

The most efficient way to achieve this is to perform the filtering on the query.

For example:

var sampledData = await _dataContext.ItemsData
                                    .Include(i => i.ProcedureItem)
                                    .Include(i => i.ProcedureItem.ProcedureItem)
                                    .Include(i => i.ProcedureItem.ProcedureItemAll)
                                    .Where(i => i.Procedure.Status == true &&
                                                i.isValid== true &&
                                                i.Procedure.ID== ID)
                                    .WhereIf(isFilter, //predicate);

The example does not fully explain your situation, I just wrote it to give an idea. You should change the example I have given according to your own. For example, in your case, you may need to use Select.

berkansasmaz
  • 608
  • 5
  • 16
  • 1
    Adding the predicate after `ToListAsync()` means EF will still bring all 19k+ rows into memory, where the filtering will then happen. The potential in-memory gains here are at best marginal compared to the on-database options. Also `.WhereIf()` is not part of standard NET/LINQ. – sellotape Jan 11 '21 at 07:40
  • Yes, I forgot to remove the executer, sorry for my mistake. But what I want to explain is actually explained more clearly [here](https://stackoverflow.com/a/35542032/9922629). – berkansasmaz Jan 11 '21 at 07:58