2

I have a datatable with 500K rows in the following format;

Int | Decimal | String

We are using a singleton pattern and ultimately our DataTable needs ending up as a List(Of AssetAllocation) where AssetAllocation is:

Public Class AssetAllocation
    Property TpId() As Integer
    Property Allocation As List(Of Sector)
End Class

Public Class Sector
    Property Description() As String
    Property Weighting As Decimal
End Class

The linq I am using;

Private Shared Function LoadAll() As List(Of AssetAllocation)

        Dim rtn = New List(Of AssetAllocation)

        Using dt = GetRawData()

            Dim dist = (From x In dt.AsEnumerable Select x!TP_ID).ToList().Distinct()

            rtn.AddRange(From i As Integer In dist
                         Select New AssetAllocation With {
                            .TpId = i,
                            .Allocation = (From b In dt.AsEnumerable
                                           Where b!TP_ID = i Select New Sector With {
                                               .Description = b!DESCRIPTION.ToString(),
                                               .Weighting = b!WEIGHT
                                           }).ToList()})
        End Using

        Return rtn
    End Function

It is taking a long time to execute the linq which is due to the inner query constructing the list of sectors. The distinct list contains 80k

Can this be improved at all?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Dooie
  • 1,649
  • 7
  • 30
  • 47

1 Answers1

1

If I've understood what you are trying to do this query should have much better performance. The trick is to use GroupBy to avoid having to search the entire table for matching id's in every iteration. I have written it in C# but I'm sure you can translate it into VB.

var rtn  = 
        dt.AsEnumerable()
        .GroupBy(x => x.Field<int>("TP_ID"))
        .Select(x => new AssetAllocation()
        { 
            TpId = x.Key, 
            Allocation = x.Select(y => new Sector
            {
                Description =  y.Field<string>("Description"),
                Weighting = y.Field<decimal>("WEIGHT") 
            }).ToList()
        }).ToList();
Magnus
  • 45,362
  • 8
  • 80
  • 118
  • Works great. Just so i understand, the group by help by not having to loop over the entire datatable for every iteration? – Dooie Nov 15 '15 at 15:44
  • 1
    @Dooie `GroupBy` build an internal HashTable where TpId is the key and the datarow is the value(s). HashTables have O(1) lookup so it is very fast to determine which TpId a row belongs to. – Magnus Nov 15 '15 at 15:55