
I've seen many examples that use LINQ to divide a list into sub-lists according to a maximum number of items per list. In this case, though, I'm interested in dividing a list into sub-lists using sizemb as a weight, with a maximum total file size of 9 MB per list.

    public class doc
    {
        public string file;
        public int sizemb;
    }

    var list = new List<doc>()
    {
         new doc { file = "dok1", sizemb = 5 },
         new doc { file = "dok2", sizemb = 5 },
         new doc { file = "dok3", sizemb = 5 },
         new doc { file = "dok4", sizemb = 4 },
    };

    int maxTotalFileSize = 9;

The list above should then be divided into 3 lists. Any file larger than 9 MB should go into its own list.

I made a non-LINQ version here:

        var lists = new List<List<doc>>();
        foreach (var item in list)
        {
            //Try and place the document into a sub-list
            var availableSlot = lists.FirstOrDefault(p => (p.Sum(x => x.sizemb) + item.sizemb) < maxGroupSize);
            if (availableSlot == null)
                lists.Add(new List<doc>() { item });
            else
                availableSlot.Add(item);
        }
bluee
  • why should it be three lists there's only two unique values – Rune FS May 17 '13 at 07:43
  • You want to achieve list with doc1, list with doc2 and list with doc3 and doc4? – Kirill Bestemyanov May 17 '13 at 07:44
  • What is the expected result?? – Ahmed KRAIEM May 17 '13 at 07:46
  • @bluee I don't think it's obvious to find a *side effect free* linq query for that, but using a loop it's quite straightforward – vc 74 May 17 '13 at 07:50
  • The expected result is X lists, each with a total size of at most maxTotalFileSize. The exception is files larger than maxTotalFileSize: each of those should be in its own list. – bluee May 17 '13 at 08:03
  • You should have mentioned that your non-linq version doesn't work ;-) – Tim Schmelter May 17 '13 at 08:20
  • @TimSchmelter the non-linq version does work? In fact it does the same as your version just more compressed :) – bluee May 17 '13 at 09:29
  • @bluee: First, it's not a real non-linq version since `Enumerable.FirstOrDefault` is a linq method(it's sitting in the linq namespace). The same applies to `Enumerable.Sum`. Apart from that, as you can see [**here**](http://ideone.com/ibMsMC) it does create four lists from the single list with four docs. – Tim Schmelter May 17 '13 at 09:33
  • related: http://stackoverflow.com/questions/11463734/split-a-list-into-smaller-lists-of-n-size , http://stackoverflow.com/questions/419019/split-list-into-sublists-with-linq?rq=1 – bluee May 17 '13 at 11:05
  • @bluee: No, these are not related. At least not more than my answer on a similar question here a few hours ago: http://stackoverflow.com/questions/16604692/grouping-list-elements-to-dictionary/16604802#16604802 They all depend on the index and on the number of items in the sub-list or on the number of sub-lists (where you could use `%` on the index instead of `/`). Your question is different. You want to group by the sum of a property in each object according to its index in the collection. This is not a task for Linq. – Tim Schmelter May 17 '13 at 11:44
  • According to your description, it seems you should use "less than or equal to" in your comparison: `<= maxGroupSize` – dharmatech May 18 '13 at 21:20
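
The `<=` suggestion in the last comment can be tried out with a small sketch. It keeps the first-fit idea from the question (put each document into the first sub-list that still has room) and only changes the comparison. The `FirstFit` class and its `Split` method are hypothetical names introduced for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public class doc
{
    public string file;
    public int sizemb;
}

public static class FirstFit
{
    // Places each document into the first sub-list that still has room for it.
    // Because any earlier sub-list may be reused, the original order of the
    // documents is not necessarily preserved across sub-lists.
    public static List<List<doc>> Split(IEnumerable<doc> docs, int maxTotalFileSize)
    {
        var lists = new List<List<doc>>();
        foreach (var item in docs)
        {
            // "<=" lets a sub-list reach exactly maxTotalFileSize.
            var slot = lists.FirstOrDefault(
                p => p.Sum(x => x.sizemb) + item.sizemb <= maxTotalFileSize);
            if (slot == null)
                lists.Add(new List<doc> { item }); // oversized files also end up here, alone
            else
                slot.Add(item);
        }
        return lists;
    }
}
```

For the sample data in the question this gives three lists, but dok4 joins dok1 (5 + 4 = 9) rather than dok3, so it is not the same grouping as an approach that only ever appends to the most recent sub-list.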

2 Answers

7

You could use this method:

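IEnumerable<IList<doc>> SplitDocumentList(IEnumerable<doc> allDocuments, int maxMB)
{
    var lists = new List<IList<doc>>();
    var list = new List<doc>();
    foreach (doc document in allDocuments)
    {
        // Total size of the current sub-list if this document were added
        int totalMB = list.Sum(d => d.sizemb) + document.sizemb;
        // Close the current sub-list when it would overflow; the Count check
        // avoids emitting an empty list when a single file exceeds maxMB
        if (totalMB > maxMB && list.Count > 0)
        {
            lists.Add(list);
            list = new List<doc>();
        }
        list.Add(document);
    }
    // Add the last, still open sub-list
    if (list.Count > 0)
        lists.Add(list);
    return lists;
}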

Here's a demo: http://ideone.com/OkXw7C

dok1
dok2
dok3,dok4
Tim Schmelter
  • +1 for readability, the Sum operator could be replaced by a local sum variable in case of performance issues (why did you delete your initial answer?) – vc 74 May 17 '13 at 07:56
  • I'm impressed with this fast response, but the above is not really utilizing LINQ – bluee May 17 '13 at 07:57
  • @vc74: I've deleted it temporarily since there was a bug. I've added the `...lists.Add(list);` after the loop. – Tim Schmelter May 17 '13 at 08:04
  • @bluee: No, this task should be done without linq. Imho linq approaches would either be inefficient or not readable (what is the main purpose of linq). – Tim Schmelter May 17 '13 at 08:06
  • @TimSchmelter or not side effect free... Agreed, there are cases where linq is not the best solution – vc 74 May 17 '13 at 08:08
  • I don't like that it returns IEnumerable when it's clearly a list. Is there any reason behind this? – CSharpie May 18 '13 at 21:35
  • @CSharpie: That's a matter of taste and depends on the code of the caller. If the calling code needs to add other elements to the lists, for example, it would be better to return `IList<T>`. See [IEnumerable<T> as return type](http://stackoverflow.com/questions/381208/ienumerablet-as-return-type). By the way, all Linq methods check first if an `IEnumerable<T>` is convertible to `IList<T>` or `ICollection<T>` to improve the code. So for example if you call `Count()` and it can be cast to `ICollection<T>` it will use the `Count` property instead of a loop. – Tim Schmelter May 18 '13 at 21:44
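
The running-sum idea from vc 74's comment can be sketched as a reusable iterator. This is only an illustration under stated assumptions: `SplitByWeight` and `WeightedSplitExtensions` are hypothetical names, the weight is passed in as a delegate, and chunks are yielded lazily so the method can sit inside a LINQ chain even though it is an ordinary loop internally.

```csharp
using System;
using System.Collections.Generic;

public static class WeightedSplitExtensions
{
    // Hypothetical helper: splits consecutive items into chunks whose total
    // weight stays at or below maxWeight. An item heavier than maxWeight
    // ends up in a chunk of its own.
    public static IEnumerable<List<T>> SplitByWeight<T>(
        this IEnumerable<T> source, Func<T, int> weight, int maxWeight)
    {
        var chunk = new List<T>();
        var total = 0; // running sum of the current chunk, replaces repeated Sum() calls
        foreach (var item in source)
        {
            var w = weight(item);
            // Close the current chunk only if it is non-empty and would overflow.
            if (chunk.Count > 0 && total + w > maxWeight)
            {
                yield return chunk;
                chunk = new List<T>();
                total = 0;
            }
            chunk.Add(item);
            total += w;
        }
        if (chunk.Count > 0)
            yield return chunk;
    }
}
```

With the types from the question, a call would look like `list.SplitByWeight(d => d.sizemb, maxTotalFileSize)`. The running total makes the pass linear in the number of items instead of re-summing the current chunk for each document.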
0

You can use the `Aggregate` function to do that. `GroupBy` only works when you group by a key computed from each value, not by an arbitrary condition that decides when to start a new group.

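list.Aggregate(new List<List<doc>> { new List<doc>() }, (acc, d) => {
          // Seeding with one empty group keeps acc.Last() from throwing on the first item
          if (acc.Last().Count > 0 && acc.Last().Sum(x => x.sizemb) + d.sizemb > 9) {
                 acc.Add(new List<doc>());
          }
          acc.Last().Add(d);
          return acc;
   }
)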
Rune FS
  • @bluee if you tell me the error I'm sure I can fix the problem but the general idea would be the same. I guess it was simply the `}` instead of a `)`in the end – Rune FS May 18 '13 at 20:48
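
Earlier in the thread vc 74 points out that a *side effect free* LINQ query is not obvious here. One possible sketch, assuming the `System.Collections.Immutable` package is available, is to let `Aggregate` return a new immutable list at every step instead of mutating the accumulator. `PureSplit` and `Split` are hypothetical names, and this is likely slower and harder to read than a plain loop.

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;

public class doc
{
    public string file;
    public int sizemb;
}

public static class PureSplit
{
    // Hypothetical helper, not from the thread: groups consecutive documents
    // without mutating the accumulator that Aggregate passes along.
    public static ImmutableList<ImmutableList<doc>> Split(IEnumerable<doc> docs, int maxMB)
    {
        return docs.Aggregate(
            ImmutableList<ImmutableList<doc>>.Empty,
            (acc, d) =>
            {
                var last = acc.LastOrDefault();
                // No group yet, or the last group would overflow: open a new one.
                if (last == null || last.Sum(x => x.sizemb) + d.sizemb > maxMB)
                    return acc.Add(ImmutableList.Create(d));
                // Otherwise return a new outer list with the last group extended.
                return acc.SetItem(acc.Count - 1, last.Add(d));
            });
    }
}
```

Each step allocates new list nodes, which is the price of avoiding mutation; it mainly shows that the grouping rule itself fits into a fold.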