2

I have a DataTable with 8 fields. I want to return another DataTable that groups the first 7 fields and has the max value of the 8th. All fields are strings, the 8th is a number stored in a string field, e.g. "24".

Code that I've tried so far:

public DataTable highestVersion(DataTable dt)
        {
            DataTable dtResult=dt.Clone();

            var query=from dtRow in dt.AsEnumerable()
                      group dtRow by new
                      {
                          b_r = dtRow["r"],
                          b_1 = dtRow["b_1"],
                          b_2 = dtRow["b_2"],
                          p_r = dtRow["p_r"],
                          p_1 = dtRow["p_1"],
                          p_2 = dtRow["p_2"],
                          p_f = dtRow["p_f"]
                      }
                          into maxVersion
                          select maxVersion.OrderByDescending(a => a["p_v"]).First();

            foreach (var result in query)
            {
                 dtResult.ImportRow(result);
            }
            return dtResult;
        }

My expectation is that the group dtRow by new{} should group by the first 7 and get only the highest element p_v of each grouping by calling OrderByDescending().First(). But this doesn't seem to actually do anything; all of the input rows are being returned.

EDIT: I just realized what the problem is. The values in p_f are different from one another, e.g.

datarow 1
--------
r: "abc"
b_1: "def"
b_2: "ghi"
p_r: "jkl"
p_1: "mno"
p_2: "pqr"
p_f: "stu_this"
p_v: "18"

datarow 2
--------
r: "abc"
b_1: "def"
b_2: "ghi"
p_r: "jkl"
p_1: "mno"
p_2: "pqr"
p_f: "stu_that"
p_v: "24"

In this case I would want to only return DataRow 2, because 24>18, and be able to retrieve the stu_that value.

sigil
  • 9,370
  • 40
  • 119
  • 199
  • Just to clarify, you want only the max value of r, b_1, b_2, p_r, p_1, p_2, and p_f, and you want to use that as sort to order the rows? – Josh C. Dec 18 '12 at 02:47
  • sigil, why do you think it does not work? It does! – horgh Dec 18 '12 at 02:50

1 Answers1

4

Updated the answer in respond to comments:

var res = dt.AsEnumerable()
            .GroupBy(dtRow => new
            {
                b_r = dtRow["r"],
                b_1 = dtRow["b_1"],
                b_2 = dtRow["b_2"],
                p_r = dtRow["p_r"],
                p_1 = dtRow["p_1"],
                p_2 = dtRow["p_2"]
            })
            .Select(g => new
            {
                Group = g,
                Max = g.Max(r => r["p_v"])
            })
            .Select(g => new
            {
                Key = g.Group.Key,
                Max = g.Max,
                Values = g.Group
                          .Where(r => r["p_v"].Equals(g.Max))
                          .Select(r => r["p_f"])
            });

I guess it could be optimized to some extent, but at least this should do the job, if all the conditions are known now, of course.

horgh
  • 17,918
  • 22
  • 68
  • 123
  • I didn't include enough information in my original post. I added the necessary info in an edit. – sigil Dec 18 '12 at 03:00
  • @sigil isn't it expected to be grouped by all fields except `p_f` too? – horgh Dec 18 '12 at 03:01
  • I still insist that your code does the job...just remove `p_f` from the `group by` clause. – horgh Dec 18 '12 at 03:03
  • @sigil, how would you reflect to a situation, when several row have equal `p_v` and different `p_f`? – horgh Dec 18 '12 at 03:05
  • @konstantin yes, but i don't want to order by `p_f`, just `p_v`. I would exclude `p_f` from the `GroupBy` but I want to be able to retrieve its value. If there are multiple `p_f` with the same max value of `p_v` i will take all of those `p_f`. I realize i didn't give a complete question at the beginning, should I post it as a new question? – sigil Dec 18 '12 at 03:05
  • @dbaseman I do think that extension syntax is much easier and cosy :) – horgh Dec 18 '12 at 03:06
  • @sigil, well, let me think) – horgh Dec 18 '12 at 03:07