What's The Best Way To Remove Duplicate Rows Based On A Row Item In A DataTable With VB.Net?

Question

I can't figure out how to do this after reading multiple posts on the internet, so I would like to state the problem as clearly as I can below.

I will remove the post if I am pointed to an article with the same question that helps me.

I have an example data table below. I would like to remove duplicate rows but only where the "Request Type" is "Cancel Order" and the "Order Numbers" are the same.

Report Date Time, Order Number, Request Type, Old Value, New Value
12/5/2019 12:00 , TM123456-01 , Cancel Order, 470000000, 5700000000
12/5/2019 12:00 , TM123456-01 , Cancel Order, 123000000, 4560000000
12/5/2019 12:00 , MT123456-02 , Add Order   , 470000000, 5700000000
12/5/2019 12:00 , AP123456-02 , Add Order   , 470000000, 5700000000
12/5/2019 12:00 , ST123456-02 , Remove Order, 470000000, 5700000000

So the row that should be removed from the data table above is row 2, because it has the same "Order Number" as row 1 ("TM123456-01") and its request type is "Cancel Order".

The result I would like to be left with would be the data table below.

Report Date Time, Order Number, Request Type, Old Value, New Value
12/5/2019 12:00 , TM123456-01 , Cancel Order, 470000000, 5700000000
12/5/2019 12:00 , MT123456-02 , Add Order   , 470000000, 5700000000
12/5/2019 12:00 , AP123456-02 , Add Order   , 470000000, 5700000000
12/5/2019 12:00 , ST123456-02 , Remove Order, 470000000, 5700000000

I know that I could probably do this with a nested For Each loop but I would like to learn how to do this in a more elegant way with Microsoft LINQ, if possible, or another method.

Why would you remove the second row and not the first one (based on what considerations, I mean)? — Jimi, Dec 05 '19 at 18:15
It could be the first one, it doesn't matter which in my case. Only that I end up with a distinct set of rows with respect to the "Order Number" and "Request Type" where request type is "Cancel Order". — Mark, Dec 05 '19 at 18:25
The second row has the same order number as row 1 which is "TM123456-01" and it also has a "Request Type" of "Cancel Order" so since all of that is the same as row 1, I would want it removed from our data table. If you notice, the old and new values are different though. The old and new values matter for other "Request Types" in the data table but not for "Cancel Order" request types. Hope that makes sense. — Mark, Dec 05 '19 at 18:28

Matt.G · Answer 1 · 2019-12-05T22:03:38.490

C# Version:

var result = orders.GroupBy(x => new { x.OrderNumber, x.RequestType })
            .SelectMany(x => x.Key.RequestType == "Cancel Order" ? x.Take(1) : x.ToList());

edited Dec 05 '19 at 22:03

answered Dec 05 '19 at 19:14


This doesn't work with `DataTable` and doesn't return `DataTable`. – NetMage Dec 06 '19 at 01:15
@NetMage, didn't notice the requirement for datatable. – Matt.G Dec 06 '19 at 14:01

NetMage · Accepted Answer · 2019-12-06T01:15:03.407

Assuming you want to delete the rows from the original table, and not create a new table, you can use LINQ to find the rows to delete, then delete them. LINQ is for querying data, not modifying it.

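' Pair each row with its index, group by (OrderNumber, RequestType), and collect
' the indices of every "Cancel Order" row after the first one in its group.
Dim indicesOfRowsToDelete = dt.AsEnumerable _
                              .Select(Function(r, n) New With { Key r, Key n }) _
                              .GroupBy(Function(rn) New With { Key .OrderNumber = rn.r.Field(Of String)("OrderNumber"), Key .RequestType = rn.r.Field(Of String)("RequestType") }) _
                              .Where(Function(rg) rg.Key.RequestType = "Cancel Order") _
                              .SelectMany(Function(rg) rg.Skip(1).Select(Function(rn) rn.n)) _
                              .OrderByDescending(Function(n) n)

' Delete from the highest index down so a removed row cannot shift the indices still to be processed.
' Note: Delete() only marks rows that are not newly added; they stay in dt.Rows until dt.AcceptChanges() is called.
For Each n In indicesOfRowsToDelete
    dt.Rows(n).Delete()
Next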

Here is the C# version of the same code:

var indicesOfRowsToDelete = dt.AsEnumerable()
                              .Select((r, n) => new { r, n })
                              .GroupBy(rn => new { OrderNumber = rn.r.Field<string>("OrderNumber"), RequestType = rn.r.Field<string>("RequestType") })
                              .Where(rg => rg.Key.RequestType == "Cancel Order")
                              .SelectMany(rg => rg.Skip(1).Select(rn => rn.n))
                              .OrderByDescending(n => n);

foreach (var n in indicesOfRowsToDelete)
    dt.Rows[n].Delete();

However, since your posted solution creates a new table with the desired rows, here is a LINQ query to create a new DataTable in C#:

var newDT = dt.AsEnumerable()
              .GroupBy(r => new { OrderNumber = r.Field<string>("OrderNumber"), RequestType = r.Field<string>("RequestType") })
              .SelectMany(rg => rg.Key.RequestType == "Cancel Order"
                                    ? rg.Take(1) // or other selection process
                                    : rg
              )
              .CopyToDataTable();
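
Since the question asks for VB.NET, the same new-table query could be sketched in VB roughly as follows. This is an illustrative translation, not code from the answer: the module and function names are hypothetical, and the column names `OrderNumber` and `RequestType` are assumed to match the ones used in the queries above. On .NET Framework it also assumes a reference to System.Data.DataSetExtensions for `AsEnumerable`, `Field` and `CopyToDataTable`.

```vb
Imports System.Data
Imports System.Linq

Module NewTableSketch
    ' Hypothetical helper: returns a new DataTable that keeps only one row per
    ' OrderNumber among "Cancel Order" rows and every row of any other request type.
    Function KeepOneCancelPerOrder(dt As DataTable) As DataTable
        Return dt.AsEnumerable() _
                 .GroupBy(Function(r) New With {
                     Key .OrderNumber = r.Field(Of String)("OrderNumber"),
                     Key .RequestType = r.Field(Of String)("RequestType")}) _
                 .SelectMany(Function(rg) If(rg.Key.RequestType = "Cancel Order",
                                             rg.Take(1),
                                             rg.AsEnumerable())) _
                 .CopyToDataTable()
    End Function
End Module
```

Two things to keep in mind with this approach: `GroupBy` emits rows group by group, so rows that share a key end up adjacent in the result even if they were not adjacent in the source, and `CopyToDataTable` throws an `InvalidOperationException` when the source table has no rows.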

Much appreciated, thank you. It almost seems easier to understand just using If and Else statements. But I will mark this as the answer as opposed to my answer because I did ask for what the Linq was. If you have the C# Linq and could post it, I would like to take a look at it but if not, that's okay. Thank you for your help! — Mark, Dec 05 '19 at 21:37
I think Mark S means only query but I see how it can be confusing with the word remove. — Deleted, Dec 05 '19 at 21:37
@MarkS I added the C# version, and the C# for creating a new `DataTable` instead of deleting rows. — NetMage, Dec 06 '19 at 01:15

Mark · Answer 3 · 2019-12-06T00:45:02.740

I solved this with a function that takes a DataTable and outputs a DataTable with the unwanted duplicates removed.

I removed the duplicates using a For Each loop and If statements. I still believe there should be a way to do this with LINQ, and I would greatly appreciate an answer that shows it, but for now I will post mine below.

Note that the DataTables are in and out arguments, so they are not declared in my workflow.

Dim ListOfOrderNumbers As New List(Of String)

For Each row As DataRow In DataTable1.Rows

    If row.Item("RequestType").ToString = "Cancel Order" Then
        If ListOfOrderNumbers.Contains(row.Item("OrderNumber").ToString) Then
            'Do nothing: a Cancel Order row with this OrderNumber is already in DataTable2.
        Else
            'Add the row to DataTable2 since we know the order number is not in it yet.
            DataTable2.Rows.Add(row.Item("ReportDateTime"), row.Item("OrderNumber").ToString, row.Item("RequestType").ToString, row.Item("OldValue").ToString, row.Item("NewValue").ToString)
            'Add the OrderNumber to ListOfOrderNumbers so a row with the same OrderNumber doesn't get added to DataTable2 again.
            ListOfOrderNumbers.Add(row.Item("OrderNumber").ToString)
        End If
    Else
        'Every other request type is copied to DataTable2 unchanged.
        DataTable2.Rows.Add(row.Item("ReportDateTime"), row.Item("OrderNumber").ToString, row.Item("RequestType").ToString, row.Item("OldValue").ToString, row.Item("NewValue").ToString)
    End If

Next

Instead of `List(Of String)` you should use `HashSet(Of String)` since the main purpose is to test for `Contains`. I also suggest `row.Field(Of String)("OrderNumber")` is preferable to `row.Item("OrderNumber").ToString` which simplifies to `row("OrderNumber").ToString` as well. — NetMage, Dec 06 '19 at 01:18
Thank you for the suggestion. I'll read up on Hash Set. I've never heard of that variable type. — Mark, Dec 06 '19 at 04:19
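
The `HashSet(Of String)` and `Field(Of String)` suggestions from the comment above could be applied to the loop roughly like this. It is a minimal sketch rather than the poster's actual workflow code: the module and function names are hypothetical, the result table is created with `Clone()` instead of being passed in as an out argument, and the column names are assumed to be `OrderNumber` and `RequestType` as in the answers.

```vb
Imports System.Collections.Generic
Imports System.Data

Module HashSetSketch
    ' Hypothetical helper: copies rows into a new table, skipping any
    ' "Cancel Order" row whose OrderNumber has already been seen.
    Function RemoveDuplicateCancels(source As DataTable) As DataTable
        ' Clone() copies the column schema but none of the rows.
        Dim result As DataTable = source.Clone()
        Dim seenOrderNumbers As New HashSet(Of String)

        For Each row As DataRow In source.Rows
            Dim isCancel As Boolean = row.Field(Of String)("RequestType") = "Cancel Order"

            ' HashSet.Add returns False when the value is already in the set,
            ' so one call both tests for and records the order number.
            If isCancel AndAlso Not seenOrderNumbers.Add(row.Field(Of String)("OrderNumber")) Then
                Continue For
            End If

            result.ImportRow(row)
        Next

        Return result
    End Function
End Module
```

Unlike the `GroupBy` queries, this keeps the surviving rows in their original order, and the set lookup avoids rescanning a list for every Cancel Order row.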
