1

I would like to check a certain column (W) for duplicates (number of occurrences is stored in another column (AZ)) and than delete all row this way:

  • Value is found two times in the column - delete only one row containing the value.
  • Value is found more times in the column - delete all the rows with the values.

My code works quite well but sometimes it doesn't delete all the duplicates as it should do. Any idea for improvement?

EDIT: The updated code works really good except that it always misses one duplicate and leaves it not deleted.

fin = ws.UsedRange.Rows.count

For i = 2 To fin
    ws.Range("AZ" & i).value = Application.WorksheetFunction.CountIf(ws.Range("W2:W" & fin), ws.Range("W" & i))
Next i

For j = fin To 2 Step -1
    If ws.Range("AZ" & j).value > 2 Then
        ws.Range("AZ" & j).EntireRow.Delete
        fin = ws.UsedRange.Rows.count
    ElseIf ws.Range("AZ" & j).value = 2 Then
        Set rng = Range("W:W").Find(Range("W" & j).value, , xlValues, xlWhole, , xlNext)
        rngRow = rng.Row
        If rngRow <> j Then
            ws.Range("AZ" & rngRow) = "1"
            ws.Range("AZ" & j).EntireRow.Delete
            fin = ws.UsedRange.Rows.count
        Else
            MsgBox "Error at row " & rngRow
        End If
    End If
Next j
Gussmayer
  • 13
  • 3
  • 1
    Can you describe when is your code not successfully remove duplicates? – Lajos Arpad Nov 29 '15 at 16:27
  • 1
    I think the problem is that you are re-doing the Count after you have deleted some rows. That will change some counts, and may cause reclassification of the rows. – Ron Rosenfeld Nov 29 '15 at 19:06
  • The idea behind that is I have to re-do the count after deleting second occurrence of the duplicate so the first occurrence is marked as non duplicate (1) and will not be deleted in the future. At least I was thinking this way. Speed has not been a great issue for me yet. I simply need the double duplicates to reduce to one occurence (A, A to A) - delete one A, and the triple and more duplicates to be deleted completely (B, B, B to -; C, C, C, C, C to -) @RonRosenfeld – Gussmayer Nov 29 '15 at 22:07
  • @Gussmayer, my answer below does exactly that , any rows that have a count greater than 2 on the first loop(AZ) will be deleted, removing all rows except original duplicates, the OR takes care of these by doing a count and removing 1 of those duplicates , its simple tbh , not sure why the newer answers have gone to such complexity for a simple task... – Steven Martin Nov 30 '15 at 00:17
  • @StevenMartin The complexity was added to improve speed. If the database is relatively small, it would not be an issue; if the database requires thousands of row deletions, then the complexity becomes worthwhile. – Ron Rosenfeld Nov 30 '15 at 11:31

3 Answers3

1

If speed is an issue, here is a method that should be faster, as it creates a collection of rows to be deleted, then deletes them. Since everything, except for the actual row deletion, is done in VBA, there are far fewer calls back and forth to the worksheet.

The routine could be sped up as noted in the inline comments. If it is still too slow, depending on the size of the worksheet, it might be feasible to read the entire worksheet into a VBA Array; test for duplicates; write back the results to a new array and write that out to the worksheet. (If your worksheet is too large, this method might run out of memory, though).

In any event, we need both a Class Module which YOU must rename cPhrases, as well as a Regular Module

Class Module

Option Explicit
Private pPhrase As String
Private pCount As Long
Private pRowNums As Collection

Public Property Get Phrase() As String
    Phrase = pPhrase
End Property
Public Property Let Phrase(Value As String)
    pPhrase = Value
End Property

Public Property Get Count() As Long
    Count = pCount
End Property
Public Property Let Count(Value As Long)
    pCount = Value
End Property

Public Property Get RowNums() As Collection
    Set RowNums = pRowNums
End Property
Public Function ADDRowNum(Value As Long)
    pRowNums.Add Value
End Function


Private Sub Class_Initialize()
    Set pRowNums = New Collection
End Sub

Regular Module

Option Explicit
Sub RemoveDuplicateRows()
    Dim wsSrc As Worksheet
    Dim vSrc As Variant
    Dim CP As cPhrases, colP As Collection, colRowNums As Collection
    Dim I As Long, K As Long
    Dim R As Range

'Data worksheet
Set wsSrc = Worksheets("sheet1")

'Read original data into VBA array
With wsSrc
    vSrc = .Range(.Cells(1, "W"), .Cells(.Rows.Count, "W").End(xlUp))
End With

'Collect list of items, counts and row numbers to delete
'Collection object will --> error when trying to add
'  duplicate key.  Use that error to increment the count

Set colP = New Collection
On Error Resume Next
For I = 2 To UBound(vSrc, 1)
        Set CP = New cPhrases
        With CP
            .Phrase = vSrc(I, 1)
            .Count = 1
            .ADDRowNum I

            colP.Add CP, CStr(.Phrase)
            Select Case Err.Number
                Case 457 'duplicate
                    With colP(CStr(.Phrase))
                        .Count = .Count + 1
                        .ADDRowNum I
                    End With
                    Err.Clear
                Case Is <> 0 'some other error.  Stop to debug
                    Debug.Print "Error: " & Err.Number, Err.Description
                    Stop
            End Select
        End With
Next I
On Error GoTo 0

'Rows to be deleted
Set colRowNums = New Collection
For I = 1 To colP.Count
    With colP(I)
        Select Case .Count
            Case 2
                colRowNums.Add .RowNums(2)
            Case Is > 2
                For K = 1 To .RowNums.Count
                    colRowNums.Add .RowNums(K)
                Next K
        End Select
    End With
Next I

'Revers Sort the collection of Row Numbers
'For speed, if necessary, could use
'   faster sort routine
RevCollBubbleSort colRowNums

'Delete Rows
'For speed, could create Unions of up to 30 rows at a time
Application.ScreenUpdating = False
With wsSrc
For I = 1 To colRowNums.Count
   .Rows(colRowNums(I)).Delete
Next I
End With

Application.ScreenUpdating = True

End Sub

'Could use faster sort routine if necessary
Sub RevCollBubbleSort(TempCol As Collection)
    Dim I As Long
    Dim NoExchanges As Boolean

    ' Loop until no more "exchanges" are made.
    Do
        NoExchanges = True

        ' Loop through each element in the array.
        For I = 1 To TempCol.Count - 1

            ' If the element is less than the element
            ' following it, exchange the two elements.
            If TempCol(I) < TempCol(I + 1) Then
                NoExchanges = False
                TempCol.Add TempCol(I), after:=I + 1
                TempCol.Remove I
            End If
        Next I
    Loop While Not (NoExchanges)
End Sub
Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60
0

no need to use that inefficient second loop in the second section, just use a live count like so

fin = ws.UsedRange.Rows.count

For i = 2 To fin

    ws.Range("AZ" & i).value = Application.WorksheetFunction.CountIf(ws.Range("W2:W" & fin), ws.Range("W" & i))

Next i

For j = fin To 2 Step -1

    If ws.Range("AZ" & j).value > 2 OR Application.WorksheetFunction.CountIf(ws.Range("W2:W" & fin), ws.Range("W" & j)) = 2 Then

        ws.Range("AZ" & j).EntireRow.Delete

    End If
Next j
Steven Martin
  • 3,150
  • 1
  • 20
  • 27
  • I need one occurrence of double duplicate to remain not deleted (A, A to A). However, all triple and more duplicates should be deleted completely (B, B, B, B to - ). – Gussmayer Nov 29 '15 at 22:03
  • That's exactly what this does! , if the original count(saved in cell) is > 2 or if the live count = 2 , delete, you should try it before writing a comment like that – Steven Martin Nov 30 '15 at 00:07
0

While your logic is basically sound, the method is not the most efficient. The AutoFilter Method can quickly remove all counts greater than 2 and the Range.RemoveDuplicates¹ method cansubsequently make quick work of removing one of the rows that still contain duplicate values in column W.

Dim r As Long, c As Long
With ws
    If .AutoFilterMode Then .AutoFilterMode = False
    r = .Cells.SpecialCells(xlLastCell).Row
    c = Application.Max(52, .Cells.SpecialCells(xlLastCell).Column)
    With .Range("A1", .Cells(r, c))   '.UsedRange
        With .Columns(52)
            If IsEmpty(.Cells(1, 1)) Then .Cells(1, 1) = "count"
            With .Resize(.Rows.Count - 1, 1).Offset(1, 0)
                .Cells.FormulaR1C1 = "=COUNTIF(C[-29], RC[-29])"
                .Cells = .Cells.Value
            End With
            .AutoFilter field:=1, Criteria1:=">2"
            With .Resize(.Rows.Count - 1, 1).Offset(1, 0)
                If CBool(Application.Subtotal(103, .Cells)) Then
                    .SpecialCells(xlCellTypeVisible).EntireRow.Delete
                End If
            End With
            .AutoFilter
        End With
        .RemoveDuplicates Columns:=23, Header:=xlYes
    End With
End With

When you rewrite the count values in column AZ, you are likely going to rewrite 3 counts to 2, etc.


¹ The Range.RemoveDuplicates method removes duplicate rows from the bottom up.

  • I don't see how your code reduces the double duplicates to one occurrence (A, A to A) - delete one A, and the triple and more duplicates deletes completely (B, B, B to -; C, C, C, C, C to -). Can you give me any more hint? – Gussmayer Nov 29 '15 at 22:06
  • @Gussmayer - The rewrite of the COUNTIF count will reduce quads to triples triples to doubles, etc (as noted) so pretty much all duplicates will be rewritten to doubles (as noted in the last paragraph). After rereading the OP's question, I've decided that this is not the preferred condition and that the rewrite may be the source of the problem, not an on-the-fly correction. See my edit above. –  Nov 29 '15 at 22:14