Home » excel » excel – Fnd rows of duplicate entries in a column

excel – Fnd rows of duplicate entries in a column

Posted by: admin May 14, 2020 Leave a comment

Questions:

I have a spreadsheet containing a list of numbers between 0 and 90000 stored as strings in Column R.

Each number, allocated by another system, is supposed to be unique. About 5% are used one or more times previously. I have no control over the other system.

Each month I add about 50 numbers to this column. I need to identify whether any of the new numbers already exist in the list (including in the new ones added), and identify the row in the spreadsheet that contains the first duplicate and then each subsequent duplicate.

Ultimately, I need to identify (for example): Row 51 is the first containing the string “000356”, and this also appears in rows 357 and 745.

Doing the search (in VBA) row by row is very time consuming (I currently have over 1000 rows). I will need to do a similar search on a column with over 3000 rows.

My research indicates that using VBA dictionaries would be a faster way of doing this identification of duplicates.

In my small test procedure below I have not been able to get it to work, more specifically I need to identify which row in the spreadsheet the duplicate numbers exist.

Is there a better way of achieving this, and what can be done to amend my test code below?

'   From Module M2A to test faster search methods
'   Needs "Microsoft Scripting Runtime" enabled

Dim shtCFYsheet As Worksheet
Dim oFound As Boolean
Dim junk, actName As String
Dim lastrowCFYsheet As Long
Dim dictA As New Scripting.dictionary
Dim keyA, keyB As Variant

Set shtCFYsheet = Worksheets("Communify Sheet")
lastrowCFYsheet = shtCFYsheet.Cells(Rows.Count, "A").End(xlUp).Row

'   Load up DictA with all the entries from Column R

For i = 2 To lastrowCFYsheet 'Row 1 contains headings
    dictA(Trim(shtCFYsheet.Cells(i, "R").Value)) = 1
Next i

For Each keyA In dictA.Keys
    junk = DoEvents()
    oFound = False 'reset the flag for the next KeyA entry

    EntryA = keyA ' Capture the DictA entry

    'Search for the first DictA entry throughout the DictA dictionary
    For Each keyB In dictA.Keys
        EntryB = keyB ' Capture the DictB entry
        'Test for a match
        If Trim(EntryA) = Trim(EntryB) Then
            If oFound = True Then Debug.Print "Match:" & EntryA, EntryB, "A-row " & _
              dictA.Item(keyA), "B-row " & dictA.Item(keyB)
            'Ignore first match as that's my own entry
            oFound = True 'Now set flag so that next entry gets flagged as a duplicate
        End If
    Next keyB
Next keyA

End Sub

Sample data with two duplicates:

2456
4863
4190
2123
5610
9061
2640
679
4702
7428
38
3082
4702
8391
8781
998
2091
3729
5610
5051
1796
3355
169
1788
8838
How to&Answers:

Code:

Option Explicit

Sub dupeRs()

    Dim i As Long, arr As Variant, tmp As Variant
    Dim dict As New Scripting.Dictionary

    With Worksheets("Communify Sheet")

        'load worksheet values into array
        arr = .Range(.Cells(1, "R"), .Cells(Rows.Count, "R").End(xlUp)).Value

    End With

    'build dictionary
    For i = 2 To UBound(arr, 1)
        If dict.exists(arr(i, 1)) Then
            tmp = dict.Item(arr(i, 1))
            ReDim Preserve tmp(LBound(tmp) To UBound(tmp) + 1)
            tmp(UBound(tmp)) = i
            dict.Item(arr(i, 1)) = tmp
        Else
            dict.Item(arr(i, 1)) = Array(i)
        End If
    Next i

    'optionally remove all non-duplicates
    For Each tmp In dict.Keys
        If UBound(dict.Item(tmp)) = 0 Then dict.Remove tmp
    Next tmp

    'debug.print the duplicates and row numbers
    For Each tmp In dict.Keys
        Debug.Print tmp & " in rows " & Join(dict.Item(tmp), ", ")
    Next tmp

End Sub

Results:

005610 in rows 6, 20
004702 in rows 10, 14

Answer:

You could modify the below and try:

Option Explicit

Sub test()

    Dim LastrowS1 As Long, LastrowS2 As Long, Times As Long, i As Long
    Dim rng As Range, rngFound As Range
    Dim str As String

    'Find the last row of column A
    LastrowS1 = Sheet1.Cells(Sheet1.Rows.Count, "A").End(xlUp).Row

    'Start Loop from the lastrow to row 1 upside down
    For i = 2 To LastrowS1
        'Give value to str
        str = Sheet1.Range("A" & i).Value
        'Find the last row of column A
        LastrowS2 = Sheet2.Cells(Sheet2.Rows.Count, "A").End(xlUp).Row
        'Set the range you want to search
        Set rng = Sheet2.Range("A2:A" & LastrowS2)
        'Count how many times str appears in rng
        Times = Application.WorksheetFunction.CountIf(rng, str)
        'If it is appears more that one time
        If Times > 0 Then
            Set rngFound = rng.Find(str)
            Sheet2.Cells(rngFound.Row, 2).Value = Sheet2.Cells(rngFound.Row, 2).Value & ", " & "Row" & " " & i
        Else
             Sheet2.Range("A" & LastrowS2 + 1).Value = str
             Sheet2.Range("B" & LastrowS2 + 1).Value = "Row" & " " & i
        End If

    Next i

End Sub

Sheet 1:

enter image description here
Sheet 2:

enter image description here

Answer:

The code below adds a column to the right of the worksheet and writes the row numbers into it. It then sorts on the number strings in column R, thereby joining duplicates in consecutive rows. The thus altered number column is then checked for consecutive duplicates, recording their row numbers (created in step 1) in another column added on the right. Finally, the data are sorted on their row numbers, restoring the original sequence and the column with the row numbers is deleted. The column with the marked duplicates remains on the right. Only the first occurrence has a list of all the row numbers where duplicates exist.

To test, run the procedure FindDuplicates. Observe that the two enumerations at the top of the code may have to be reset. NwsFirstDataRow was 3 (the third row from the top) in my trials. Your data might start in row 2. The column containing the numbers is set to 18 (column R, counting from A=1). You can specify any other column.

Option Explicit

Enum Nws                                ' Worksheet navigation
    ' 04 Mar 2019
    NwsFirstDataRow = 3                 ' assuming 2 caption rows above the data
    NwsNumber = 18                      ' column R would be 18
End Enum

Sub FindDuplicates()
    ' 04 Mar 2019

    Dim Ws As Worksheet
    Dim Rng As Range
    Dim FreeClm As Long
    Dim R As Long

    ' modify workbook definition and worksheet name as appropriate
    Set Ws = ActiveWorkbook.Worksheets("Duplicates")
    With Ws
        Set Rng = .Range(.Cells(NwsFirstDataRow, NwsNumber), _
                         .Cells(.Rows.Count, NwsNumber).End(xlUp))
        With .UsedRange
            FreeClm = .Columns.Count + .Column
        End With
        Application.ScreenUpdating = False
        WriteRowNumbers Rng, FreeClm
        SortNumbers Ws, Rng, FreeClm
        MarkDuplicates Ws, Rng, FreeClm
        SortNumbers Ws, Rng.Offset(0, FreeClm - NwsNumber), FreeClm + 1
        .Columns(FreeClm).Delete
        Application.ScreenUpdating = True
    End With
End Sub

Private Sub WriteRowNumbers(Rng As Range, _
                            C As Long)
    ' 04 Mar 2019

    Dim Arr As Variant
    Dim R As Long

    ReDim Arr(1 To Rng.Rows.Count)
    For R = 1 To UBound(Arr)
        Arr(R) = Rng.Cells(R).Row
    Next R
    Rng.Offset(0, C - NwsNumber).Value = Application.Transpose(Arr)
End Sub

Private Sub SortNumbers(Ws As Worksheet, _
                        Rng As Range, _
                        C As Long)
    ' 04 Mar 2019

    Dim SortRng As Range

    With Ws
        Set SortRng = .Range(.Cells(NwsFirstDataRow, 1), _
                             .Cells(NwsFirstDataRow + Rng.Rows.Count - 1, C))
    End With
    With Ws.Sort.SortFields
        .Clear
        .Add Key:=Rng, _
             SortOn:=xlSortOnValues, _
             Order:=xlAscending, _
             DataOption:=xlSortTextAsNumbers
    End With
    With Ws.Sort
        .SetRange SortRng
        .Header = False
        .MatchCase = False
        .Orientation = xlTopToBottom
        .SortMethod = xlPinYin
        .Apply
    End With
End Sub

Private Sub MarkDuplicates(Ws As Worksheet, _
                           Rng As Range, _
                           C As Long)
    ' 04 Mar 2019

    Dim Spike As String
    Dim Arr As Variant
    Dim PrevNum As String, Rt As Long
    Dim R As Long

    Arr = Rng.Value
    For R = 1 To UBound(Arr)
        If Arr(R, 1) = PrevNum Then
            Spike = Spike & ", " & Ws.Cells(R + NwsFirstDataRow - 1, C).Value
        Else
            If InStr(Spike, ",") Then Ws.Cells(Rt, C + 1).Value = Spike
            Rt = R + NwsFirstDataRow - 1
            Spike = Ws.Cells(Rt, C).Value
        End If
        PrevNum = Arr(R, 1)
    Next R
End Sub