I have a spreadsheet containing a list of numbers between 0 and 90000 stored as strings in Column R.
Each number, allocated by another system, is supposed to be unique. About 5% are used one or more times previously. I have no control over the other system.
Each month I add about 50 numbers to this column. I need to identify whether any of the new numbers already exist in the list (including in the new ones added), and identify the row in the spreadsheet that contains the first duplicate and then each subsequent duplicate.
Ultimately, I need to identify (for example): Row 51 is the first containing the string “000356”, and this also appears in rows 357 and 745.
Doing the search (in VBA) row by row is very time consuming (I currently have over 1000 rows). I will need to do a similar search on a column with over 3000 rows.
My research indicates that using VBA dictionaries would be a faster way of doing this identification of duplicates.
In my small test procedure below I have not been able to get it to work, more specifically I need to identify which row in the spreadsheet the duplicate numbers exist.
Is there a better way of achieving this, and what can be done to amend my test code below?
' From Module M2A to test faster search methods ' Needs "Microsoft Scripting Runtime" enabled Dim shtCFYsheet As Worksheet Dim oFound As Boolean Dim junk, actName As String Dim lastrowCFYsheet As Long Dim dictA As New Scripting.dictionary Dim keyA, keyB As Variant Set shtCFYsheet = Worksheets("Communify Sheet") lastrowCFYsheet = shtCFYsheet.Cells(Rows.Count, "A").End(xlUp).Row ' Load up DictA with all the entries from Column R For i = 2 To lastrowCFYsheet 'Row 1 contains headings dictA(Trim(shtCFYsheet.Cells(i, "R").Value)) = 1 Next i For Each keyA In dictA.Keys junk = DoEvents() oFound = False 'reset the flag for the next KeyA entry EntryA = keyA ' Capture the DictA entry 'Search for the first DictA entry throughout the DictA dictionary For Each keyB In dictA.Keys EntryB = keyB ' Capture the DictB entry 'Test for a match If Trim(EntryA) = Trim(EntryB) Then If oFound = True Then Debug.Print "Match:" & EntryA, EntryB, "A-row " & _ dictA.Item(keyA), "B-row " & dictA.Item(keyB) 'Ignore first match as that's my own entry oFound = True 'Now set flag so that next entry gets flagged as a duplicate End If Next keyB Next keyA
Sample data with two duplicates:
2456 4863 4190 2123 5610 9061 2640 679 4702 7428 38 3082 4702 8391 8781 998 2091 3729 5610 5051 1796 3355 169 1788 8838
Option Explicit Sub dupeRs() Dim i As Long, arr As Variant, tmp As Variant Dim dict As New Scripting.Dictionary With Worksheets("Communify Sheet") 'load worksheet values into array arr = .Range(.Cells(1, "R"), .Cells(Rows.Count, "R").End(xlUp)).Value End With 'build dictionary For i = 2 To UBound(arr, 1) If dict.exists(arr(i, 1)) Then tmp = dict.Item(arr(i, 1)) ReDim Preserve tmp(LBound(tmp) To UBound(tmp) + 1) tmp(UBound(tmp)) = i dict.Item(arr(i, 1)) = tmp Else dict.Item(arr(i, 1)) = Array(i) End If Next i 'optionally remove all non-duplicates For Each tmp In dict.Keys If UBound(dict.Item(tmp)) = 0 Then dict.Remove tmp Next tmp 'debug.print the duplicates and row numbers For Each tmp In dict.Keys Debug.Print tmp & " in rows " & Join(dict.Item(tmp), ", ") Next tmp End Sub
005610 in rows 6, 20 004702 in rows 10, 14
You could modify the below and try:
Option Explicit Sub test() Dim LastrowS1 As Long, LastrowS2 As Long, Times As Long, i As Long Dim rng As Range, rngFound As Range Dim str As String 'Find the last row of column A LastrowS1 = Sheet1.Cells(Sheet1.Rows.Count, "A").End(xlUp).Row 'Start Loop from the lastrow to row 1 upside down For i = 2 To LastrowS1 'Give value to str str = Sheet1.Range("A" & i).Value 'Find the last row of column A LastrowS2 = Sheet2.Cells(Sheet2.Rows.Count, "A").End(xlUp).Row 'Set the range you want to search Set rng = Sheet2.Range("A2:A" & LastrowS2) 'Count how many times str appears in rng Times = Application.WorksheetFunction.CountIf(rng, str) 'If it is appears more that one time If Times > 0 Then Set rngFound = rng.Find(str) Sheet2.Cells(rngFound.Row, 2).Value = Sheet2.Cells(rngFound.Row, 2).Value & ", " & "Row" & " " & i Else Sheet2.Range("A" & LastrowS2 + 1).Value = str Sheet2.Range("B" & LastrowS2 + 1).Value = "Row" & " " & i End If Next i End Sub
The code below adds a column to the right of the worksheet and writes the row numbers into it. It then sorts on the number strings in column R, thereby joining duplicates in consecutive rows. The thus altered number column is then checked for consecutive duplicates, recording their row numbers (created in step 1) in another column added on the right. Finally, the data are sorted on their row numbers, restoring the original sequence and the column with the row numbers is deleted. The column with the marked duplicates remains on the right. Only the first occurrence has a list of all the row numbers where duplicates exist.
To test, run the procedure
FindDuplicates. Observe that the two enumerations at the top of the code may have to be reset.
NwsFirstDataRow was 3 (the third row from the top) in my trials. Your data might start in row 2. The column containing the numbers is set to 18 (column R, counting from A=1). You can specify any other column.
Option Explicit Enum Nws ' Worksheet navigation ' 04 Mar 2019 NwsFirstDataRow = 3 ' assuming 2 caption rows above the data NwsNumber = 18 ' column R would be 18 End Enum Sub FindDuplicates() ' 04 Mar 2019 Dim Ws As Worksheet Dim Rng As Range Dim FreeClm As Long Dim R As Long ' modify workbook definition and worksheet name as appropriate Set Ws = ActiveWorkbook.Worksheets("Duplicates") With Ws Set Rng = .Range(.Cells(NwsFirstDataRow, NwsNumber), _ .Cells(.Rows.Count, NwsNumber).End(xlUp)) With .UsedRange FreeClm = .Columns.Count + .Column End With Application.ScreenUpdating = False WriteRowNumbers Rng, FreeClm SortNumbers Ws, Rng, FreeClm MarkDuplicates Ws, Rng, FreeClm SortNumbers Ws, Rng.Offset(0, FreeClm - NwsNumber), FreeClm + 1 .Columns(FreeClm).Delete Application.ScreenUpdating = True End With End Sub Private Sub WriteRowNumbers(Rng As Range, _ C As Long) ' 04 Mar 2019 Dim Arr As Variant Dim R As Long ReDim Arr(1 To Rng.Rows.Count) For R = 1 To UBound(Arr) Arr(R) = Rng.Cells(R).Row Next R Rng.Offset(0, C - NwsNumber).Value = Application.Transpose(Arr) End Sub Private Sub SortNumbers(Ws As Worksheet, _ Rng As Range, _ C As Long) ' 04 Mar 2019 Dim SortRng As Range With Ws Set SortRng = .Range(.Cells(NwsFirstDataRow, 1), _ .Cells(NwsFirstDataRow + Rng.Rows.Count - 1, C)) End With With Ws.Sort.SortFields .Clear .Add Key:=Rng, _ SortOn:=xlSortOnValues, _ Order:=xlAscending, _ DataOption:=xlSortTextAsNumbers End With With Ws.Sort .SetRange SortRng .Header = False .MatchCase = False .Orientation = xlTopToBottom .SortMethod = xlPinYin .Apply End With End Sub Private Sub MarkDuplicates(Ws As Worksheet, _ Rng As Range, _ C As Long) ' 04 Mar 2019 Dim Spike As String Dim Arr As Variant Dim PrevNum As String, Rt As Long Dim R As Long Arr = Rng.Value For R = 1 To UBound(Arr) If Arr(R, 1) = PrevNum Then Spike = Spike & ", " & Ws.Cells(R + NwsFirstDataRow - 1, C).Value Else If InStr(Spike, ",") Then Ws.Cells(Rt, C + 1).Value = Spike Rt = R + NwsFirstDataRow - 1 Spike = Ws.Cells(Rt, C).Value End If PrevNum = Arr(R, 1) Next R End Sub