I have two sheets in a workbook, each with its own column of e-mail addresses, among other data. I will reference Column1 in Sheet1 and Column2 in Sheet2, where only Column1 may have duplicate e-mail addresses listed.
I need to identify whether the e-mail addresses in Column1 are found within Column2, and each time this is true certain code must be run.
I had solved this with two nested Do While loops, where the external loop went through each cell in Column1, named Cell1, from top to bottom and the internal loop compared Cell1 with each cell in Column2, named Cell2, also from top to bottom, exiting the internal loop early if identical values were found.
To make this more efficient, I thought to sort each column in ascending order, and have each Cell1 only look through Column2 until the value of the string in Cell2 is larger than the value of the string in Cell1, and when the next Cell1 is iterated it will continue from Cell2 where the last loop stopped, since earlier Cell2 values are all smaller than Cell1 and cannot have an equal value.
The code I’ve come up with is an external loop going through each cell in Column1, and an internal loop as below:
'x1 is the row number of Cell1 'x2 is the row number of Cell2 'below is the code for the internal loop looking through Column2 Do While Sheets(2).Cells(x2, 1).Value <> 0 If LCase(Sheets(1).Cells(x1, 1).Value) < LCase(Sheets(2).Cells(x2, 1).Value) Then Exit Do ElseIf LCase(Sheets(1).Cells(x1, 1).Value) = LCase(Sheets(2).Cells(x2, 1).Value) Then '... code is run Exit Do End If x2 = x2 + 1 Loop
The issue is e-mail addresses can have hyphens (-) and apostrophes (‘). While Excel ignores them when sorting columns, VBA doesn’t ignore them when comparing alphanumeric values.
If I have:
A B 1 Noemi Noemi 2 no-reply no-reply 3 notify notify
The code will compare A1 with B1 and see
A1=B1, then A2 with B1 and see that
A2<B1 and then skip to A3.
My first question is, can I force Excel to sort alphanumeric text including hyphens and apostrophes?
If not, so far I’ve only thought of a workaround by looking at Cell1 and Cell2 whether they have – or ‘ in them, if TRUE for any of them then using new variables to extract the text from Cell1 and Cell2 without the hyphens and apostrophes, and continuing to use these new values in the internal loop.
My second question is, how can I work around this in a more efficient way?
Microsoft recognizes that Excel ignores dashes and apostrophes when sorting:
If I have been asked yesterday, I would have agreed with David’s view on the expected result of an Excel sort. However, after experimentation, I am forced to agree with Dirk. This is important to note:
Apostrophes (‘) and hyphens (-) are ignored, with one exception: If two text strings are the same except for a hyphen, the text with the hyphen is sorted last. source
Column A contains the unsorted values I used for testing Dirk’s claim.
Column B has been subjected to a regular Excel sort. As you can see, the column is not in ASCII/Unicode sequence because “single quote” should come before “hyphen” should come before “letter a”.
Excel uses tilde (~) as an escape character for Find so I wondered if it would do the same for Sort.
AdjustedSort1 replaces “single quote” by “tilde single quote” and “hyphen” by “tilde hyphen”, sorts and then restores “single quote” and “hyphen”. The result is shown in column C. The sequence is better but not ASCII/Unicode because “aa-b” comes before “aa’c”.
Column D uses a VBA Shell Sort routine I wrote years ago. You would probably be better to search the web for “VBA Quick Sort” if your lists are very large but my sort should give acceptable performance for reasonably sized lists.
Sub AdjustedSort1() With Worksheets("Sheet2").Columns("C") .Replace What:="'", Replacement:="~'", LookAt:=xlPart, _ SearchOrder:=xlByRows, MatchCase:=False, _ SearchFormat:=False, ReplaceFormat:=False .Replace What:="-", Replacement:="~-", LookAt:=xlPart, _ SearchOrder:=xlByRows, MatchCase:=False, _ SearchFormat:=False, ReplaceFormat:=False .Sort Key1:=Range("C2"), Order1:=xlAscending, Header:=xlYes, _ OrderCustom:=1, MatchCase:=False, _ Orientation:=xlTopToBottom, DataOption1:=xlSortNormal .Replace What:="~~-", Replacement:="-", LookAt:=xlPart, _ SearchOrder:=xlByRows, MatchCase:=False, _ SearchFormat:=False, ReplaceFormat:=False .Replace What:="~~'", Replacement:="'", LookAt:=xlPart, _ SearchOrder:=xlByRows, MatchCase:=False, _ SearchFormat:=False, ReplaceFormat:=False End With End Sub Sub AdjustedSort2() Dim Inx As Long Dim RngValue As Variant Dim RowLast As Long Dim ColValue() As String With Worksheets("Sheet2") RowLast = .Cells(Rows.Count, "D").End(xlUp).Row ' Load values from column D excluding header RngValue = .Range(.Cells(2, "D"), .Cells(RowLast, "D")).Value ' Copy values from 2D array to 1D array ReDim ColValue(LBound(RngValue, 1) To UBound(RngValue, 1)) For Inx = LBound(RngValue, 1) To UBound(RngValue, 1) ColValue(Inx) = RngValue(Inx, 1) Next ' Sort array Call ShellSort(ColValue, UBound(ColValue)) ' Copy values back to 2D array For Inx = LBound(ColValue) To UBound(ColValue) RngValue(Inx, 1) = ColValue(Inx) Next ' Copy values back to column D .Range(.Cells(2, "D"), .Cells(RowLast, "D")).Value = RngValue End With End Sub Public Sub ShellSort(arrstgTgt() As String, inxLastToSort As Integer) ' Coded 2 March 07 ' Algorithm and text from Algorithms (Second edition) by Robert Sedgewick ' The most basic sort is the insertion sort in which adjacent elements are compared ' and swapped as necessary. This can be very slow if the smallest elements are at ' end. ShellSort is a simple extension which gains speed by allowing exchange of ' elements that are far apart. ' The idea is to rearrange the file to give it the property that taking every h-th ' element (starting anywhere) yields a sorted file. Such a file is said to be ' h-sorted. Put another way, an h-sorted file is h independent sorted files, ' interleaved together. By h-sorting for large value of H, we can move elements ' in the array long distances and thus make it easier to h-sort for smaller values of ' h. Using such a procedure for any sequence of values of h which ends in 1 will ' produce a sorted file. ' This program uses the increment sequence: ..., 1093, 364, 121, 40, 13, 4, 1. This ' is known to be a good sequence but cannot be proved to be the best. ' The code looks faulty but it is not. The inner loop compares an ' entry with the previous in the sequence and if necessary moves it back down the ' sequence to its correct position. It does not continue with the rest of the sequence ' giving the impression it only partially sorts a sequence. However, the code is not ' sorting one sequence then the next and so on. It examines the entries in element ' number order. Having compared an entry against the previous in its sequence, it will ' be intH loops before the next entry in the sequence in compared against it. Dim intNumRowsToSort As Integer Dim intLBoundAdjust As Integer Dim intH As Integer Dim inxRowA As Integer Dim inxRowB As Integer Dim inxRowC As Integer Dim stgTemp As String intNumRowsToSort = inxLastToSort - LBound(arrstgTgt) + 1 intLBoundAdjust = LBound(arrstgTgt) - 1 ' Set intH to 1, 4, 13, 40, 121, ..., 3n+1, ... until intH > intNumRowsToSort intH = 1 Do While intH <= intNumRowsToSort intH = 3 * intH + 1 Loop Do While True If intH = 1 Then Exit Do ' The minimum value on entry to this do-loop will be 4 so there is at least ' one repeat of the loop. intH = intH \ 3 For inxRowA = intH + 1 To intNumRowsToSort stgTemp = arrstgTgt(inxRowA + intLBoundAdjust) inxRowB = inxRowA Do While True ' The value of element inxRowA has been saved. Now move the element intH back ' from row inxRowA into this row if it is smaller than the saved value. Repeat ' this for earlier elements until one is found that is larger than the saved ' value which is placed in the gap. inxRowC = inxRowB - intH If arrstgTgt(inxRowC + intLBoundAdjust) <= stgTemp Then Exit Do arrstgTgt(inxRowB + intLBoundAdjust) = arrstgTgt(inxRowC + intLBoundAdjust) inxRowB = inxRowC If inxRowB <= intH Then Exit Do Loop arrstgTgt(inxRowB + intLBoundAdjust) = stgTemp Next Loop End Sub
Replace all the “-” with a “^” and then Excel will not ignore the “^” in the sort as it does the “-“.
Then can replace the “^” back to “-” if you want to.