# excel – Sort alphanumeric data where there are hyphens

Questions:

I have two sheets in a workbook, each with its own column of e-mail addresses, among other data. I will refer to these as Column1 (in Sheet1) and Column2 (in Sheet2). Only Column1 may contain duplicate e-mail addresses.

I need to identify whether the e-mail addresses in Column1 are found within Column2, and each time this is true certain code must be run.

I had solved this with two nested Do While loops, where the external loop went through each cell in Column1, named Cell1, from top to bottom and the internal loop compared Cell1 with each cell in Column2, named Cell2, also from top to bottom, exiting the internal loop early if identical values were found.

To make this more efficient, I thought to sort each column in ascending order and have each Cell1 look through Column2 only until the string in Cell2 is larger than the string in Cell1. When the next Cell1 is processed, the search continues from the Cell2 where the last loop stopped, since all earlier Cell2 values are smaller than Cell1 and cannot be equal to it.

The code I’ve come up with is an external loop going through each cell in Column1, and an internal loop as below:

``````
'x1 is the row number of Cell1
'x2 is the row number of Cell2
'below is the code for the internal loop looking through Column2

Do While Sheets(2).Cells(x2, 1).Value <> 0
    If LCase(Sheets(1).Cells(x1, 1).Value) < LCase(Sheets(2).Cells(x2, 1).Value) Then
        Exit Do
    ElseIf LCase(Sheets(1).Cells(x1, 1).Value) = LCase(Sheets(2).Cells(x2, 1).Value) Then

        '... code is run

        Exit Do
    End If
    x2 = x2 + 1
Loop
``````
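For reference, the whole scan (outer and inner loop together) can be sketched on two in-memory arrays. This is only an illustration of the idea described above, not the asker's actual code: the function name `CountMatches` and its parameters are hypothetical, and it assumes both arrays are already sorted ascending under the same lower-cased binary comparison that the function itself uses.

```vba
Option Explicit

' Hypothetical sketch: counts how many entries of sorted1 also occur in sorted2.
' ASSUMPTION: both arrays are sorted ascending under the comparison used here
' (StrComp with vbBinaryCompare on lower-cased text).
Public Function CountMatches(sorted1() As String, sorted2() As String) As Long
    Dim i1 As Long
    Dim i2 As Long
    Dim cmp As Long

    i2 = LBound(sorted2)
    For i1 = LBound(sorted1) To UBound(sorted1)
        Do While i2 <= UBound(sorted2)
            cmp = StrComp(LCase$(sorted1(i1)), LCase$(sorted2(i2)), vbBinaryCompare)
            If cmp < 0 Then Exit Do              ' sorted2 is already past this value: no match
            If cmp = 0 Then
                CountMatches = CountMatches + 1  ' "... code is run" would go here
                Exit Do                          ' i2 is kept, so a duplicate in sorted1 matches again
            End If
            i2 = i2 + 1                          ' sorted2 value is smaller: never needed again
        Loop
    Next i1
End Function

Sub DemoCountMatches()
    Dim a(1 To 4) As String
    Dim b(1 To 3) As String
    a(1) = "amy@example.com": a(2) = "bob@example.com"
    a(3) = "bob@example.com": a(4) = "zed@example.com"
    b(1) = "bob@example.com": b(2) = "cat@example.com": b(3) = "zed@example.com"
    Debug.Print CountMatches(a, b)
End Sub
```

The scan is only correct when the sort order and the comparison agree, which is exactly what breaks when Excel sorts the columns and VBA compares the values.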

The issue is that e-mail addresses can contain hyphens (-) and apostrophes ('). Excel ignores them when sorting columns, but VBA does not ignore them when comparing strings.

If I have:

``````
     A           B
1  Noemi      Noemi
3  notify     notify
``````

The code will compare A1 with B1 and see `A1=B1`, then A2 with B1 and see that `A2<B1` and then skip to A3.

My first question is, can I force Excel to take hyphens and apostrophes into account when sorting alphanumeric text?

If not, the only workaround I’ve thought of so far is to check whether Cell1 or Cell2 contains - or '. If either does, I would extract the text without the hyphens and apostrophes into new variables and use those values in the internal loop instead.

My second question is, how can I work around this in a more efficient way?
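The workaround described above (comparing copies of the values with hyphens and apostrophes removed) might look like the following minimal sketch. `SortKey` is a hypothetical helper name, not something from the original post. It handles only the plain ASCII hyphen and apostrophe, and it does not address any other differences there may be between Excel's sort order and VBA's string comparison.

```vba
Option Explicit

' Hypothetical helper: returns a comparison key with hyphens and
' apostrophes removed and the remaining text lower-cased.
Public Function SortKey(ByVal rawValue As String) As String
    SortKey = LCase$(Replace(Replace(rawValue, "-", ""), "'", ""))
End Function
```

Because two different addresses can share the same key (for example `a-b` and `ab`), a real match should still be confirmed on the original strings.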

EDIT:

Microsoft recognizes that Excel ignores dashes and apostrophes when sorting.

Answers:

If I had been asked yesterday, I would have agreed with David’s view on the expected result of an Excel sort. However, after experimentation, I am forced to agree with Dirk. This is important to note:

“Apostrophes (') and hyphens (-) are ignored, with one exception: If two text strings are the same except for a hyphen, the text with the hyphen is sorted last.”

Column A contains the unsorted values I used for testing Dirk’s claim.

Column B has been subjected to a regular Excel sort. The column is not in ASCII/Unicode sequence: in that sequence “single quote” comes before “hyphen”, which comes before “letter a”.

Excel uses tilde (~) as an escape character for Find, so I wondered if it would do the same for Sort. `AdjustedSort1` replaces “single quote” with “tilde single quote” and “hyphen” with “tilde hyphen”, sorts, and then restores “single quote” and “hyphen”. The result is shown in column C. The sequence is better but still not ASCII/Unicode because “aa-b” comes before “aa'c”.
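The ASCII/Unicode sequence referred to here can be checked directly in VBA. The following is a small sketch (the Sub name `ShowBinaryOrder` is made up for illustration) that prints the character codes and compares the two strings mentioned above with a binary comparison.

```vba
Option Explicit

Sub ShowBinaryOrder()
    ' Character codes: apostrophe = 39, hyphen = 45, "a" = 97
    Debug.Print Asc("'"), Asc("-"), Asc("a")

    ' StrComp returns -1 when the first string sorts before the second
    Debug.Print StrComp("aa'c", "aa-b", vbBinaryCompare)   ' -1: apostrophe before hyphen
    Debug.Print StrComp("aa-b", "aaa", vbBinaryCompare)    ' -1: hyphen before letter a
End Sub
```

So in a true ASCII/Unicode ordering “aa'c” precedes “aa-b”, which is the opposite of what column C shows.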

Column D uses a VBA Shell Sort routine I wrote years ago. You would probably do better to search the web for “VBA Quick Sort” if your lists are very large, but my sort should give acceptable performance for reasonably sized lists.

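``````
Sub AdjustedSort1()

    With Worksheets("Sheet2").Columns("C")

        ' Escape apostrophes and hyphens with a tilde before sorting
        .Replace What:="'", Replacement:="~'", LookAt:=xlPart, _
                 SearchOrder:=xlByRows, MatchCase:=False, _
                 SearchFormat:=False, ReplaceFormat:=False
        .Replace What:="-", Replacement:="~-", LookAt:=xlPart, _
                 SearchOrder:=xlByRows, MatchCase:=False, _
                 SearchFormat:=False, ReplaceFormat:=False
        ' NOTE: the opening of this .Sort call is missing from the source; the
        ' Key1, Order1 and Header arguments on the next line are assumptions
        ' (a header row is assumed because the data elsewhere starts at row 2).
        .Sort Key1:=.Cells(2, 1), Order1:=xlAscending, Header:=xlYes, _
              OrderCustom:=1, MatchCase:=False, _
              Orientation:=xlTopToBottom, DataOption1:=xlSortNormal
        ' Restore the original characters ("~~" matches a literal tilde)
        .Replace What:="~~-", Replacement:="-", LookAt:=xlPart, _
                 SearchOrder:=xlByRows, MatchCase:=False, _
                 SearchFormat:=False, ReplaceFormat:=False
        .Replace What:="~~'", Replacement:="'", LookAt:=xlPart, _
                 SearchOrder:=xlByRows, MatchCase:=False, _
                 SearchFormat:=False, ReplaceFormat:=False

    End With

End Sub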

' NOTE: the Sub header is missing from the source; the name AdjustedSort2
' is assumed by analogy with AdjustedSort1.
Sub AdjustedSort2()

    Dim Inx As Long
    Dim RngValue As Variant
    Dim RowLast As Long
    Dim ColValue() As String

    With Worksheets("Sheet2")

        RowLast = .Cells(Rows.Count, "D").End(xlUp).Row

        RngValue = .Range(.Cells(2, "D"), .Cells(RowLast, "D")).Value

        ' Copy values from 2D array to 1D array
        ReDim ColValue(LBound(RngValue, 1) To UBound(RngValue, 1))
        For Inx = LBound(RngValue, 1) To UBound(RngValue, 1)
            ColValue(Inx) = RngValue(Inx, 1)
        Next

        ' Sort array
        Call ShellSort(ColValue, UBound(ColValue))

        ' Copy values back to 2D array
        For Inx = LBound(ColValue) To UBound(ColValue)
            RngValue(Inx, 1) = ColValue(Inx)
        Next

        ' Copy values back to column D
        .Range(.Cells(2, "D"), .Cells(RowLast, "D")).Value = RngValue

    End With

End Sub

Public Sub ShellSort(arrstgTgt() As String, inxLastToSort As Integer)

' Coded 2 March 07
' Algorithm and text from Algorithms (Second edition) by Robert Sedgewick

'   The most basic sort is the insertion sort in which adjacent elements are compared
' and swapped as necessary.  This can be very slow if the smallest elements are at
' the end.  ShellSort is a simple extension which gains speed by allowing exchange of
' elements that are far apart.
'   The idea is to rearrange the file to give it the property that taking every h-th
' element (starting anywhere) yields a sorted file.  Such a file is said to be
' h-sorted.  Put another way, an h-sorted file is h independent sorted files,
' interleaved together.  By h-sorting for large values of h, we can move elements
' in the array long distances and thus make it easier to h-sort for smaller values of
' h.  Using such a procedure for any sequence of values of h which ends in 1 will
' produce a sorted file.
'   This program uses the increment sequence: ..., 1093, 364, 121, 40, 13, 4, 1.  This
' is known to be a good sequence but cannot be proved to be the best.
'   The code looks faulty but it is not.  The inner loop compares an
' entry with the previous in the sequence and if necessary moves it back down the
' sequence to its correct position.  It does not continue with the rest of the sequence
' giving the impression it only partially sorts a sequence.  However, the code is not
' sorting one sequence then the next and so on.  It examines the entries in element
' number order.  Having compared an entry against the previous in its sequence, it will
' be intH loops before the next entry in the sequence is compared against it.

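    Dim intNumRowsToSort          As Integer
    Dim intLBoundAdjust           As Integer
    Dim intH                      As Integer
    Dim inxRowA                   As Integer
    Dim inxRowB                   As Integer
    Dim inxRowC                   As Integer
    Dim stgTemp                   As String

    ' NOTE: intLBoundAdjust is used below but is never declared or set in the
    ' source.  This definition (it maps row 1 to LBound) is an assumption.
    intLBoundAdjust = LBound(arrstgTgt) - 1
    intNumRowsToSort = inxLastToSort - LBound(arrstgTgt) + 1

    ' Set intH to 1, 4, 13, 40, 121, ..., 3n+1, ... until intH > intNumRowsToSort
    intH = 1
    Do While intH <= intNumRowsToSort
        intH = 3 * intH + 1
    Loop

    Do While True
        If intH = 1 Then Exit Do
        ' The minimum value on entry to this do-loop will be 4 so there is at least
        ' one repeat of the loop.
        intH = intH \ 3
        For inxRowA = intH + 1 To intNumRowsToSort
            ' Save the element being placed (this line is missing from the source
            ' and is reconstructed from the comment below).
            stgTemp = arrstgTgt(inxRowA + intLBoundAdjust)
            inxRowB = inxRowA
            Do While True
                ' The value of element inxRowA has been saved.  Now move the element intH back
                ' from row inxRowB into this row if it is larger than the saved value.  Repeat
                ' this for earlier elements until one is found that is not larger than the saved
                ' value, which is then placed in the gap.
                inxRowC = inxRowB - intH
                If arrstgTgt(inxRowC + intLBoundAdjust) <= stgTemp Then Exit Do
                ' Move the larger element up into the gap (this line is missing from
                ' the source and is reconstructed).
                arrstgTgt(inxRowB + intLBoundAdjust) = arrstgTgt(inxRowC + intLBoundAdjust)
                inxRowB = inxRowC
                If inxRowB <= intH Then Exit Do
            Loop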