excel – Find which cells have the smallest levenshtein distance

Posted by: admin May 14, 2020

Questions:

I have this function, which quickly returns the Levenshtein distance between two strings:

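Function Levenshtein(ByVal string1 As String, ByVal string2 As String) As Long

    Dim i As Long, j As Long
    Dim string1_length As Long
    Dim string2_length As Long
    Dim distance() As Long

    string1_length = Len(string1)
    string2_length = Len(string2)
    ' distance(i, j) holds the edit distance between the first i characters
    ' of string1 and the first j characters of string2
    ReDim distance(string1_length, string2_length)

    ' Reducing a prefix to the empty string costs one deletion per character
    For i = 0 To string1_length
        distance(i, 0) = i
    Next

    For j = 0 To string2_length
        distance(0, j) = j
    Next

    For i = 1 To string1_length
        For j = 1 To string2_length
            ' AscW compares Unicode code units; Asc can map characters outside
            ' the ANSI code page to "?" and so treat different characters as equal
            If AscW(Mid$(string1, i, 1)) = AscW(Mid$(string2, j, 1)) Then
                distance(i, j) = distance(i - 1, j - 1)
            Else
                ' Cheapest of deletion, insertion and substitution
                distance(i, j) = Application.WorksheetFunction.Min _
                (distance(i - 1, j) + 1, _
                 distance(i, j - 1) + 1, _
                 distance(i - 1, j - 1) + 1)
            End If
        Next
    Next

    Levenshtein = distance(string1_length, string2_length)

End Function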

I want to perform a fast comparison between all cells in the “A” column and return which ones have a “small” Levenshtein distance. How would I make all these comparisons?

Answers:

Do you want to find which combinations of strings have small Levenshtein distances, or just how similar or dissimilar each string is to all the other strings overall?

If it is the former, this should work fine:

[Image: Example2]

Copy the values in column A and paste them transposed (Paste Special > Transpose) to create the column headers, as Dale commented. You can then use conditional formatting to highlight the lowest results.
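For a longer list, the same grid could also be filled by a macro rather than by hand. What follows is only a sketch under stated assumptions: the strings sit in A28 downward on the active sheet with no error values, row 27 and the columns to the right are free to be overwritten, and the Levenshtein function from the question lives in the same VBA project. The names FillDistanceGrid and FIRST_ROW are made up for illustration.

```vba
Sub FillDistanceGrid()
    ' Assumed layout (hypothetical): strings in A28 downward, headers in row 27.
    Const FIRST_ROW As Long = 28

    Dim ws As Worksheet
    Dim vals As Variant, hdr() As Variant, grid() As Variant
    Dim lastRow As Long, n As Long
    Dim i As Long, j As Long

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    n = lastRow - FIRST_ROW + 1
    If n < 2 Then Exit Sub          ' fewer than two strings: nothing to compare

    ' Read column A once instead of touching cells inside the loops
    vals = ws.Range(ws.Cells(FIRST_ROW, 1), ws.Cells(lastRow, 1)).Value

    ReDim hdr(1 To 1, 1 To n)
    ReDim grid(1 To n, 1 To n)
    For i = 1 To n
        hdr(1, i) = vals(i, 1)      ' transposed copy for the header row
        grid(i, i) = 0              ' a string is at distance 0 from itself
        For j = i + 1 To n
            ' The distance is symmetric, so each pair is computed once
            grid(i, j) = Levenshtein(CStr(vals(i, 1)), CStr(vals(j, 1)))
            grid(j, i) = grid(i, j)
        Next j
    Next i

    ' Write headers and distances back in two assignments
    ws.Cells(FIRST_ROW - 1, 2).Resize(1, n).Value = hdr
    ws.Cells(FIRST_ROW, 2).Resize(n, n).Value = grid
End Sub
```

Reading the column into an array and writing the results back in bulk avoids a worksheet round trip per cell, which tends to dominate the run time once the list grows. The grid needs one column per string, so this layout only works while the list fits within the sheet's column limit.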

Or, if you want the matching strings themselves returned, you should be able to use this:

=IF(AND(Levenshtein($A28,B$27)>0,Levenshtein($A28,B$27)<=3),$A28&"/"&B$27,"")

[Image: Example3]

Just copy and paste unique values if you want the returned combinations in a single column.
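If the single-column list is all that is needed, a macro can produce it directly without building the grid first. This is a hedged sketch, not a tested solution: it assumes the strings are in A28 downward on the active sheet with no error values, reuses the 1 to 3 cut-off from the formula above, and writes to a newly added worksheet so that nothing existing is overwritten. ListClosePairs, FIRST_ROW and MAX_DISTANCE are hypothetical names.

```vba
Sub ListClosePairs()
    ' Hypothetical layout and threshold; adjust to the workbook at hand.
    Const FIRST_ROW As Long = 28
    Const MAX_DISTANCE As Long = 3

    Dim src As Worksheet, out As Worksheet
    Dim vals As Variant
    Dim lastRow As Long, n As Long
    Dim i As Long, j As Long, d As Long, outRow As Long

    Set src = ActiveSheet
    lastRow = src.Cells(src.Rows.Count, "A").End(xlUp).Row
    n = lastRow - FIRST_ROW + 1
    If n < 2 Then Exit Sub          ' fewer than two strings: nothing to compare

    ' Read column A once into a 2-D Variant array (1 To n, 1 To 1)
    vals = src.Range(src.Cells(FIRST_ROW, 1), src.Cells(lastRow, 1)).Value

    ' Results go to a fresh sheet placed right after the source sheet
    Set out = src.Parent.Worksheets.Add(After:=src)
    outRow = 1

    For i = 1 To n - 1
        For j = i + 1 To n          ' each unordered pair is visited once
            d = Levenshtein(CStr(vals(i, 1)), CStr(vals(j, 1)))
            If d > 0 And d <= MAX_DISTANCE Then
                out.Cells(outRow, 1).Value = vals(i, 1) & "/" & vals(j, 1)
                out.Cells(outRow, 2).Value = d
                outRow = outRow + 1
            End If
        Next j
    Next i
End Sub
```

Because the distance is symmetric, looping over j > i halves the number of calls compared with the full grid, and each pair appears once rather than as both "a/b" and "b/a". The work is still quadratic in the number of strings, so very long columns will take noticeably longer.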

Good Luck.