Home » excel » vba – How to breakdown text with a non-uniform delimiter?

vba – How to breakdown text with a non-uniform delimiter?

Posted by: admin May 14, 2020 Leave a comment


I have this data in Excel:

enter image description here

But one of my clients needs it summarize per item in detail.
So above data needs to be converted to:

enter image description here

This way, client can analyze it per tracking and per item.
The text format is not really uniform since it is entered manually.
Some users use Alt+Enter to separate items. Some uses space and some doesn’t bother separating at all. What’s consistent though is that they put hyphen(-) after the item then the count (although not always followed by the number, there can be spaces in between). Also if the count of that item is one(1), they don’t bother putting it at all (as seen on the tracking IDU3004 for Apple Juice).

The only function I tried is the Split function which brings me closer to what I want.
But I am still having a hard time separating the individual array elements into what I expect.
So for example, IDU3001 in above after using Split (with “-” as delimiter) will be:

arr(0) = "Apple"
arr(1) = "20 Grape"
arr(2) = "5" & Chr(10) & "Pear" ~~> Just to show Alt+Enter
arr(3) = "3Banana"
arr(4) = "2"

Of course I can come up with a function to deal with each of the elements to extract numbers and items.
Actually I was thinking of using just that function and skip the Split altogether.
I was just curious that maybe there is another way out there since I am not well versed in Text manipulation.
I would appreciate any idea that would point me to a possible better solution.

How to&Answers:

I suggest using a Regular Expression approach

Here’s a demo based on your sample data.

Sub Demo()
    Dim re As RegExp
    Dim rMC As MatchCollection
    Dim rM As Match
    Dim rng As Range
    Dim rw As Range
    Dim Detail As String

    ' replace with the usual logic to get the range of interest
    Set rng = [A2:C2]

    Set re = New RegExp

    re.Global = True
    re.IgnoreCase = True
    re.Pattern = "([a-z ]+[a-z])\s*\-\s*(\d+)\s*"
    For Each rw In rng.Rows
        ' remove line breaks and leading/trailing spaces
        Detail = Trim$(Replace(rw.Cells(1, 3).Value, Chr(10), vbNullString))

        If Not Detail Like "*#" Then
            ' Last item has no - #, so add -1
            Detail = Detail & "-1"
        End If

        ' Break up string
        If re.Test(Detail) Then
            Set rMC = re.Execute(Detail)
            For Each rM In rMC
                ' output Items and Qty's to Immediate window
                Debug.Print rM.SubMatches(0), rM.SubMatches(1)
        End If
End Sub

Based on your comment I haved assumed that only the last item in a cell may be missing a -#

Sample input

Apple Juice- 20 Grape -5
pear- 3Banana-2Orange

Produces this output

Apple Juice   20
Grape         5
pear          3
Banana        2
Orange        1