regex – Excel VBA Regular Expressions Working with Quotes

Posted by: admin April 23, 2020

Questions:

I need to declare a string for use as a Regular Expression pattern.

The string is:
(?<="[a-zA-Z0-9.-]*\d{8}.xml(?=")

Usually, to declare a string in VBA for use as a RegExp pattern, you enclose it in double quotes, so it would look like this:
"(?<="[a-zA-Z0-9.-]*\d{8}.xml(?=")"
but that results in a VBA compile error, "Expected: end of statement", with the [a-zA-Z0-9.-] highlighted.

This:
"(?<="""[a-zA-Z0-9.-]*\d{8}.xml(?=""")"
results in the same error.

This:
"(?<=""""[a-zA-Z0-9.-]*\d{8}.xml(?="""")"

compiles, but when I use MsgBox to view the pattern it appears like this:

(?<=""[a-zA-Z0-9.-]*\d{8}.xml(?="")

and therefore won’t work correctly in RegEx.

Arghhhh!

Here’s the code I’m using for testing:

    Sub tester()
        Dim PATH_TO_FILINGS As String
        'PATH_TO_FILINGS = "www.sec.gov/Archives/edgar/data/1084869/000110465913082760"
        PATH_TO_FILINGS = "www.sec.gov/Archives/edgar/data/1446896/000144689612000023"
        MsgBox GetInstanceDocumentPath(PATH_TO_FILINGS)
    End Sub

    Function GetInstanceDocumentPath(PATH_TO_FILINGS As String)

        'this part launches IE and goes to the correct directory
        If IEbrowser Is Nothing Then
            Set IEbrowser = CreateObject("InternetExplorer.application")
            IEbrowser.Visible = False
        End If

        IEbrowser.Navigate URL:=PATH_TO_FILINGS

        While IEbrowser.Busy Or IEbrowser.readyState <> 4: DoEvents: Wend

        'this part starts the regular expression engine and searches for the reg exp pattern (i.e. the file name)
        Dim RE As Object
        Set RE = CreateObject("vbscript.regexp")

        RE.Pattern = "(?<="[a-zA-Z0-9.-]*\d{8}.xml(?=")"   '"\w+(?=-)(-)\d{8}(.xml)"
        MsgBox RE.Pattern
        RE.IgnoreCase = True

        Dim INSTANCEDOCUMENT As Object

        Set INSTANCEDOCUMENT = RE.Execute(IEbrowser.Document.body.innerhtml)

        If INSTANCEDOCUMENT.Count = 1 Then

            GetInstanceDocumentPath = PATH_TO_FILINGS & "/" & INSTANCEDOCUMENT.Item(0)

        End If

    End Function

Any thoughts on how to approach this are appreciated.

Answers:

Try doing it like this:

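    Sub Test()
        Dim RealQ As String, Pattern As String
        RealQ = Chr(34)   'Chr(34) is the double-quote character
        'Concatenate the quote into the pattern instead of typing it inside the literal
        Pattern = "(?<=" & RealQ & ")[a-zA-Z0-9.-]*\d{8}.xml(?=" & RealQ & ")"
        MsgBox Pattern
    End Sub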

Result:

(?<=")[a-zA-Z0-9.-]*\d{8}.xml(?=")
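
For comparison, the same string can also be written as a single literal by doubling each embedded quote, since VBA reads two consecutive double quotes inside a string literal as one quote character. That is also why the four-quote attempt in the question displayed two quotes. A minimal sketch; the Sub and variable names are made up for illustration:

```vba
Sub DoubledQuoteSketch()
    Dim viaChr As String
    Dim viaDoubled As String

    ' Build the pattern by concatenating Chr(34), as in the answer above
    viaChr = "(?<=" & Chr(34) & ")[a-zA-Z0-9.-]*\d{8}.xml(?=" & Chr(34) & ")"

    ' Same pattern as one literal: each "" inside the literal is one quote
    viaDoubled = "(?<="")[a-zA-Z0-9.-]*\d{8}.xml(?="")"

    ' Both strings are identical, so this prints True
    Debug.Print (viaChr = viaDoubled)
End Sub
```

Which form to use is mostly a matter of readability; Chr(34) makes the quotes easier to spot, while the doubled form keeps the pattern in one piece.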

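Also, note that the VBScript regular expression engine used from VBA (vbscript.regexp) doesn’t support lookbehind, i.e. the (?<=...) part of this pattern, although it does support lookahead, (?=...). So even with the quotes fixed, the pattern will not run as written. A better reference can be found here.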
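
Because the engine does not accept the lookbehind, one common workaround is to match the opening quote as an ordinary character and put the file name in a capture group, then read it from SubMatches. The sketch below is one possible approach rather than the original poster's final code; the function name ExtractXmlName and the sample HTML string are made up, and the dot before xml is escaped on the assumption that a literal dot was intended:

```vba
Function ExtractXmlName(ByVal html As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    ' Match the opening quote literally, capture the file name, keep the lookahead
    re.Pattern = """([a-zA-Z0-9.-]*\d{8}\.xml)(?="")"

    Set matches = re.Execute(html)
    If matches.Count > 0 Then
        ' SubMatches(0) holds the text captured by the first group
        ExtractXmlName = matches(0).SubMatches(0)
    End If
End Function

Sub ExtractXmlNameDemo()
    ' Hypothetical snippet of HTML, for illustration only
    Debug.Print ExtractXmlName("<a href=""abc-20120630.xml"">instance</a>")
End Sub
```

The function returns an empty string when nothing matches, so the caller can test for that before building a path from the result.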