excel – Why does my code to scrape text using VBA work in debug mode only?

Posted by: admin, May 14, 2020

Questions:

I have written some code to scrape specific dates from Google’s patent website. After reviewing lots of examples, I worked out the getElementsByClassName call that returns the date I need. The code below works when I step through it in debug mode and generates the desired MsgBox. But when I run it normally, it fails with “Run-time error ’91’: Object variable or With block variable not set.”

I have added delays wherever I thought timing might be an issue. I have also removed any interaction with the Excel spreadsheet where I would ultimately put the date, just to keep the code as simple as possible. I’ve also copied the code from the original spreadsheet to a new blank one, but the issue persists.

Any help would be appreciated.

Sub Get_Date()
    Dim ie As InternetExplorer
    Dim sURL As String
    Dim strGrant As Variant

    Set ie = New InternetExplorer
    sURL = "https://patents.google.com/patent/US6816842B1/en?oq=6816842"

    ie.navigate sURL
    ie.Visible = False

    Do While ie.Busy Or ie.ReadyState < 4
        DoEvents
    Loop

    strGrant = ie.document.getElementsByClassName("granted style-scope application-timeline")(0).innerText

    Do While ie.Busy Or ie.ReadyState < 4
        DoEvents
    Loop

    MsgBox strGrant

    ie.Quit

End Sub
Answers:

It’s likely a timing issue as per my comment. That’s dealt with in other answers to similar questions. Main things to consider are:

  1. Use proper page load waits: While ie.Busy Or ie.readyState < 4: DoEvents: Wend
  2. Possibly use a timed loop that repeatedly tries to set the element to a variable, then tests whether it was set.
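
The two points above could be combined roughly as follows. This is a minimal sketch, not tested against the live page: it reuses the URL and class name from the question, and it assumes a reference to Microsoft Internet Controls. The procedure name GetGrantDate and the 10-second MAX_WAIT_SEC limit are my own choices for illustration. The likely reason stepping through works is that the pause gives the page time to render the element after readyState reaches 4. When the element is not there yet, (0) returns Nothing and .innerText raises error 91.

```vba
Option Explicit

' Sketch only. Assumes a reference to "Microsoft Internet Controls".
' GetGrantDate and MAX_WAIT_SEC are illustrative names, not from the question.
Public Sub GetGrantDate()
    Const MAX_WAIT_SEC As Long = 10
    Dim ie As InternetExplorer
    Dim nodes As Object
    Dim t As Single

    Set ie = New InternetExplorer
    ie.Visible = False
    ie.navigate "https://patents.google.com/patent/US6816842B1/en?oq=6816842"

    ' 1. Standard page load wait.
    While ie.Busy Or ie.readyState < 4: DoEvents: Wend

    ' 2. Timed loop: poll until the element exists or the timeout expires.
    '    Timer resets at midnight; a run spanning midnight would need extra handling.
    t = Timer
    Do
        DoEvents
        Set nodes = ie.document.getElementsByClassName("granted style-scope application-timeline")
        If nodes.Length > 0 Then Exit Do
    Loop While Timer - t < MAX_WAIT_SEC

    ' Only touch .innerText once we know a matching element was found.
    If nodes.Length > 0 Then
        MsgBox nodes.Item(0).innerText
    Else
        MsgBox "Grant date element not found within " & MAX_WAIT_SEC & " seconds."
    End If

    ie.Quit
End Sub
```

Testing the collection length before reading innerText means a slow page produces a clear message instead of a run-time error.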

Alternatively, and this is a bit of a punt, it seems that all granted dates are the same as the patent publication dates. If that holds, you can skip the browser and use XHR to get the publication date:

Option Explicit
' Requires a reference to Microsoft HTML Object Library (for HTMLDocument).
Public Sub GetDates()
    Dim html As HTMLDocument, i As Long, patents()
    patents = Array("US7724240", "US6876312", "US8259073", "US7523862", "US6816842B1")
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        For i = LBound(patents) To UBound(patents)
            ' Synchronous GET (third argument False), one request per patent number.
            .Open "GET", "https://patents.google.com/patent/" & patents(i) & "/en?oq=" & patents(i), False
            .setRequestHeader "User-Agent", "Mozilla/5.0"
            .send
            ' Parse the response without opening a browser.
            html.body.innerHTML = .responseText
            ' Only read the date when the publicationDate element is present.
            If html.querySelectorAll("[itemprop=publicationDate]").length > 0 Then
                Debug.Print html.querySelector("[itemprop=publicationDate]").DateTime
            End If
        Next
    End With
End Sub