Home » excel » HTML Extraction through VBA/Excel

HTML Extraction through VBA/Excel

Posted by: admin May 14, 2020 Leave a comment


I have the following HTML code on 17track.net:

 <div class="tools"> <button type="button" class="btn btn-sm icon fa-copy-files-o waves- 
effect" data-toggle="tooltip" data-placement="top" data-original-title="Copy detailed 
 tracking results for all numbers." id="cl-details" data-clipboard-text="Number: 
 Package status: Delivered (26 Days)
 Country: China -> United States
 2020-04-02 13:03, CATAWBA, SC 29704, Delivered, In/At Mailbox -> Your item was delivered in 
 or at the mailbox at 1:03 pm on April 2, 2020 in CATAWBA, SC 29704.
 2020-04-02 08:13, CATAWBA, SC 29704, Out for Delivery
 2020-04-02 08:02, CATAWBA, SC 29704, Arrived at Unit
 Destination Facility
 2020-03-31 00:00, In Transit to Next Facility
 2020-03-30 10:02, ALBANY NY DISTRIBUTION CENTER, Arrived at USPS Regional Facility
 2020-03-28 09:54, ISC NEW YORK NY(USPS), Processed Through Facility
 2020-03-28 09:54, Origin Post is Preparing Shipment

Effectively, I simply need to get the package status. I can also be OK with this just pasting the entire text into excel and I can then parse through it with VBA. For some reason, my VBA code is not working to retrieve this through a simple scrape. I am new to HTML so thank you for your help.

 Sub TrackData()

 Dim element As IHTMLElement
 Dim elements As IHTMLElementCollection
 Dim ie As InternetExplorer

 Dim html As HTMLDocument

 Set ie = New InternetExplorer

 ie.Visible = True
 ie.navigate "https://t.17track.net/en#nums=LW572098229CN"

 'Wait until IE has loaded the web page

 Do While ie.readyState <> READYSTATE_COMPLETE



 Set html = ie.document

 Set elements = html.getElementsByClassName("tools")

 Dim count As Long
 Dim erow As Long
 count = 0
 For Each element In elements
 If element.Children = "Package Status:" Then
 Sheets("Stage").Cells(1, 1) = "Found It"
 End If
 Next element

 End Sub
How to&Answers:

The value you want to search for is part of an attribute value; meaning, you would need to check either the .OuterHTML or the specific attribute data-clipboard-text value for that string. You don’t need a loop over multiple nodes. You can use the id to target that node specifically. I would also add in the attribute name to the node selection to ensure that attribute is present without needing a hasAttribute check on the node. I would also add a small pause to ensure page has time to dynamically retrieve content and check that the node is actually found.

Option Explicit

Public Sub CheckForPackageStatus()

    Dim ie As New SHDocVw.InternetExplorer

    Set ie = New SHDocVw.InternetExplorer

    With ie

        .Visible = True
        .Navigate2 "https://t.17track.net/en#nums=LW572098229CN"

        Do: DoEvents: Loop While .Busy Or .ReadyState <> READYSTATE_COMPLETE

        Application.Wait Now + TimeSerial(0, 0, 1)

        Dim node As Object

        Set node = .Document.querySelector("#cl-details[data-clipboard-text]")

        If Not node Is Nothing Then
                If InStr(node.getattribute("data-clipboard-text"), "Package status:") > 0 Then
                    ActiveSheet.Cells(1, 1) = "Found it"
                End If
        End If
    End With
End Sub