
html – Web Scraping by Span tag

Posted by: admin, May 14, 2020

Questions:

I’m trying to copy data from the website below. I need every size and price listed on the page. I wrote the code below, but it copies only three elements. Can anybody please take a look?

URL: https://www.leetstorage.com/sizes-and-pricing

Sub TagClassName()

Dim ie As New InternetExplorer, ws As Worksheet

Set ws = ThisWorkbook.Worksheets("Unit Data")
With ie
    .Visible = True
    .Navigate2 "https://www.leetstorage.com/sizes-and-pricing"

    While .Busy Or .readyState < 4: DoEvents: Wend

    Dim listings As Object, listing As Object, headers(), results(), r As Long, c As Long, item As Object
    headers = Array("size")
    Set listings = .document.getElementById("site_content").getElementsByTagName("ul")

    ReDim results(1 To listings.Length, 1 To UBound(headers) + 1)
    For Each listing In listings

        r = r + 1
        On Error Resume Next
        results(r, 1) = listing.getElementsByClassName("font-size-NaN m-font-size-NaN")(0).innerText

        On Error GoTo 0
    Next
    ws.Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
    ws.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
    .Quit

End With

End Sub
Answers:

You can use the following. The data you want is in the child li elements of the parent ul (unordered list) elements that have the class innerList, so select those li elements rather than the ul elements themselves.


Internet Explorer:

Option Explicit
'VBE > Tools > References:
' Microsoft Internet Controls
Public Sub RetrieveInfo()
    Dim IE As InternetExplorer, i As Long, items As Object
    Set IE = New InternetExplorer

    With IE
        .Visible = True
        .Navigate2 "https://www.leetstorage.com/sizes-and-pricing"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Set items = .document.querySelectorAll(".innerList li")

        For i = 0 To items.Length - 1
            With ThisWorkbook.Worksheets("Sheet1")
                .Cells(i + 1, 1) = Trim$(items.item(i).innerText)
            End With
        Next
        .Quit 'close the browser instance once the data has been written
    End With
End Sub

XHR:

You can do this faster with XHR, as long as you provide the User-Agent header.

Option Explicit
Public Sub GetInfo()
    Dim html As HTMLDocument, items As Object, i As Long '<  VBE > Tools > References > Microsoft HTML Object Library
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.leetstorage.com/sizes-and-pricing", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        html.body.innerHTML = .responseText
    End With
    Set items = html.querySelectorAll(".innerList li")

    For i = 0 To items.Length - 1
        With ThisWorkbook.Worksheets("Sheet1")
            .Cells(i + 1, 1) = Trim$(items.item(i).innerText)
        End With
    Next
End Sub
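
As a hedged variant of the request step above, the sketch below wraps it in a helper that checks the HTTP status before handing back the response text. The function name FetchHtml and the error number are hypothetical choices for illustration and are not part of the original answer. It assumes a 200 status means the page was served normally.

```vba
Option Explicit
'Hypothetical helper, not from the original answer
Public Function FetchHtml(ByVal url As String) As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False                        'synchronous request
        .setRequestHeader "User-Agent", "Mozilla/5.0"  'same header as in GetInfo
        .send
        'Assumption: anything other than 200 is treated as a failure
        If .Status <> 200 Then
            Err.Raise vbObjectError + 513, "FetchHtml", _
                "Request failed: " & .Status & " " & .statusText
        End If
        FetchHtml = .responseText
    End With
End Function
```

In GetInfo, the With block could then be replaced by html.body.innerHTML = FetchHtml("https://www.leetstorage.com/sizes-and-pricing").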

ul blocks:

If you select on the class name of the ul alone (.innerList), you get back only the 3 blocks on the page that hold the lists, not the individual entries inside them.
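
The difference between the two selectors can be checked offline. The sketch below is a minimal illustration, assuming a hand-written fragment in place of the real page. The sizes in it are made up, and the procedure name CompareSelectors is hypothetical. It needs the same Microsoft HTML Object Library reference as GetInfo.

```vba
Option Explicit
'VBE > Tools > References > Microsoft HTML Object Library
Public Sub CompareSelectors()
    Dim html As HTMLDocument
    Set html = New HTMLDocument

    'Made-up stand-in for the page: two lists holding three items in total
    html.body.innerHTML = _
        "<ul class=""innerList""><li>5 x 5</li><li>5 x 10</li></ul>" & _
        "<ul class=""innerList""><li>10 x 10</li></ul>"

    'Class selector alone: counts the parent ul elements
    Debug.Print html.querySelectorAll(".innerList").Length
    'Descendant combinator (the space): counts every li inside those ul elements
    Debug.Print html.querySelectorAll(".innerList li").Length
End Sub
```

The first count covers the list blocks and the second covers the individual entries, which is why looping over the ul elements alone yields far fewer rows than expected.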


ul with li:

Taking just one of those blocks to show the effect of adding the child li elements with the descendant combinator: