excel – VBA – Find preceding html tag

Posted by: admin, March 9, 2020

Questions:

Say I have HTML source that looks like this:

<div id="book-info"> 
  <span class="title">Weather</span>
  <span class="title">Title Of Book</span>
  <p><a href="http://test.com?MMC_ID=34343">Buy Now</a></p>
</div>

What I need returned is “Title Of Book”.

There are numerous instances of span class="title", but the one I need immediately precedes the only link on the page that contains MMC_ID, so I can use MMC_ID as a marker to locate the span tag I need.

Question: How can I say “Grab the contents of the very first span tag to the left of MMC_ID”?

The code below works sometimes, but the number of span tags on the page varies, so the hard-coded index (1) fails whenever the span I need is not the second one.

With CreateObject("msxml2.xmlhttp")
    .Open "GET", ActiveCell.Offset(0, -1).Value, False
    .Send
    htm.body.innerhtml = .ResponseText
End With

ExtractedText = htm.getElementById("book-info").getElementsByTagName("span")(1).innerText

Answer:

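This should do it, assuming the span you need is always the last span inside book-info:

Text_1 = htm.getElementById("book-info").innerHTML
' Search for "MMC_ID" with no trailing space; the href reads "MMC_ID=34343".
If InStr(1, Text_1, "MMC_ID", vbTextCompare) > 0 Then
    ' The collection is zero-based, so the last span is at index count - 1.
    numb_spans = htm.getElementById("book-info").getElementsByTagName("span").Length
    ExtractedText = htm.getElementById("book-info").getElementsByTagName("span")(numb_spans - 1).innerText
End If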

Answer:

You could loop through all the spans and stop at the one whose next sibling's next sibling has, as its first child, an anchor element whose href contains the marker text.

Sub test()
    ' Requires a reference to the Microsoft HTML Object Library (Tools > References).

    Dim htm As HTMLDocument
    Dim ExtractedText As String
    Dim hSpan As HTMLSpanElement
    Dim hAnchor As HTMLAnchorElement

    Set htm = New HTMLDocument

    With CreateObject("msxml2.xmlhttp")
        .Open "GET", "file://///99991-dc01/99991/dkusleika/My%20Documents/test.html", False
        .Send
        htm.body.innerHTML = .ResponseText
    End With

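    For Each hSpan In htm.getElementById("book-info").getElementsByTagName("span")
        ' Reset so a failed lookup cannot reuse the previous iteration's anchor.
        Set hAnchor = Nothing
        ' The lookup errors when a sibling is missing or the child is not an anchor.
        On Error Resume Next
            Set hAnchor = hSpan.NextSibling.NextSibling.FirstChild
        On Error GoTo 0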

        If Not hAnchor Is Nothing Then
            If InStr(1, hAnchor.href, "MMC_ID", vbTextCompare) > 0 Then
                ExtractedText = hSpan.innerText
                Exit For
            End If
        End If
    Next hSpan

    Debug.Print ExtractedText

End Sub
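
The same idea can also be run in the other direction: find the MMC_ID link first, then walk backwards to the nearest span. The sketch below is an illustration, not part of the original answer. The function name TitleBeforeMarker is hypothetical, and it assumes, as in the sample markup, that the link sits inside a parent element (the p tag) that is a sibling of the spans. It uses late binding, so no library reference is needed.

```vba
' Sketch: start at the marker link and walk backwards to the nearest span.
' TitleBeforeMarker is a hypothetical helper name, not from the original post.
Function TitleBeforeMarker(ByVal doc As Object, ByVal marker As String) As String
    Dim lnk As Object
    Dim node As Object

    ' Look for the first anchor whose href contains the marker text.
    For Each lnk In doc.getElementsByTagName("a")
        If InStr(1, lnk.href, marker, vbTextCompare) > 0 Then
            ' Assumption: the anchor is wrapped in a parent element (<p> in the
            ' sample), so begin with the sibling just before that parent.
            Set node = lnk.parentNode.previousSibling
            Do While Not node Is Nothing
                ' nodeType 1 is an element node; text nodes are skipped.
                If node.nodeType = 1 Then
                    If UCase$(node.nodeName) = "SPAN" Then
                        TitleBeforeMarker = node.innerText
                        Exit Function
                    End If
                End If
                Set node = node.previousSibling
            Loop
        End If
    Next lnk
    ' Falls through with an empty string when no match is found.
End Function
```

It could be called as ExtractedText = TitleBeforeMarker(htm, "MMC_ID"). Because it skips backwards over any non-span nodes, it does not depend on how many spans precede the link or on whether whitespace text nodes sit between the elements.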

Answer:

Is it always the last span element? If so, just count how many elements

htm.getElementById("book-info").getElementsByTagName("span")

returns and grab the last one.
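
A minimal sketch of that suggestion, assuming the span you need really is always the last one in the container. The function name LastSpanText is hypothetical; the guard avoids an error when the container has no spans at all.

```vba
' Sketch: return the text of the last span inside a container, or "" if none.
' LastSpanText is a hypothetical helper name, not from the original post.
Function LastSpanText(ByVal container As Object) As String
    Dim spans As Object
    Set spans = container.getElementsByTagName("span")

    ' The collection is zero-based, so the last item is at Length - 1.
    If spans.Length > 0 Then
        LastSpanText = spans.Item(spans.Length - 1).innerText
    End If
End Function
```

It could be called as ExtractedText = LastSpanText(htm.getElementById("book-info")).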