excel – how to get inner text of html under id?

Posted by: admin May 14, 2020

Questions:

I am trying to pull the inner text of an HTML element, selected by its id, into an Excel cell.

This is the code, which uses an XMLHTTP request:

Sub getelementbyid()
Dim XMLpage As New MSXML2.XMLHTTP60
Dim hdoc As New MSHTML.HTMLDocument
Dim HBEs As MSHTML.IHTMLElementCollection
Dim HBE As MSHTML.IHTMLElement
Dim ha As String
XMLpage.Open "GET","https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
XMLpage.send
hdoc.body.innerHTML = XMLpage.responseText
ha = hdoc.getelementbyid("open").innerText
Range("K11").Value = ha
Debug.Print ha
End Sub

I expect a numeric value as output, but it shows –.

Answers:

Examine the response text. There is a difference in the way the page is rendered in the browser versus what is returned in the ResponseText.

I put the URL into a browser, went into dev tools (F12), found the element, and noted the numeric value inside the HTML element.

Then I dumped the response text we’re getting in VBA into a cell and copied the entire cell value into Notepad++. If you do that, you’ll see the initial value inside the #open element is indeed “–”.
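A worksheet cell holds at most 32,767 characters, so a long response may be cut off when dumped into a cell. One possible alternative is to write the text to a file and open that in an editor. The sketch below is only an illustration: SaveTextToFile and DemoSaveTextToFile are hypothetical names, the file path is an assumption, and Print # writes in the system ANSI code page, so unusual characters may not survive.

```vba
Option Explicit

' Hypothetical helper: writes text to filePath, replacing any existing file.
Public Sub SaveTextToFile(ByVal text As String, ByVal filePath As String)
    Dim fileNumber As Integer
    fileNumber = FreeFile                 ' next free file handle
    Open filePath For Output As #fileNumber
    Print #fileNumber, text;              ' trailing ; suppresses the extra line break
    Close #fileNumber
End Sub

Public Sub DemoSaveTextToFile()
    ' Assumed location; any writable path will do
    SaveTextToFile "<div id=""open"">-</div>", Environ$("TEMP") & "\response.txt"
End Sub
```

In the macro from the question, the call would look like SaveTextToFile XMLpage.responseText, Environ$("TEMP") & "\response.txt", placed after XMLpage.send.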

The real value appears to be getting written into the HTML via JavaScript, which is common practice. There is a JSON object at the top of the page, presumably injected into the document from the back-end of the website upon your request.

So you have to parse the JSON, not the HTML. I’ve provided code doing just that. There may be a better way to do it, and I feel this code is kind of “hacky”, but it gets the job done for your example URL.

Sub getelementbyid()
    Dim XMLpage As New MSXML2.XMLHTTP60
    Dim hdoc As New MSHTML.HTMLDocument

    Dim HBEs As MSHTML.IHTMLElementCollection
    Dim HBE As MSHTML.IHTMLElement
    Dim ha As String
    XMLpage.Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
    XMLpage.send

    '// sample: ,"open":"681.05",
    Dim token As String
    token = """open"":"""

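    '// Long rather than Integer: positions in a long response can exceed 32,767
    Dim startPosition As Long
    startPosition = InStr(1, XMLpage.responseText, token)
    If startPosition = 0 Then Exit Sub '// token not found in the response

    Dim endPosition As Long
    endPosition = InStr(startPosition, XMLpage.responseText, ",")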

    Dim prop As String
    prop = Mid(XMLpage.responseText, startPosition, endPosition - startPosition)
    prop = Replace(prop, """", vbNullString)
    prop = Replace(prop, "open:", vbNullString)

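    '// named openPrice so the variable does not shadow the built-in Val function
    Dim openPrice As Double
    openPrice = CDbl(prop)
    ha = CStr(openPrice)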

    Range("K11").Value = ha
    Debug.Print ha
End Sub
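The same string-search idea can be wrapped in a small reusable function. The sketch below is not part of the answer above: ExtractQuotedValue and DemoExtractQuotedValue are hypothetical names, the sample string is made up, and it assumes the value is always wrapped in double quotes with no escaped quotes inside. It stops at the closing quote rather than the next comma, so a value that itself contains a comma would still come out whole.

```vba
Option Explicit

' Hypothetical helper: returns the text between the quotes that follow
' "key":" in source, or vbNullString if the key is not found.
Public Function ExtractQuotedValue(ByVal source As String, ByVal key As String) As String
    Dim token As String
    token = """" & key & """:"""                 ' builds "key":"

    Dim startPosition As Long                    ' Long: responses can exceed 32,767 chars
    startPosition = InStr(1, source, token)
    If startPosition = 0 Then Exit Function      ' key not present

    startPosition = startPosition + Len(token)   ' first character of the value

    Dim endPosition As Long
    endPosition = InStr(startPosition, source, """")  ' closing quote of the value
    If endPosition = 0 Then Exit Function        ' malformed input

    ExtractQuotedValue = Mid$(source, startPosition, endPosition - startPosition)
End Function

Public Sub DemoExtractQuotedValue()
    Dim sample As String
    ' Made-up sample shaped like the fragment quoted in the code comment above
    sample = ",""open"":""681.05"",""high"":""690.00"","
    Debug.Print ExtractQuotedValue(sample, "open")   ' prints 681.05
End Sub
```

With this helper, the macro body would reduce to something like prop = ExtractQuotedValue(XMLpage.responseText, "open") followed by the existing CDbl conversion.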

Answer:

Here are two methods: 1) Use regex on the response text. This is usually frowned upon for html but is perfectly serviceable here. 2) The traditional route: extract the json string and use a json parser to pull out the value.

The data you want is stored in a json string found both on the webpage and in the xmlhttp response, under the same element (#responseDiv, as used in the parser code further down).

This means you can either treat the html as a string and target just the pattern for the open price using regex, as shown below, or load the xmlhttp response into an html parser, grab the required element, extract its innerText, trim off the whitespace, and pass the result to a json parser to extract the open price.

In both methods you want to avoid being served cached results, so the following header is an important addition to help mitigate this:

.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"

There is no need for additional cell formatting. The full value comes out for both your tickers.


Regex:

The open price is present in a json string in the response, so you can easily regex it out of the response text.


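Regex explanation:

The pattern "open":"(.*?)" matches the literal text "open":" and then lazily captures everything up to the next double quote, so the first submatch holds the open price.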
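To see what the pattern picks up, it can be run against a short hand-written string before pointing it at the live response. This is only a sketch: DemoOpenPattern is a hypothetical name and the sample text is made up to resemble the relevant part of the response. It relies on the VBScript.RegExp object, so it assumes Windows.

```vba
Option Explicit

Public Sub DemoOpenPattern()
    Dim re As Object, matches As Object
    Dim sample As String

    ' Made-up sample text, not real market data
    sample = ",""open"":""681.05"",""high"":""690.00"","

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = """open"":""(.*?)"""          ' i.e. "open":"(.*?)"

    Set matches = re.Execute(sample)
    If matches.Count > 0 Then
        Debug.Print matches(0).Value           ' whole match:    "open":"681.05"
        Debug.Print matches(0).SubMatches(0)   ' captured group: 681.05
    Else
        Debug.Print "Not found"
    End If
End Sub
```

Because the quantifier is lazy (.*?), the capture ends at the first closing quote instead of running on to the last quote in the string.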


VBA:

Option Explicit
Public Sub GetOpenPrice() ' retrieves the open price, so named accordingly
    Dim ws As Worksheet, re As Object, p As String, r As String

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    p = """open"":""(.*?)"""
    Set re = CreateObject("VBScript.RegExp")

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=HAL", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        If .Status = 200 Then
            r = GetValue(re, .responseText, p)
        Else
            r = "Failed connection"
        End If
    End With
    ws.Range("K11").Value = r
End Sub

Public Function GetValue(ByVal re As Object, ByVal inputString As String, ByVal pattern As String) As String
    With re
        .Global = True
        .pattern = pattern
        If .test(inputString) Then  ' returns True if the regex pattern can be matched against the provided string
            GetValue = .Execute(inputString)(0).submatches(0)
        Else
            GetValue = "Not found"
        End If
    End With
End Function

HTML and json parser:

This requires importing jsonconverter.bas (the json parser code) into the project as a standard module called JsonConverter, then going to VBE > Tools > References and adding references to Microsoft Scripting Runtime and Microsoft HTML Object Library.

VBA:

Option Explicit

Public Sub GetOpenPrice() ' retrieves the open price, so named accordingly
    Dim ws As Worksheet, r As String, json As Object

    Set ws = ThisWorkbook.Worksheets("Sheet1")

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=MRF", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        If .Status = 200 Then
            Dim html As HTMLDocument
            Set html = New HTMLDocument
            html.body.innerHTML = .responseText
            Set json = JsonConverter.ParseJson(Trim$(html.querySelector("#responseDiv").innerText))
            r = json("data")(1)("open")
        Else
            r = "Failed connection"
        End If
    End With
    ws.Range("K11").Value = r
End Sub
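The indexing in json("data")(1)("open") may look odd at first. The sketch below tries to show where each step comes from, using a made-up json string with the shape that expression implies. DemoJsonIndexing is a hypothetical name, and the code assumes the JsonConverter module and the Microsoft Scripting Runtime reference are set up as described above. It will not compile without them.

```vba
Option Explicit

Public Sub DemoJsonIndexing()
    Dim json As Object
    Dim sample As String

    ' Made-up JSON with the shape implied by json("data")(1)("open")
    sample = "{""data"":[{""open"":""681.05""}]}"

    Set json = JsonConverter.ParseJson(sample)

    ' JSON objects {} come back as Dictionary objects, accessed by key.
    ' JSON arrays [] come back as Collection objects, indexed from 1.
    Debug.Print TypeName(json)              ' outer object
    Debug.Print TypeName(json("data"))      ' array held under "data"
    Debug.Print json("data")(1)("open")     ' first array item, then its "open" key
End Sub
```

So the (1) selects the first item of the array stored under "data", and ("open") then reads that item's open price as a string.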