html – How to fix 'for each' iteration with getElementsByTagName?

Posted by: admin May 14, 2020

Questions:

I’m using MSXML and WinHTTP within VBA/Excel. I’m attempting to extract the innerText from all p tag elements within a parent element.

How can the sub iterate through all p tags within a specific class and populate a worksheet?

Thanks in advance.

I’m trying to adapt this strategy [0] to this website [1]:

[0] https://codingislove.com/parse-html-in-excel-vba/
[1] https://www.fool.com/earnings/call-transcripts/2019/07/17/netflix-inc-nflx-q2-2019-earnings-call-transcript.aspx

Sub tryKeywordsearch()

Dim http As Object, html As New HTMLDocument
Dim paras As Object, titleElem As Object, detailsElem As Object, para As HTMLHtmlElement
Dim i As Integer
Set http = CreateObject("MSXML2.XMLHTTP")
http.Open "GET", "https://www.fool.com/earnings/call-transcripts/2019/07/17/netflix-inc-nflx-q2-2019-earnings-call-transcript.aspx", False
http.Send
html.body.innerHTML = http.responseText

Set paras = html.getElementsByClassName("article-content")

i = 1

For Each para In paras
    Set para = para.getElementsByTagName("p")(i)
    Sheets(1).Cells(i, 1).Value = para.innerText
    i = i + 1
Next
End sub
Answers:

Only one element has the class name article-content, so your outer loop runs exactly once and you never get past i = 1. In addition, inside the loop you reassign the variable you are looping over, which will most likely lead to an error.

For Each para In paras
    Set para = para.getElementsByTagName("p")(i)

In the above, para is your loop variable.

Also, the collection returned by para.getElementsByTagName("p") is zero-based, so index 1 returns the second paragraph rather than the first.
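
To see both points without hitting the network, here is a minimal sketch that uses a made-up HTML fragment in place of the real page. The sub name and the fragment are assumptions for illustration only, and it needs a reference to the Microsoft HTML Object Library, as the code in the question does:

Option Explicit

Public Sub DemoZeroBasedCollections()
    ' Hypothetical fragment: one container with the class, holding three paragraphs
    Dim html As New HTMLDocument
    Dim containers As Object, paras As Object

    html.body.innerHTML = "<div class=""article-content"">" & _
                          "<p>One</p><p>Two</p><p>Three</p></div>"

    Set containers = html.getElementsByClassName("article-content")
    Debug.Print containers.Length      ' 1: a single element carries the class

    Set paras = containers(0).getElementsByTagName("p")
    Debug.Print paras.Length           ' 3: the paragraphs inside that element
    Debug.Print paras(0).innerText     ' One: index 0 is the first paragraph
    Debug.Print paras(1).innerText     ' Two: index 1 is already the second
End Sub

Run it from the VBA editor and read the results in the Immediate window.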

Your code would work if you indexed into the initial collection returned by getElementsByClassName, chained getElementsByTagName onto that element, and used the result as the collection to For Each over. You can leave i starting at 1, since it then gives the correct row to write to, and use the loop variable para to get the current node's innerText:

Option Explicit

' Requires a reference to the Microsoft HTML Object Library (for HTMLDocument)
Public Sub TryKeywordSearch()
    Dim http As Object, html As New HTMLDocument
    Dim paras As Object, para As Object, i As Long

    ' Fetch the page synchronously and load it into the HTML parser
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "https://www.fool.com/earnings/call-transcripts/2019/07/17/netflix-inc-nflx-q2-2019-earnings-call-transcript.aspx", False
    http.send
    html.body.innerHTML = http.responseText

    ' Take the first (and only) article-content element, then all p tags inside it
    Set paras = html.getElementsByClassName("article-content")(0).getElementsByTagName("p")

    ' i tracks the output row, so it starts at 1
    i = 1
    For Each para In paras
        ThisWorkbook.Worksheets("Sheet1").Cells(i, 1).Value = para.innerText
        i = i + 1
    Next
End Sub

Alternatively, you can use a CSS selector combination, which is faster and, in my opinion, more readable, to get all p tags within a parent with class article-content:

Option Explicit

' Requires a reference to the Microsoft HTML Object Library (for HTMLDocument)
Public Sub GetParagraphs()
    Dim html As HTMLDocument, paragraphs As Object, i As Long
    Set html = New HTMLDocument

    ' Fetch the page synchronously and load it into the HTML parser
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.fool.com/earnings/call-transcripts/2019/07/17/netflix-inc-nflx-q2-2019-earnings-call-transcript.aspx", False
        .send
        html.body.innerHTML = .responseText
    End With

    ' Descendant selector: every p inside an element with class article-content
    Set paragraphs = html.querySelectorAll(".article-content p")

    ' The node list is zero-based, so item i goes to row i + 1
    For i = 0 To paragraphs.Length - 1
        ThisWorkbook.Worksheets("Sheet1").Cells(i + 1, 1).Value = paragraphs.item(i).innerText
    Next i
End Sub
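
If the sheet stays empty, it helps to know whether the request failed or the selector matched nothing. The following is only a sketch of one way to check, not part of the original answer: the sub name GetParagraphsChecked and the messages are made up for illustration, and it assumes the same library reference and a worksheet named Sheet1 as above.

Option Explicit

Public Sub GetParagraphsChecked()
    Dim html As HTMLDocument, paragraphs As Object, i As Long
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.fool.com/earnings/call-transcripts/2019/07/17/netflix-inc-nflx-q2-2019-earnings-call-transcript.aspx", False
        .send
        ' Stop early if the server did not return the page
        If .Status <> 200 Then
            Debug.Print "Request failed: " & .Status & " " & .statusText
            Exit Sub
        End If
        html.body.innerHTML = .responseText
    End With

    Set paragraphs = html.querySelectorAll(".article-content p")

    ' An empty node list means the selector did not match the returned HTML
    If paragraphs.Length = 0 Then
        Debug.Print "No paragraphs matched .article-content p"
        Exit Sub
    End If

    For i = 0 To paragraphs.Length - 1
        ThisWorkbook.Worksheets("Sheet1").Cells(i + 1, 1).Value = paragraphs.item(i).innerText
    Next i
End Sub

Any messages appear in the Immediate window of the VBA editor.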