
html – VBA: How to select a specific webpage div based on class

Posted by: admin April 23, 2020

Question:

I want to select only part of a table: I don't want the title, and would prefer to leave off the header row too. I can't get it to work.

HTML:

<table id="mainContent" Class="MainContent-Report" cellspacing="0" cellpadding="0"
...
<div class="reports-area">
<table cellspacing="0" cellpadding="0" width="100%">
<tbody>
<tr class="reports-heading">...</tr>
<tr id="R211" class="Reports-LineOne">...</tr>
<tr id="R212" class="Reports-LineTwo">...</tr>

VBA:

This grabs the entire table, including the title and column headers, and works fine:

Set ieTable = ieDoc.all.Item("mainContent")

But what I want is something like this, excluding the title and column headers:

Set ieTable = ieDoc.getElementsByClassName("Reports-Area")

For what it's worth, the number of rows in the report is dynamic.

I get this error:

“Object Variable or With Block not set”

Answers:

There are several methods you can use. I will show 3 for now.


Filter with CSS selectors:

You could start with an attribute = value CSS selector that uses the starts-with operator (^=) to target the classes of interest. I can't see the rest of the HTML, so I'm unsure whether this will pick up additional rows.

Dim rows As Object
Set rows =  ieDoc.querySelectorAll("[class^='Reports-Line']")
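If you would rather not go through the clipboard, the matched rows can also be written to the sheet cell by cell. The following is only a minimal sketch: the procedure name WriteRowsToSheet and its doc parameter are hypothetical, it assumes a reference to Microsoft HTML Object Library, and it assumes each matched row contains ordinary table cells (the row contents are elided in your HTML, so I can't confirm that).

```vba
Option Explicit

' Hypothetical helper (not part of the original answer).
' Writes the text of every cell in the matched rows to the active sheet,
' starting at A1. Assumes doc is an already loaded HTMLDocument.
Public Sub WriteRowsToSheet(ByVal doc As HTMLDocument)
    Dim rows As Object, cells As Object
    Dim r As Long, c As Long

    ' Same starts-with attribute selector as above
    Set rows = doc.querySelectorAll("[class^='Reports-Line']")

    For r = 0 To rows.Length - 1
        ' Each matched node is a table row; .cells is its cell collection
        Set cells = rows.Item(r).cells
        For c = 0 To cells.Length - 1
            ' Sheet rows and columns are 1-based, the node lists are 0-based
            ActiveSheet.Cells(r + 1, c + 1).Value = cells.Item(c).innerText
        Next c
    Next r
End Sub
```

Because the loop runs over whatever the selector returns, it should cope with a dynamic number of rows.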

An example reading your HTML (slightly amended for parsing). I isolate the rows of interest and combine their outerHTML, wrap the result in <table></table> tags, and use the clipboard to paste the table to the sheet.

Requires reference to Microsoft HTML Object Library.

Option Explicit
Public Sub Test()
    Dim html As HTMLDocument, rows As Object, i As Long, clipboard As Object, s As String

    Set html = GetHTMLFileContent("C:\Users\User\Desktop\test.html")
    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
    Set rows = html.querySelectorAll("[class^='Reports-Line']")

    For i = 0 To rows.Length - 1
        s = s & rows.item(i).outerHTML
    Next

    s = "<table>" & s & "</table>"
    clipboard.SetText s
    clipboard.PutInClipboard
    ActiveSheet.Range("A1").PasteSpecial
End Sub

Public Function GetHTMLFileContent(ByVal filePath As String) As HTMLDocument
    Dim fso As Object, hFile As Object, hString As String, html As HTMLDocument
    Set html = New HTMLDocument
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set hFile = fso.OpenTextFile(filePath)

    ' ReadAll reads the whole file in one call, so no loop is needed
    If Not hFile.AtEndOfStream Then hString = hFile.ReadAll()
    hFile.Close

    html.body.innerHTML = hString
    Set GetHTMLFileContent = html
End Function

HTML sample (needs tweaking but sufficient for demo):

<html>
<head></head>
 <body>
  <table id="mainContent" class="MainContent-Report" cellspacing="0" cellpadding="0" ... <div class="reports-area"> 
  </table>
  <table cellspacing="0" cellpadding="0" width="100%"> 
   <tbody> 
    <tr class="reports-heading">
     don't want
    </tr> 
    <tr id="R211" class="Reports-LineOne">
     do want 1
    </tr> 
    <tr id="R212" class="Reports-LineTwo">
     do want 2.
    </tr>
   </tbody>
  </table>
 </body>
</html>

Remove offending item with JavaScript:

The following uses JavaScript to remove the first row (the header row) from a table:

Option Explicit

Public Sub AlterTable()
    Dim ie As Object, html As HTMLDocument, clipboard As Object, hTable As HTMLTable

    Set ie = CreateObject("InternetExplorer.Application")
    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")

    With ie
        .Visible = True
        .Navigate2 "https://www.w3schools.com/html/html_tables.asp"

        While .Busy Or .readyState <> 4: DoEvents: Wend

        With .document
            .parentWindow.execScript "var child = document.querySelector('#customers tr:nth-of-type(1)'); child.parentNode.removeChild(child);"
            'in your case .parentWindow.execScript "var child = document.querySelector('.reports-area table tr:nth-of-type(1)'); child.parentNode.removeChild(child);"
            Set hTable = .querySelector("#customers")  ' in your case use:  .querySelector(".reports-area table")
        End With

        clipboard.SetText hTable.outerHTML
        clipboard.PutInClipboard
        ActiveSheet.Range("A1").PasteSpecial
        .Quit
    End With
End Sub

Remove offending item with HTMLDocument methods:

The same thing without JavaScript, i.e. using the methods of HTMLDocument directly:

Option Explicit

Public Sub AlterTableWithDomMethods()   ' renamed so it can sit in the same module as AlterTable above
    Dim ie As Object, html As HTMLDocument, clipboard As Object, hTable As HTMLTable

    Set ie = CreateObject("InternetExplorer.Application")
    Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")

    With ie
        .Visible = True
        .Navigate2 "https://www.w3schools.com/html/html_tables.asp"

        While .Busy Or .readyState <> 4: DoEvents: Wend

        With .document
            'in your case use:
            '.querySelector(".reports-area table tr:nth-of-type(1)").ParentNode.RemoveChild .querySelector(".reports-area table tr:nth-of-type(1)")
            .querySelector("#customers tr:nth-of-type(1)").ParentNode.RemoveChild .querySelector("#customers tr:nth-of-type(1)")
            Set hTable = .querySelector("#customers")  ' in your case use:  .querySelector(".reports-area table")
        End With

        clipboard.SetText hTable.outerHTML
        clipboard.PutInClipboard
        ActiveSheet.Range("A1").PasteSpecial
        .Quit
    End With
End Sub
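For comparison, here is a rough sketch that stays closer to your original getElementsByClassName attempt and skips the heading row by index instead of removing it from the page. Note that getElementsByClassName returns a collection rather than a single element, so one item has to be picked from it. The procedure name WriteReportRows is hypothetical, and the sketch assumes ieDoc is the loaded document, that the page runs in a document mode that supports getElementsByClassName, and that the heading row is the first row of the inner table, as in your HTML.

```vba
Option Explicit

' Hypothetical helper (not part of the original answer).
' Finds the first table inside the first element with class "reports-area"
' and writes every row except the first (heading) row to the active sheet.
Public Sub WriteReportRows(ByVal ieDoc As Object)
    Dim areas As Object, hTable As Object
    Dim r As Long, c As Long

    ' A collection comes back here, so check it and take one item from it
    Set areas = ieDoc.getElementsByClassName("reports-area")
    If areas.Length = 0 Then Exit Sub

    Set hTable = areas.Item(0).getElementsByTagName("table").Item(0)
    If hTable Is Nothing Then Exit Sub

    ' Start at row index 1 to skip the heading row at index 0
    For r = 1 To hTable.rows.Length - 1
        For c = 0 To hTable.rows.Item(r).cells.Length - 1
            ActiveSheet.Cells(r, c + 1).Value = hTable.rows.Item(r).cells.Item(c).innerText
        Next c
    Next r
End Sub
```

The class name is written in lower case here to match the HTML you posted ("reports-area"), whereas your attempt used "Reports-Area".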