Home » excel » html – Extract table from webpage using VBA

html – Extract table from webpage using VBA

Posted by: admin May 14, 2020 Leave a comment


I would like to extract the table from html code into Excel using VBA.

I have tried the following code several times with changing some of the code but keep on getting error.

Sub GrabTable()

    'dimension (set aside memory for) our variables
    Dim objIE As InternetExplorer
    Dim ele As Object
    Dim y As Integer

    'start a new browser instance
    Set objIE = New InternetExplorer
    'make browser visible
    objIE.Visible = False

    'navigate to page with needed data
    objIE.navigate "http://www.bursamalaysia.com/market/listed-companies/company-announcements/5923061"
    'wait for page to load
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

    'we will output data to excel, starting on row 1
    y = 1

    'look at all the 'tr' elements in the 'table' with id 'InputTable2',
    'and evaluate each, one at a time, using 'ele' variable
    For Each ele In objIE.document.getElementByClassName("InputTable2").getElementsByTagName("tr")
        'show the text content of 'td' element being looked at
        Debug.Print ele.textContent
        'each 'tr' (table row) element contains 2 children ('td') elements
        'put text of 1st 'td' in col A
        Sheets("Sheet1").Range("A" & y).Value = ele.Children(0).textContent
        'put text of 2nd 'td' in col B
        Sheets("Sheet1").Range("B" & y).Value = ele.Children(1).textContent

        y = y + 1
    'repeat until last ele has been evaluated

End Sub
How to&Answers:

I show you two methods:

  1. Using IE: The data is inside an iframe which needs to be negotiated

  2. Using XMLHTTP request – much faster and without browser opening. It uses the first part of the iframe document URL which is what the iframe is navigating to.

In both cases I access the tables containing the company name and then the disclosure info table. For the disclosure main info table I copy the outerHTML to the clipboard and paste to Excel to avoid looping all the rows and columns. You can simply set loop the tr (table rows) and td (table cells) within instead.


Option Explicit
Public Sub GetInfo()
    Dim IE As New InternetExplorer, clipboard As Object
    With IE
        .Visible = True
        .navigate "http://www.bursamalaysia.com/market/listed-companies/company-announcements/5923061"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
        With .document.getElementById("bm_ann_detail_iframe").contentDocument
            ThisWorkbook.Worksheets("Sheet1").Cells(1, 1) = .getElementsByClassName("company_name")(0).innerText
            clipboard.SetText .getElementsByTagName("table")(1).outerHTML
        End With

        ThisWorkbook.Worksheets("Sheet1").Cells(2, 1).PasteSpecial
    End With
End Sub


You can extract a different URL from the front-end of the iframe URL and use that as shown below.

Here is the section of your original HTML that shows the iframe and the associated new URL info:

enter image description here

Option Explicit
Public Sub GetTable()
    Dim sResponse As String, html As HTMLDocument, clipboard As Object
    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://disclosure.bursamalaysia.com/FileAccess/viewHtml?e=2891609", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        sResponse = StrConv(.responseBody, vbUnicode)
    End With

    html.body.innerHTML = sResponse

    With html
        ThisWorkbook.Worksheets("Sheet1").Cells(1, 1) = .querySelector(".company_name").innerText
        Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
        clipboard.SetText .querySelector(".InputTable2").outerHTML
    End With

    ThisWorkbook.Worksheets("Sheet1").Cells(2, 1).PasteSpecial

End Sub


Try it this way.

Sub Web_Table_Option_Two()
    Dim HTMLDoc As New HTMLDocument
    Dim objTable As Object
    Dim lRow As Long
    Dim lngTable As Long
    Dim lngRow As Long
    Dim lngCol As Long
    Dim ActRw As Long
    Dim objIE As InternetExplorer
    Set objIE = New InternetExplorer
    objIE.Navigate "http://www.bursamalaysia.com/market/listed-companies/company-announcements/5923061"

    Do Until objIE.ReadyState = 4 And Not objIE.Busy
    Application.Wait (Now + TimeValue("0:00:03")) 'wait for java script to load
    HTMLDoc.body.innerHTML = objIE.Document.body.innerHTML
    With HTMLDoc.body
        Set objTable = .getElementsByTagName("table")
        For lngTable = 0 To objTable.Length - 1
            For lngRow = 0 To objTable(lngTable).Rows.Length - 1
                For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
                    ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
                Next lngCol
            Next lngRow
            ActRw = ActRw + objTable(lngTable).Rows.Length + 1
        Next lngTable
    End With
End Sub