Home » excel » html – Webscraping in vba – Structure working data & write from left to right cell

html – Webscraping in vba – Structure working data & write from left to right cell

Posted by: admin May 14, 2020 Leave a comment

Questions:

just registered an account here, and yes i’m a real noob – please be nice to me. Now to my challenge: I am building a web scraper in VBA & have found a code I changed a little to my needs. Everything works perfectly and is actually quite smooth. Now I would like my text that loads into my exel document not to be long, but instead to be wide. I suspect it has to do with “.Offset (I, j)”. I’ve played with it a little bit, but I just manage to ruin everything. Here is my code im usin:

Dim IE As InternetExplorer
Dim htmldoc As MSHTML.IHTMLDocument 'Document object
Dim eleColtr As MSHTML.IHTMLElementCollection 'Element collection for tr tags
Dim eleColtd As MSHTML.IHTMLElementCollection 'Element collection for td tags
Dim eleRow As MSHTML.IHTMLElement 'Row elements
Dim eleCol As MSHTML.IHTMLElement 'Column elements
Dim ieURL As String 'URL

'Open InternetExplorer
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
'Navigate to webpage
ieURL = "#"
IE.Navigate ieURL
'Wait
Do While IE.Busy Or IE.ReadyState <> 4
 DoEvents
Loop
Set htmldoc = IE.Document 'Document webpage
Set eleColtr = htmldoc.getElementsByTagName("tr") 'Find all tr tags
'This section populates Excel
I = 0 'start with first value in tr collection
For Each eleRow In eleColtr 'for each element in the tr collection
 Set eleColtd = htmldoc.getElementsByTagName("tr")(I).getElementsByTagName("td") 'get all the td elements in that specific tr
 j = 0 'start with the first value in the td collection
 For Each eleCol In eleColtd 'for each element in the td collection
 Sheets("Sheet1").Range("A1").Offset(I, j).Value = eleCol.innerText 'paste the inner text of the td element, and offset at the same time
 j = j + 1 'move to next element in td collection
 Next eleCol 'rinse and repeat
 I = I + 1 'move to next element in td collection
Next eleRow 'rinse and repeat

End Sub ```
How to&Answers:

You don’t need a browser. You can use faster xhr. Grab the table and loop the rows then the columns populating a pre-sized array (be sure to remove the rows where headers are. They can be identified as having [colspan='2'] in their first td). Then transpose the array and write to sheet.

Option Explicit

Public Sub TransposeTable()
    Dim xhr As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument, table As MSHTML.htmltable
    'required VBE (Alt+F11) > Tools > References > Microsoft HTML Object Library ;  Microsoft XML, v6 (your version may vary)

    Set xhr = New MSXML2.XMLHTTP60
    Set html = New MSHTML.HTMLDocument
    '  7NXBG2 ;  8QT2E3

    With xhr
        .Open "GET", "https://www.chrono24.com/watch/8QT2E3", False
        .send
        html.body.innerHTML = .responseText
    End With

    Set table = html.querySelector(".specifications table")

    Dim results(), rowCountToExclude As Long

    rowCountToExclude = html.querySelectorAll(".specifications table [colspan='2']").Length
    ReDim results(1 To table.rows.Length - rowCountToExclude, 1 To table.getElementsByTagName("tr")(0).Children(0).getAttribute("colspan"))

    Dim r As Long, c As Long, outputRow As Long, outputColumn As Long, html2 As MSHTML.HTMLDocument

    Set html2 = New MSHTML.HTMLDocument

    For r = 0 To table.getElementsByTagName("tr").Length - 1
        Dim row As Object

        Set row = table.getElementsByTagName("tr")(r)
        html2.body.innerHTML = "<body> <table>" & row.outerHTML & "</table></body> "

        If html2.querySelectorAll("[colspan='2']").Length = 0 Then
            outputRow = outputRow + 1: outputColumn = 1
            For c = 0 To row.getElementsByTagName("td").Length - 1
                results(outputRow, outputColumn) = row.getElementsByTagName("td")(c).innerText
                outputColumn = outputColumn + 1
            Next
        End If
        Set row = Nothing
    Next

    results = Application.Transpose(results)
    ActiveSheet.Cells(1, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub