Home » excel » excel – Pulling specific table cells from Morningstar, then looping to next Morningstar page

excel – Pulling specific table cells from Morningstar, then looping to next Morningstar page

Posted by: admin May 14, 2020 Leave a comment

Questions:

I am currently trying to scrape certain pieces of data from a table on Morningstar, then have it loop to the next ticker and repeat until there are no more tickers.

Currently, it will pull the entire “rank in category” row on the Trailing Total Returns table. I’m simply trying to pull the 3 month, 6 month, YTD, 1 year, 3 year, and 5 year. When it’s done pulling those, it will loop to the next ticker as determined by the “Cells(p, 14)” in the navigate line.

ie. It detects “LINKX” is in cell 1, 14 so it navigates to http://performance.morningstar.com/fund/performance-return.action?t=LINKX&region=usa&culture=en_US and pulls all of the “Rank in Category” lines from “trailing total returns” table. I only want the specified ones put into specified cell locations, then loop to the next ticker.

I’ve browsed through many of these threads, using excel VBA I am trying to pull key specific info from a certain tickers page, then loop to next ticker and repeat.

Declare PtrSafe Function apiShowWindow Lib "user32" Alias "ShowWindow" _
        (ByVal hwnd As LongPtr, ByVal nCmdShow As LongPtr) As LongPtr
    Global Const SW_MAXIMIZE = 3
    Global Const SW_SHOWNORMAL = 1
    Global Const SW_SHOWMINIMIZED = 2

Sub LinkedInWebScrapeScript()

    Dim objIE As InternetExplorer

    Dim html As HTMLDocument

    Set objIE = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
    objIE.Visible = 1
Dim p As Integer
p = 3

    objIE.navigate ("http://performance.morningstar.com/fund/performance-return.action?t=" & Cells(p, 14) & "&region=usa&culture=en_US")
    Application.Wait Now + #12:00:02 AM#

    While objIE.Busy
        DoEvents
    Wend
    apiShowWindow objIE.hwnd, SW_MAXIMIZE

    For i = 1 To 2
        objIE.document.parentWindow.scrollBy 0, 100000 & i
        Application.Wait Now + #12:00:01 AM#
    Next i

Dim TDelements As IHTMLElementCollection
Dim htmldoc As MSHTML.IHTMLDocument 'Document object
Dim eleColtr As MSHTML.IHTMLElementCollection 'Element collection for tr tags
Dim eleColtd As MSHTML.IHTMLElementCollection 'Element collection for td tags
Dim eleColtd1 As MSHTML.IHTMLElementCollection
Dim eleRow As MSHTML.IHTMLElement 'Row elements
Dim eleCol As MSHTML.IHTMLElement 'Column elements
Set htmldoc = objIE.document 'Document webpage
Set eleColtr = htmldoc.getElementsByTagName("tr") 'Find all tr tags
Set TDelements = htmldoc.getElementsByTagName("table")
'This section populates Excel
i = 0 'start with first value in tr collection


Set eleColtd = htmldoc.getElementsByClassName("r_table3 width955px print97")(0).getElementsByClassName("last")(0).getElementsByClassName("row_data divide") 'get all the td elements in that specific tr

    For Each eleCol In eleColtd 'for each element in the td collection
        Sheets("Sheet2").Range("A1").Offset(i, j).Value = eleCol.innerText 'paste the inner text of the td element, and offset at the same time
        j = j + 1 'move to next element in td collection
    Next eleCol 'rinse and repeat
i = i + 1

p = p + 1

objIE.navigate ("http://performance.morningstar.com/fund/performance-return.action?t=" & Cells(p, 14) & "&region=usa&culture=en_US")

Set eleColtd = htmldoc.getElementsByClassName("r_table3 width955px print97")(0).getElementsByClassName("last")(0).getElementsByClassName("row_data divide") 'get all the td elements in that specific tr

    For Each eleCol In eleColtd 'for each element in the td collection
        Sheets("Sheet2").Range("A1").Offset(i, j).Value = eleCol.innerText 'paste the inner text of the td element, and offset at the same time
        z = z + 1
        j = j + 1 'move to next element in td collection
    Next eleCol 'rinse and repeat


End Sub

It will pull the entire “rank in category” row on the Trailing Total Returns table. I’m simply trying to pull the 3 month, 6 month, YTD, 1 year, 3 year, and 5 year. When it’s done pulling those, it will loop to the next ticker as determined by the “Cells(p, 14)” in the navigate line.

How to&Answers:

The following shows a loop and how to select the appropriate table, tbody then table cells using css selectors. Tickers are read into an array from column N starting at row 1. It assumes there are not blank cells within the range (though you could add a test to be sure).

There is a loop over the array, which contains each ticker, and the TICKER placeholder in the url is replaced with the current ticker value.

There is a line to click on the monthly display tab.

The appropriate row is identified via

Set rankings = .querySelectorAll("#tab-month-end-content .last td")

#tab-month-end-content is an id selector which gets the right tab, then .last is the class selector for the class name of the last tbody (which is last), then td is used to specify the child td cells within that tbody.


CSS selectors:

Modern browsers are optimized for css. Css selectors are a fast way to match on elements in an html document. Css selectors are applied via querySelector or querySelectorAll methods; in this case, of HTMLDocument (ie.document). querySelector returns a single node: the first match for the css selector; querySelectorAll returns a nodeList of all matched items – you then index into that nodeList to get specific items e.g. the second td cell is at index 1.

Looking at the pattern we specified:

#tab-month-end-content .last td

The first part is an id selector, #, which selects an element by id

#tab-month-end-content

When applied to the page this returns two matches and we want the second

Click on image to enlarge

enter image description here

The next part

.last 

is a class selector, ., for class name last. This selects the tbody tag child element shown in the image above. As only the second id matched element has this child we are now working with the right parent element to go on and select the td type elements using type selector

td

The whitespace,, in between each part described above are known as descendant combinators, and they specify that elements matched by the second selector are selected if they have an ancestor element matching the first selector i.e. that the selector to the left is a parent of the selector matched elements retrieved by the adjacent css selector to the right.

We can see this with the next image:

Click on image to enlarge

enter image description here


VBA:

Option Explicit
Public Sub GetData()
    Dim ie As Object, tickers(), ws As Worksheet, lastRow As Long
    Dim results(), headers(), r As Long, i As Long, url As String

    headers = Array("ticker", "3m", "6m", "ytd", "1y", "3y", "6y")
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    tickers = Application.Transpose(ws.Range("N1:N" & GetLastRow(ws, 14)).Value)
    ReDim results(1 To UBound(tickers), 1 To UBound(headers) + 1)
    Set ie = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
    With ie
        .Visible = True
        For i = LBound(tickers) To UBound(tickers)
            r = r + 1
            url = Replace$("http://performance.morningstar.com/fund/performance-return.action?t=TICKER&region=usa&culture=en_US", "TICKER", tickers(i))
            .Navigate2 url

            While .Busy Or .readyState < 4: DoEvents: Wend

            .document.querySelector("[tabname='#tabmonth']").Click

            Dim rankings As Object
            Do
            Loop While .document.querySelectorAll("#tab-month-end-content .last td").Length = 0 'could add timed loop here

            With .document
                Set rankings = .querySelectorAll("#tab-month-end-content .last td")
                On Error Resume Next
                results(r, 1) = tickers(i)
                results(r, 2) = rankings.item(1).innerText
                results(r, 3) = rankings.item(2).innerText
                results(r, 4) = rankings.item(3).innerText
                results(r, 5) = rankings.item(4).innerText
                results(r, 6) = rankings.item(5).innerText
                results(r, 7) = rankings.item(6).innerText
                On Error GoTo 0
            End With
            Set rankings = Nothing
        Next
        ws.Cells(1, 15).Resize(UBound(results, 1), UBound(results, 2)) = results
        .Quit
    End With
End Sub

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.rows.Count, columnNumber).End(xlUp).Row
    End With
End Function

As mentioned by @SIM, you could use xmlhttp and avoid browser though not sure with your security settings whether need to whitelist sites. You will need to explore if the placeholder is valid in the url here: XNAS:TICKER. The XNAS prefix may vary across your tickers, in which case you would need the appropriate string including prefix in column N and then replace the extended placeholder with that e.g. …..=PLACEHOLDER&region…….

Option Explicit
Public Sub GetData()
    Dim tickers(), ws As Worksheet, lastRow As Long
    Dim results(), headers(), r As Long, i As Long, url As String, html As HTMLDocument
    Set html = New HTMLDocument 'vbe > tools > references > Microsoft HTML Object Library

    headers = Array("ticker", "3m", "6m", "ytd", "1y", "3y", "6y")
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    tickers = Application.Transpose(ws.Range("N1:N" & GetLastRow(ws, 14)).Value)
    ReDim results(1 To UBound(tickers), 1 To UBound(headers) + 1)

    With CreateObject("MSXML2.XMLHTTP")

        For i = LBound(tickers) To UBound(tickers)
            r = r + 1
            url = Replace$("http://performance.morningstar.com/perform/Performance/fund/trailing-total-returns.action?&t=XNAS:TICKER&region=usa&culture=en-US&cur=&ops=clear&s=0P0000J533&ndec=2&ep=true&align=m&annlz=true&comparisonRemove=false&loccat=&taxadj=&benchmarkSecId=&benchmarktype=", "TICKER", tickers(i))
           .Open "GET", url, False
           .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
           .setRequestHeader "DNT", "1"
           .send
           html.body.innerHTML = .responseText

            Dim rankings As Object
            With html
                Set rankings = .querySelectorAll(".last td")

                On Error Resume Next
                results(r, 1) = tickers(i)
                results(r, 2) = rankings.item(1).innerText
                results(r, 3) = rankings.item(2).innerText
                results(r, 4) = rankings.item(3).innerText
                results(r, 5) = rankings.item(4).innerText
                results(r, 6) = rankings.item(5).innerText
                results(r, 7) = rankings.item(6).innerText
                On Error GoTo 0
            End With
            Set rankings = Nothing
        Next
        ws.Cells(1, 15).Resize(UBound(results, 1), UBound(results, 2)) = results
    End With
End Sub

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.rows.Count, columnNumber).End(xlUp).Row
    End With
End Function