html – Scraping a specific <p> class from Yahoo Finance (VBA, Excel)

Posted by: admin, May 14, 2020

Question:

I have been trying to extract specific information from a certain nested <p> class in the page source at the following location:
https://finance.yahoo.com/quote/ASUR?p=ASUR

The class that holds the inner text is named “D(ib) Va(t)” and, as far as I have seen, this class name is used only for that text. I am using the following code to get the data.

Private Sub CommandButton1_Click()

Dim IE4 As Object
Dim strURL3 As String
Dim divs1 As Object
Dim symbol1 As String
Dim rowd As Integer
Dim divs2 As Object

'turn calculation off

Application.ScreenUpdating = False
Application.DisplayAlerts = False
Application.Calculation = xlCalculationManual

rowd = 1
'Sheets.Add After:=Sheets(ActiveWorkbook.Sheets.Count)
'Sheets(ActiveSheet.Name).Name = "Industry"

'Worksheets("Panel").Activate
'Range("B9").Select
Set IE4 = CreateObject("InternetExplorer.Application") 'Create only one IE instance


'Do Until ActiveCell.Value = "" 'Loop
'symbol1 = ActiveCell.Value

strURL3 = "https://finance.yahoo.com/quote/ASUR?p=ASUR"

IE4.Visible = True 'Flag to remove IE visibility
        VBA.Shell "RunDll32.exe InetCpl.Cpl, ClearMyTracksByProcess 264", vbMinimizedNoFocus
        VBA.Shell "RunDll32.exe InetCpl.Cpl, ClearMyTracksByProcess 258", vbMinimizedNoFocus
        IE4.Navigate strURL3
        While IE4.Busy: DoEvents: Wend 'Break of 4 seconds after loading
        Application.Wait (Now + TimeValue("0:00:04"))


Set divs1 = IE4.Document.getelementsbytagname("div")

  Worksheets("Industry").Activate
  ActiveSheet.Cells("1,2").Select
 For Each div In divs1
          Set divs2 = IE4.Document.getelementsbytagname("p")
           For Each p In divs2

           If p.classname = "D(ib) Va(t)" Then

            Debug.Print p.innertext

            'Cells(rowd, 2) = p.innertext
            'rowd = rowd + 1
            End If
          Next p

Next div

 'Sheets("Panel").Select
' ActiveCell.Offset(1, 0).Select
 'Loop
    'Sheets("Panel").Select
    'Range("B9").Select 'range that selects rows and columns to paste in every company sheet
    'Range(Selection, Selection.End(xlDown)).Select
    'Range(Selection, Selection.End(xlDown)).Select
    'Range(Selection, Selection.End(xlDown)).Select
   ' Selection.Copy
   ' Sheets("Industry").Select
    'Range("A1").Select
    'ActiveSheet.Paste
   ' Application.CutCopyMode = False

 IE4.Quit

'turn calculation on
Application.Calculation = xlCalculationAutomatic
Application.DisplayAlerts = True
Application.ScreenUpdating = True

End Sub

It does not capture the <p> class. Can anyone help me with this?

Answers:

It is not good practice to select on compound classes when there is a way to avoid it. The following implementation achieves the same result:

Sub FetchText()
    Const Link$ = "https://finance.yahoo.com/quote/ASUR?p=ASUR"
    ' Early binding: needs references to Microsoft XML, v6.0 and Microsoft HTML Object Library
    Dim Http As New XMLHTTP60, Htmldoc As New HTMLDocument

    With Http
        .Open "GET", Link, False
        .send
        Htmldoc.body.innerHTML = .responseText
    End With
    MsgBox Htmldoc.querySelector("p.businessSummary").PreviousSibling.LastChild.innerText
End Sub

Answer:

The <p> element with that class holds the company's sector, industry and employee information. You can retrieve the page faster with XMLHTTP, which avoids opening a browser, and then use a CSS selector combination to target the element:

Option Explicit
Public Sub GetInfo()
    Const URL As String = "https://finance.yahoo.com/quote/ASUR?p=ASUR"
    Dim html As HTMLDocument
    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .Send
        html.body.innerHTML = .responsetext
    End With
    Debug.Print html.querySelector("p.D\(ib\).Va\(t\)").innerText
End Sub
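
The backslashes in the selector above are needed because parentheses are not valid in a CSS class selector unless they are escaped. As a minimal sketch of that step, the helper below builds such a selector from a raw class attribute. The name ClassSelector is hypothetical and not part of any library, and it escapes only parentheses, which is all this class name needs.

```vba
' Hypothetical helper (not a library function): builds a CSS selector
' such as p.D\(ib\).Va\(t\) from a tag name and a raw class attribute.
' Assumption: parentheses are the only characters that need escaping.
Public Function ClassSelector(ByVal tagName As String, ByVal classAttr As String) As String
    Dim part As Variant
    Dim escaped As String
    Dim result As String

    result = tagName
    ' A compound class attribute is a space-separated list of class names.
    For Each part In Split(Trim$(classAttr), " ")
        If Len(part) > 0 Then
            escaped = Replace(CStr(part), "(", "\(")
            escaped = Replace(escaped, ")", "\)")
            result = result & "." & escaped
        End If
    Next part

    ClassSelector = result
End Function
```

For example, Debug.Print ClassSelector("p", "D(ib) Va(t)") prints p.D\(ib\).Va\(t\), which can be passed to html.querySelector as in GetInfo.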

If you want to avoid compound classes you can use the following:

Debug.Print html.querySelector("p + .D\(ib\)").innerText
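
The question's commented-out lines suggest the aim was to loop over several symbols and write each result to a sheet. A rough sketch of that loop using the XMLHTTP approach could look like the following. The sheet names "Panel" and "Industry", the start cell B9 and the output column are assumptions taken from those commented-out lines, the Sub name is made up for illustration, and the selector only works while Yahoo keeps this markup.

```vba
' Sketch only: sheet names, start cell and output column are assumptions
' taken from the commented-out lines in the question.
Public Sub GetInfoForSymbols()
    Dim html As HTMLDocument    ' needs the Microsoft HTML Object Library reference
    Dim node As Object
    Dim src As Range
    Dim rowd As Long

    Set html = New HTMLDocument
    Set src = Worksheets("Panel").Range("B9")
    rowd = 1

    ' Walk down column B until the first empty cell.
    Do Until src.Value = ""
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", "https://finance.yahoo.com/quote/" & src.Value & "?p=" & src.Value, False
            .send
            html.body.innerHTML = .responseText
        End With

        ' querySelector returns Nothing when no element matches.
        Set node = html.querySelector("p.D\(ib\).Va\(t\)")
        If node Is Nothing Then
            Worksheets("Industry").Cells(rowd, 2).Value = "not found"
        Else
            Worksheets("Industry").Cells(rowd, 2).Value = node.innerText
        End If

        rowd = rowd + 1
        Set src = src.Offset(1, 0)
    Loop
End Sub
```

The check for Nothing keeps the loop from stopping with a runtime error when a page does not contain the element.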

References (VBE > Tools > References):

  1. Microsoft HTML Object Library