Home » excel » excel vba – VBA: How to get a hidden href within <td> tag

excel vba – VBA: How to get a hidden href within <td> tag

Posted by: admin May 14, 2020 Leave a comment

Questions:

I’m trying to get the scores from the soccer world cups since 1930, in this link are the match ups listed and by clicking in the team a new window pop up with the summary I’m looking for. My problem is that in the HTML source code I don’t see a <a> tag with the href inside of it as usually is, i’m trying to simulate a click event with the following VBA code:

Link: http://www.flashscore.com/soccer/world/world-cup-1990/results/

Vba code:

Sub Test()
  Dim URL As String
  Dim IE As InternetExplorer
  Dim HTMLdoc As HTMLDocument
  Dim TDelements As IHTMLElementCollection
  Dim TDelement As HTMLTableCell
  Dim i As Integer

URL = "http://www.flashscore.com/soccer/world/world-cup-1990/results/"

Set IE = New InternetExplorer

With IE
    .Navigate URL
    .Visible = True

    While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend

        Set HTMLdoc = .Document
    End With

        Set TDelements = HTMLdoc.getElementsByTagName("td")

        For Each TDelement In TDelements

            If TDelement.Title = "Click for match detail!" Then
                TDelement.Click
            End If
        Next

   IE.Quit
End Sub
How to&Answers:

I’d do this one differently, as what you’re trying to achieve requires a massive overhead and might take you on a wild goose chase of links, actions, loops, and whatnot. Instead of emulating the actions, it’s much better emulating the results of the actions.

Take for example the Germany – Argentina match. When you click on it, it opens this pop-up:

enter image description here

Of particular note here is the URL of the pop-up. Intuitively enough, the URL has some weird combination that might possibly be a unique identifier. The first thing we check is the tag for that row in the table we just clicked.

enter image description here

Check the highlighted part: it resembles the address in our pop-up closely, albeit there are four (4) extra characters at the beginning. Collapsing the row and cross-checking against other rows, we come to the conclusion that the rows use a similar format:

enter image description here

Removing the g_1_ from the id attribute of the tr tag gives us the unique identifier for every match. I’ll be taking the sanitized id from the Italy – England match and I will input it into the generic URL. We’ll do this in another tab as the pop-up’s address bar is locked:

enter image description here

Voila. The page is open in another tab, and from there you can get what you want from the match summary. You can even check the head-to-head by changing #match-summary to #h2h;overall.

enter image description here

Why does this matter?

Because it’s safer. Since you’re using Excel, you have the sheets at your control. The safest way is to record all these id attributes from each row into, say, a column. Then loop over that column, concatenating it properly to produce the target URLs.

Then you loop over the URLs and get what you want from the EXACT pages you’re looking for. There’s no need anymore for the original link/table at this point since almost all the details are in the “summary” pages anyway.

Hope this helps and good luck.

Follow-up Edit

As I feel a bit guilty not posting any code, here’s my attempt at producing the id values for the World Cup section. Note that I took some liberties with the code, so modify it to suit your approach better. I will only be providing the way to get the id values. All else is up to you.

Sub Test()

    Dim URL As String
    Dim IE As New InternetExplorer
    Dim HTMLdoc As HTMLDocument
    Dim dictObj As Object: Set dictObj = CreateObject("Scripting.Dictionary")
    Dim tRowID As String

    URL = "http://www.flashscore.com/soccer/world/world-cup-1990/results/"

    With IE
        .Navigate URL
        .Visible = True
        Do Until .ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
        Set HTMLdoc = .Document
    End With

    With HTMLdoc
        'Target the set of tables.
        Set tblSet = .getElementById("fs-results")
        'Index starts at 0. World Cup Playoffs is (0), World Cup is (1),
        'Qualifiers is (2), etc... Targeting tbody right away, as there is
        'only one (1) tbody tag per row anyway.
        Set mTbl = tblSet.getElementsByTagName("tbody")(1)
        Set tRows = mTbl.getElementsByTagName("tr")
        With dictObj
            'If if value is not yet in dictionary, store it.
            For Each tRow In tRows
                'Remove the first four (4) characters.
                tRowID = Mid(tRow.getAttribute("id"), 5)
                If Not .Exists(tRowID) Then
                    .Add tRowID, Empty
                End If
            Next tRow
        End With
    End With

    'Print out each of the ID values.
    For Each Key In dictObj
        Debug.Print Key
    Next Key

    Set IE = Nothing

End Sub

Which returns the following id values for the World Cup table (not the Playoffs!):

04LfR6G7
6RNojjg8
nww8zBgL
t0xCZU8R
dKY0Pgu3
jsY4ODf9
K4YcDdti
2qM7lD3j
bXvdGk2k
tdU0F9He
65hR8YfM
lUsW7hAS
jBqaxXO8
YZY3yivF
n1HjSn11
hvInTSoe
rHUdQZPd
xfS51jmd
ncyhHVnq
4b34kglp
O0kz7LJH
lpQE9EuG
UHlQLBvS
fNQAAfQ9
CKpewD92
EmJrU8Wl
KnT4ETW1
nRnnVlHr
b3T12WXk
jwUc3CIq
n9RDMiPL
pAjW714B
GznS8sk5
GWR6BzB3
UXW8NX9F
beX1CGec

Hope this helps you some more. Kindly make sure you read the comments. Good luck!