
html – How can I scrape multiple pages/links at once using VBA?

Posted by: admin May 14, 2020

Questions:

I’m currently trying to scrape info from this Reddit page. My goal is to make Excel open all the posts in new tabs and then scrape information from each of those pages, since the starting page doesn’t have as much information.

I’ve been trying to figure this out for the last few hours, but I’m admittedly pretty confused and unsure what to do next, so any pointers would be greatly appreciated!

Here is my current code. It works well enough, but as I said, I’m not sure how to go on and open the links it finds one by one and scrape each page for data.
Right now the links are scraped from that first page and added to my spreadsheet, but if possible I’d like to skip that step and scrape them all at once.

Thanks! 🙂

Sub GetData()

' Early-bound IE object: needs a reference to Microsoft Internet Controls (Tools > References)
Dim objIE As InternetExplorer
Dim itemEle As Object
Dim upvote As Integer, awards As Integer, animated As Integer
Dim postdate As String, upvotepercent As String, oc As String, filetype As String, linkurl As String, myhtmldata As String, visiComments As String, totalComments As String, removedComments As String
Dim y As Integer

Set objIE = New InternetExplorer
objIE.Visible = False

' Load the listing page whose URL is in the active cell, then wait until it has finished
objIE.Navigate ActiveCell.Value
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

y = 1

' Each post's "flat-list buttons" list starts with its comments link (text = comment count)
For Each itemEle In objIE.document.getElementsByClassName("flat-list buttons")
visiComments = itemEle.getElementsByTagName("a")(0).innerText
linkurl = itemEle.getElementsByTagName("a")(0).href
Sheets("Sheet1").Range("A" & y).Value = visiComments
Sheets("Sheet1").Range("B" & y).Value = linkurl
y = y + 1
Next

objIE.Quit 'close the hidden IE instance so it doesn't keep running in the background

End Sub
Answers:

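You should be able to gather the URLs first, then visit each one in a loop, writing the results from each visited page to an array, and finally write the array to the sheet in one go. Since you'd rather skip writing the links to the sheet, this replaces your For Each loop (which would otherwise run against the last post visited rather than the listing page). Add it after your existing line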

Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

Add:

Dim nodeList As Object, i As Long, urls(), results()

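Note: opening the posts in new tabs would only potentially gain you time on the page loads, as VBA is single threaded and the scraping itself still happens one page at a time. To do it you would need to store a reference to each tab, or open them all first and then loop through the relevant open windows to do the scrape. To be honest, my preference would be to keep everything in the same tab, as in the code below.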

Set nodeList = objIE.document.querySelectorAll(".comments")
ReDim urls(0 To nodeList.Length - 1)
ReDim results(1 To nodeList.Length, 1 To 3)

'Store all urls in an array first; the nodes belong to the listing page, which is gone once objIE navigates away
For i = 0 To nodeList.Length - 1
    urls(i) = nodeList.item(i).href
Next

'Visit each post in the same tab and collect its data in results
For i = LBound(urls) To UBound(urls)
    objIE.Navigate2 urls(i)
    Do While objIE.Busy Or objIE.readyState <> 4: DoEvents: Loop
    'may need a pause here if content loads late, e.g. Application.Wait Now + TimeSerial(0, 0, 1)
    results(i + 1, 1) = objIE.document.querySelector("a.title").innerText 'title
    results(i + 1, 2) = objIE.document.querySelector(".number").innerText 'upvotes
    results(i + 1, 3) = objIE.document.querySelector(".word").NextSibling.nodeValue '% upvoted
Next

'Write the whole array to the sheet in a single operation
ActiveSheet.Cells(1, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
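The note above mentions the alternative of opening every post in its own tab first and then looping through the open windows. Below is a rough, untested sketch of that idea. It assumes the urls array has been filled as in the code above and relies on two documented pieces: the navOpenInNewTab flag (2048) of Navigate2, and the Windows collection of Shell.Application, which lists each IE tab as a separate object. The ScrapeInTabs name, the five-second pause, the "/comments/" filter and the sheet layout are illustrative assumptions, and new tabs may behave differently while objIE is hidden.

```vba
' Rough sketch (untested): open each post in a new tab, then scrape every tab.
' Assumes urls() has already been filled as in the answer's code.
Private Const NAV_OPEN_IN_NEW_TAB As Long = 2048 ' navOpenInNewTab flag for Navigate2

Public Sub ScrapeInTabs(ByVal objIE As Object, ByVal urls As Variant)
    Dim i As Long, r As Long
    Dim shellApp As Object, win As Object
    Dim postTabs As New Collection

    ' Start every page load without waiting, so the downloads overlap
    For i = LBound(urls) To UBound(urls)
        objIE.Navigate2 urls(i), NAV_OPEN_IN_NEW_TAB
    Next

    ' Fixed pause (assumed long enough) so the new tabs register and start loading
    Application.Wait Now + TimeSerial(0, 0, 5)

    ' Collect the tabs first; each IE tab appears as its own item in Shell.Application.Windows
    Set shellApp = CreateObject("Shell.Application")
    For Each win In shellApp.Windows
        ' Hypothetical filter: keep only windows showing a Reddit post page
        If InStr(1, win.LocationURL, "/comments/", vbTextCompare) > 0 Then postTabs.Add win
    Next

    ' Scrape one tab at a time (VBA is single threaded)
    r = 1
    For Each win In postTabs
        Do While win.Busy Or win.readyState <> 4: DoEvents: Loop
        ' Tab order is not guaranteed to match urls(), so record the URL next to the data
        ActiveSheet.Cells(r, 1).Value = win.LocationURL
        ' Same selectors as the single-tab loop; they may need adjusting
        ActiveSheet.Cells(r, 2).Value = win.document.querySelector("a.title").innerText
        ActiveSheet.Cells(r, 3).Value = win.document.querySelector(".number").innerText
        r = r + 1
    Next
End Sub
```

You would call it as ScrapeInTabs objIE, urls in place of the second For loop. Each tab is still scraped one after another, so, as the note says, the only gain is that the page downloads overlap; staying in a single tab is simpler and easier to debug.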