I’m currently trying to scrape info from this Reddit page. My goal is to have Excel open all the posts in new tabs and then scrape information from each of those pages, since the starting page doesn’t have as much information.
I’ve been trying to figure this out for the last few hours, but I’m admittedly pretty confused and unsure what to do next, so any pointers would be greatly appreciated!
Here is my current code. It works decently enough, but as I said, I’m not sure how to open the links it finds one by one and scrape each page for data.
Right now the links are scraped off that first page and added to my spreadsheet, but if possible I’d like to skip that step and scrape them all in one pass.
Sub GetData()
    Dim objIE As InternetExplorer
    Dim itemEle As Object
    Dim upvote As Integer, awards As Integer, animated As Integer
    Dim postdate As String, upvotepercent As String, oc As String, filetype As String
    Dim linkurl As String, myhtmldata As String, visiComments As String
    Dim totalComments As String, removedComments As String
    Dim y As Integer

    Set objIE = New InternetExplorer
    objIE.Visible = False
    objIE.navigate ActiveCell.Value
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

    y = 1
    For Each itemEle In objIE.document.getElementsByClassName("flat-list buttons")
        visiComments = itemEle.getElementsByTagName("a")(0).innerText
        linkurl = itemEle.getElementsByTagName("a")(0).href
        Sheets("Sheet1").Range("A" & y).Value = visiComments
        Sheets("Sheet1").Range("B" & y).Value = linkurl
        y = y + 1
    Next
End Sub
You should be able to gather the URLs, then visit each one in a loop, write the results from each visited page to an array, and finally write the array to the sheet in one go. Add this after your existing line
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
Dim nodeList As Object, i As Long, urls(), results()
Note: you are only potentially gaining on the page loads, since VBA is single-threaded. To take advantage of that you would need to store a reference to each tab, or open them all first and then loop through the relevant open windows to do the scrape. My preference, to be honest, would be to stay in the same tab.
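If you did want to try the multiple-window route, a rough sketch of the idea is below. This is an untested illustration, not part of the answer's solution: it assumes `urls()` has already been filled as in the code that follows, and that each page gets its own `InternetExplorer` instance (separate windows rather than tabs, since the InternetExplorer object doesn't expose its tabs directly) held in a Collection so the pages can load before you come back to scrape them.

```vba
'Hypothetical sketch: open one InternetExplorer window per url, keep a
'reference to each in a Collection, then scrape them in a second pass.
Dim browsers As New Collection, j As Long, ieWin As InternetExplorer

For j = LBound(urls) To UBound(urls)
    Set ieWin = New InternetExplorer
    ieWin.Visible = False
    ieWin.navigate urls(j)          'kick off the load, don't wait yet
    browsers.Add ieWin
Next

For j = 1 To browsers.Count
    Set ieWin = browsers(j)
    Do While ieWin.Busy Or ieWin.readyState <> 4: DoEvents: Loop
    'scrape ieWin.document here, then close the window
    ieWin.Quit
Next
```

Whether this is actually faster depends on how many windows your machine will tolerate; for more than a handful of links the single-window loop below is simpler and more reliable.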
Set nodeList = objIE.document.querySelectorAll(".comments")
ReDim urls(0 To nodeList.Length - 1)
ReDim results(1 To nodeList.Length, 1 To 3)

'Store all the urls in an array to loop over later
For i = 0 To nodeList.Length - 1
    urls(i) = nodeList.item(i).href
Next

For i = LBound(urls) To UBound(urls)
    objIE.Navigate2 urls(i)
    While objIE.Busy Or objIE.readyState <> 4: DoEvents: Wend 'may need a pause here
    results(i + 1, 1) = objIE.document.querySelector("a.title").innerText 'title
    results(i + 1, 2) = objIE.document.querySelector(".number").innerText 'upvotes
    results(i + 1, 3) = objIE.document.querySelector(".word").NextSibling.nodeValue '% upvoted
Next

ActiveSheet.Cells(1, 1).Resize(UBound(results, 1), UBound(results, 2)) = results