Home » excel » html – Scraping data from website with css selectors excel vba

html – Scraping data from website with css selectors excel vba

Posted by: admin April 23, 2020 Leave a comment

Questions:

I am trying to scrape specific data from website with CSS selectors. I succeeded with the help of QHar but the requirements now have changed. This is my code below:

Code

Public Sub CompanyData2()

Dim html As HTMLDocument, ws As Worksheet, re As Object

Set re = CreateObject("VBScript.RegExp")
re.Pattern = "\s{2,}"
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set html = New HTMLDocument

With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", "https://www.bizi.si/iskanje?q=", False
    .send
    html.body.innerHTML = .responseText
End With

ws.Range("A4").Value = re.Replace(Join$(Array(html.querySelector("td.item a").innerText), ", "), Chr$(32))
ws.Range("A5").Value = re.Replace(Join$(Array(html.querySelector("td.item + td.item").innerText), ", "), Chr$(32))
ws.Range("B6").Value = re.Replace(Join$(Array(html.querySelector("td.item + td.item + td.item + td.item").innerText), ", "), Chr$(32))

End Sub

The result is as follows:

enter image description here

Website

enter image description here

I want to extract name of company on sheet 1 A3 like that:

enter image description here

Thank you.

How to&Answers:

You need REPROMAT in A1 then after issuing initial query you have to visit the actual company page to get the company name as you show it. If you are using the company url direct then you can skip the first request and use the code from the second request onwards.

Public Sub CompanyData()
    Dim html As HTMLDocument, ws As Worksheet, nodes As Object

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.bizi.si/iskanje?q=" & Application.EncodeURL(ws.Range("A1").Value), False
        .send
        html.body.innerHTML = .responseText

        Set nodes = html.querySelectorAll("td.item")

        With ws
            .Range("A4").Value = nodes.Item(0).FirstChild.innerText
            .Range("A5").Value = nodes.Item(1).innerText
            .Range("A6").Value = "DŠ: " & nodes.Item(3).innerText
        End With

        .Open "GET", html.querySelector("[id$=linkCompany]").href, False
        .send
        html.body.innerHTML = .responseText
        ws.Range("A3") = html.querySelector("#ctl00_ctl00_cphMain_cphMainCol_CompanySPLPreview1_labTitlePRS").innerText
    End With
End Sub