I want to extract the links of the toys listed in this webpage:
https://cebra.com.ar/category/73/Juego-de-Construccion.html
I have a complete procedure (I am not copying it here because it is long and complicated), and one part of it contains the following line, which doesn't work:
Cells(erow, 1) = html.getElementsByTagName("a").href
Any idea how to solve this?
Thanks a lot!
getElementsByTagName returns a collection, so you would indeed need to index into it to get a particular element.
However, you don't want all a tags. That is inefficient, and you would need an additional test to limit the results to those of interest. You specifically want the product links, so use an attribute = value CSS selector to get those:
Dim links As Object, i As Long
Set links = html.querySelectorAll("[href^=product]")
For i = 0 To links.Length - 1
    ActiveSheet.Cells(erow + i, 1) = links.item(i).href
Next
This: [href^=product] looks for href attributes whose value starts with (^) product.
If you look at the page HTML, you can see that each of your target links begins with that substring.
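For context, here is a minimal sketch of how the selector might fit into a complete procedure. It is not the asker's actual code: the sub name, the use of MSXML2.XMLHTTP60 to fetch the page, and the starting row of 1 are all assumptions, and it assumes references to Microsoft HTML Object Library and Microsoft XML, v6.0 are set.

```vba
Option Explicit

' Assumed references (VBE > Tools > References):
'   Microsoft HTML Object Library
'   Microsoft XML, v6.0
Public Sub WriteProductLinks()
    Const URL As String = "https://cebra.com.ar/category/73/Juego-de-Construccion.html"

    Dim http As MSXML2.XMLHTTP60
    Dim html As MSHTML.HTMLDocument
    Dim links As Object
    Dim erow As Long, i As Long

    Set http = New MSXML2.XMLHTTP60
    Set html = New MSHTML.HTMLDocument

    ' Fetch the page synchronously and load the markup into an HTML document
    http.Open "GET", URL, False
    http.send
    html.body.innerHTML = http.responseText

    ' Select only elements whose href attribute starts with "product"
    Set links = html.querySelectorAll("[href^=product]")

    erow = 1 ' hypothetical starting row; the real procedure sets its own
    For i = 0 To links.Length - 1
        ActiveSheet.Cells(erow + i, 1).Value = links.item(i).href
    Next i
End Sub
```

Because the markup is loaded into a document that has no base URL, the returned href values may come back prefixed with about: rather than the site address, so some string clean-up may be needed before using them.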
Answer:
The function getElementsByTagName() of the HTMLDocument object returns a collection, but you're trying to read the .href property from it as if it were a single element.
You should replace this:
Cells(erow, 1) = html.getElementsByTagName("a").href
with this
Cells(erow, 1) = html.getElementsByTagName("a")(yourIndex).href
… where yourIndex is a number representing the index of the element in the collection (0, 1, …, n).
Of course, you'll have to find the right rule to pick the a elements you actually want, because getting every element with the tag a retrieves 278 elements on your page (including the page headers, footers and other things you probably don't need).
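One possible rule, sketched below, is to loop over the whole collection and keep only the anchors whose href contains the text product. The sub name and its parameters are hypothetical, html is assumed to be an already loaded HTMLDocument, and the substring test is only a guess at a suitable filter for this page.

```vba
' html: an already loaded HTMLDocument (assumed); startRow: first row to write to
Public Sub WriteFilteredAnchors(ByVal html As Object, ByVal startRow As Long)
    Dim anchors As Object
    Dim i As Long, r As Long
    Dim href As String

    Set anchors = html.getElementsByTagName("a")
    r = startRow

    For i = 0 To anchors.Length - 1
        ' Index into the collection with parentheses to get a single element
        href = anchors(i).href
        ' Keep only links that mention "product"; adjust this test as needed
        If InStr(1, href, "product", vbTextCompare) > 0 Then
            ActiveSheet.Cells(r, 1).Value = href
            r = r + 1
        End If
    Next i
End Sub
```

This visits every anchor on the page, so it does more work than a targeted CSS selector, but it shows how indexing into the collection works.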