
excel vba – Unable to fetch title from a text file using vba

Posted by: admin May 14, 2020

Questions:

I’ve written a script in VBA that loads a text file containing HTML elements from my desktop and prints the title found in the element with the class name question-hyperlink. When I execute the script, it throws an error: object variable or with block---. Where am I going wrong, and what can I do to print the title? Thanks in advance.

Content within the text file:

<div class="summary">
        <h3><a href="/questions/50348809/javascript-if-class-x-contains-z-get-link-of-class-y" class="question-hyperlink">javascript if class x contains z get link of class y</a></h3>
        <div class="excerpt">
            i'm no js expert but need to execute some js in my applescript. Don't know if this is possible as the html page contains several instances of this div class.
If nested div class ".product_card__title"...
        </div>          
        <div class="tags t-javascript t-web-scraping t-applescript">
            <a href="/questions/tagged/javascript" class="post-tag" title="show questions tagged 'javascript'" rel="tag">javascript</a> <a href="/questions/tagged/web-scraping" class="post-tag" title="show questions tagged 'web-scraping'" rel="tag">web-scraping</a> <a href="/questions/tagged/applescript" class="post-tag" title="show questions tagged 'applescript'" rel="tag">applescript</a> 
        </div>
        <div class="started fr">
            <div class="user-info ">
    <div class="user-action-time">
        asked <span title="2018-05-15 11:15:30Z" class="relativetime">2 hours ago</span>
    </div>
    <div class="user-gravatar32">
        <a href="/users/6809723/gto"><div class="gravatar-wrapper-32"><img src="https://www.gravatar.com/avatar/5d4e619fab77f9d58ee457a321e48d37?s=32&amp;d=identicon&amp;r=PG" alt="" width="32" height="32"></div></a>
    </div>
    <div class="user-details">
        <a href="/users/6809723/gto">GTO</a>
        <div class="-flair">
            <span class="reputation-score" title="reputation score " dir="ltr">37</span><span title="1 silver badge"><span class="badge2"></span><span class="badgecount">1</span></span><span title="7 bronze badges"><span class="badge3"></span><span class="badgecount">7</span></span>
        </div>
    </div>
</div>
        </div>  
    </div>

What I’ve tried so far:

Sub GetFileFromText()
    Dim HTML As New HTMLDocument, post As Object, strCont$

    Open "C:\Users\WCS\Desktop\content.txt" For Binary As #1
    strCont = Space$(LOF(1))
    Get #1, , strCont
    Close #1
    HTML.body.innerHTML = strCont

    Set post = HTML.getElementsByClassName("question-hyperlink")(0)
    MsgBox post.innerText
End Sub

I also tried it like this, but the result is the same:

Sub GetFileFromText()
    Dim strContent$, HTML As New HTMLDocument, post As Object

    With CreateObject("ADODB.Stream")
        .Charset = "utf-8"
        .Open
        .LoadFromFile ("C:\Users\WCS\Desktop\content.txt")
        strContent = .ReadText()
        HTML.body.innerHTML = strContent
    End With

    Set post = HTML.getElementsByClassName("question-hyperlink")(0)
    MsgBox post.innerText
End Sub

The output I was expecting:

javascript if class x contains z get link of class y

Answers:

My comment that the file is UTF-8 encoded was wrong. The strange characters at the start of the file (two of them in your case) define its encoding: ÿþ, which is how the bytes FF FE appear when displayed as Windows-1252 text, means UTF-16 (little endian). These characters are known as a BOM, or byte order mark. A detailed list can be found at https://en.wikipedia.org/wiki/Byte_order_mark
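If you want to check which BOM a file starts with before reading it, a small helper along these lines could do it. This is only a sketch: the function name DetectBom and the returned labels are made up for illustration, and it covers just the three most common BOMs.

```vba
' Hypothetical helper: reports the encoding suggested by a file's BOM.
' Only UTF-8 and UTF-16 are checked; a UTF-32 LE file also starts with
' FF FE and would be reported as "UTF-16 LE" here.
Function DetectBom(ByVal filePath As String) As String
    Dim fileNo As Integer
    Dim size As Long
    Dim b1 As Byte, b2 As Byte, b3 As Byte

    fileNo = FreeFile
    Open filePath For Binary Access Read As #fileNo
    size = LOF(fileNo)

    ' Read only the bytes that actually exist in the file
    If size >= 2 Then
        Get #fileNo, 1, b1
        Get #fileNo, 2, b2
    End If
    If size >= 3 Then Get #fileNo, 3, b3
    Close #fileNo

    If b1 = &HEF And b2 = &HBB And b3 = &HBF Then
        DetectBom = "UTF-8"
    ElseIf b1 = &HFF And b2 = &HFE Then
        DetectBom = "UTF-16 LE"
    ElseIf b1 = &HFE And b2 = &HFF Then
        DetectBom = "UTF-16 BE"
    Else
        DetectBom = "no BOM found"
    End If
End Function
```

For example, Debug.Print DetectBom("C:\Users\WCS\Desktop\content.txt") would be expected to print UTF-16 LE for a file that starts with ÿþ.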

The good news is that ADODB.Stream understands your BOM. Your command .Charset = "utf-8" just confused it: it tried to read the data as UTF-8, which of course fails. Simply removing that line should do the trick.

To avoid runtime errors, you can check whether the assignment to your post variable was successful. It could fail, for example, because the file doesn’t contain the class at all:

If post Is Nothing Then
    MsgBox "class not found"
Else
    MsgBox post.innerText
End If
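
Putting both suggestions together, the whole procedure might look like the sketch below. It assumes the file really is UTF-16 with a BOM, reuses the path from the question, and needs a reference to the Microsoft HTML Object Library for HTMLDocument; the procedure name GetTitleFromTextFile is made up for illustration.

```vba
Sub GetTitleFromTextFile()
    Dim strContent As String, HTML As New HTMLDocument, post As Object

    With CreateObject("ADODB.Stream")
        ' No .Charset line here: the stream keeps its default text
        ' charset ("Unicode", i.e. UTF-16), which matches the file's BOM
        .Open
        .LoadFromFile "C:\Users\WCS\Desktop\content.txt"
        strContent = .ReadText()
        .Close
    End With

    HTML.body.innerHTML = strContent

    ' Indexing item 0 yields Nothing when no element has the class
    Set post = HTML.getElementsByClassName("question-hyperlink")(0)

    If post Is Nothing Then
        MsgBox "class not found"
    Else
        MsgBox post.innerText
    End If
End Sub
```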