Excel macro to convert HTML entities to text

Posted by: admin April 23, 2020

Questions:

I have a huge Excel file that contains the result of an online survey. The person who built the survey messed up the formatting in several respects, and the mess-up I need to take care of first is converting HTML entities to regular text.

From what I can see, only two HTML entities are used, &#44; and &quot;, but the document is over 12,000 rows, so I cannot be sure there are no other HTML entities used… and if other HTML entities are used I want them converted to text as well.

I have successfully made a macro to convert the two HTML entities I mentioned into text, but I don’t know how to make the macro execute on the entire file (i.e. I have to hold down on the macro hot key to make it execute… and it is taking forever).

If there was a macro already available to do what I want that would be great because I could also use a modified version of it for my next task of arranging all the columns and rows in the proper order.

UPDATE:

This is the version of my macro that searches for &#44;. It works, but I have to hold down the hot key, which takes forever. If I could make this run on the entire Excel file that would be great, and then I can just adjust the macro for each HTML entity until I have eliminated them all.

Sub Macro2()
'
' HTML_Converter Macro
'

'
    Cells.Find(What:="&#44;", After:=ActiveCell, LookIn:=xlFormulas, LookAt _
        :=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:= _
        False, SearchFormat:=False).Activate
    ActiveCell.Replace What:="&#44;", Replacement:=",", LookAt:=xlPart, _
        SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
        ReplaceFormat:=False
    Cells.Find(What:="&#44;", After:=ActiveCell, LookIn:=xlFormulas, LookAt _
        :=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:= _
        False, SearchFormat:=False).Activate
End Sub
Answers:

Create a backup of the workbook.

Open the VBA editor by pressing Alt+F11.

Double-click “ThisWorkbook” in the treeview at left, under the workbook that you are working with.

Copy and paste the following:

Sub UnescapeCharacters()

    ' set this to match your case
    Const sheetname As String = "Sheet1"

    Dim sheet As Worksheet
    Set sheet = Me.Worksheets(sheetname)

    ' UsedRange does not always start at A1, so loop over its cells directly
    Dim cell As Range
    For Each cell In sheet.UsedRange.Cells
        ' define all your replacements here
        ReplaceCharacter cell, "&quot;", """" 'quadruple quotes required
        ReplaceCharacter cell, "&#44;", ","
    Next cell

End Sub

Sub ReplaceCharacter(ByRef cell As Range, ByVal find As String, ByVal replacement As String)

    ' only rewrite plain text cells that actually contain the entity,
    ' so numbers, dates, errors and formulas are left untouched
    If cell.HasFormula Then Exit Sub
    If VarType(cell.Value) <> vbString Then Exit Sub

    If InStr(1, cell.Value, find, vbBinaryCompare) > 0 Then
        cell.Value = Replace(cell.Value, find, replacement)
    End If

End Sub

This iterates over every cell in the used range of the specified worksheet and applies each replacement you define. The provided code replaces the two HTML entities you mentioned.
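Since the question also asks about entities that have not been spotted yet, here is a rough sketch of a helper that decodes any decimal numeric entity (&#44;, &#39; and so on) in one pass. The name DecodeNumericEntities is made up for illustration, and the sketch assumes Excel on Windows, where the VBScript.RegExp object can be created. Named entities such as &quot; or &amp; are not covered and would still need explicit replacements like the ones above.

```vba
' Hypothetical helper, not part of the original answer.
' Assumes Windows, where CreateObject("VBScript.RegExp") is available.
Function DecodeNumericEntities(ByVal source As String) As String

    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = "&#(\d{1,5});"

    Dim result As String
    Dim piece As String
    Dim code As Long
    Dim pos As Long
    Dim m As Object

    pos = 1
    For Each m In re.Execute(source)
        code = CLng(m.SubMatches(0))
        ' ChrW only accepts codes up to 65535; leave anything else as it was
        If code > 0 And code <= 65535 Then piece = ChrW(code) Else piece = m.Value

        ' m.FirstIndex is zero-based, Mid$ is one-based
        result = result & Mid$(source, pos, m.FirstIndex + 1 - pos) & piece
        pos = m.FirstIndex + m.Length + 1
    Next m

    ' append whatever follows the last match
    DecodeNumericEntities = result & Mid$(source, pos)

End Function
```

One possible way to use it is to call it from the loop on text cells, for example cell.Value = DecodeNumericEntities(cell.Value), before or after the named-entity replacements.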

You can run it as a macro, or just place the caret in the “UnescapeCharacters” subroutine and hit F5.
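If looping cell by cell turns out to be slow on 12,000+ rows, an alternative is to let Excel do the search itself with Range.Replace, which is the same method the recorded macro in the question uses, only applied to all cells at once. The following is a minimal sketch under a few assumptions: the macro name is made up, the code lives in the workbook that holds the survey data, and none of the sheets are protected.

```vba
' Hypothetical alternative, not part of the original answer.
Sub ReplaceEntitiesInWorkbook()

    Dim ws As Worksheet
    For Each ws In ThisWorkbook.Worksheets
        ' add one line per entity you want to convert
        ws.Cells.Replace What:="&quot;", Replacement:="""", _
            LookAt:=xlPart, MatchCase:=True
        ws.Cells.Replace What:="&#44;", Replacement:=",", _
            LookAt:=xlPart, MatchCase:=True
    Next ws

End Sub
```

Note that Range.Replace also changes matching text inside formulas, and Excel may reinterpret a cell after the replacement (a text cell holding 1&#44;000 could end up as the number 1000), so it is worth trying this on the backup copy first.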

Answer:

I’ve made an Excel add-in that has this feature:
https://www.youtube.com/watch?v=DGEc6tLaKvA