Encode html entities in javascript

Posted by: admin, February 12, 2018

Questions:

I am working in a CMS which allows users to enter content. The problem is that when they add symbols such as ®, these may not display correctly in all browsers. I would like to set up a list of symbols to search for and convert to the corresponding HTML entity. For example:

® => &reg;
& => &amp;
© => &copy;
™ => &trade;

After the conversion, it needs to be wrapped in a <sup> tag, resulting in this:

® => <sup>&reg;</sup>

Because a particular font size and padding style is necessary:

sup { font-size: 0.6em; padding-top: 0.2em; }

Would the JavaScript be something like this?

var regs = document.querySelectorAll('®');
  for ( var i = 0, l = imgs.length; i < l; ++i ) {
  var [?] = regs[i];
  var [?] = document.createElement('sup');
  img.parentNode.insertBefore([?]);
  div.appendChild([?]);
}

Where “[?]” means that there is something that I am not sure about.

Additional Details:

  • I would like to do this with pure JavaScript, not something that
    requires a library like jQuery, thanks.
  • Backend is Ruby
  • Using RefineryCMS which is built with Ruby on Rails
Answers:

You can use regex to replace any character in a given unicode range with its html entity equivalent. The code would look something like this:

var encodedStr = rawStr.replace(/[\u00A0-\u9999<>\&]/gim, function(i) {
   return '&#'+i.charCodeAt(0)+';';
});

This code will replace all characters in the given range (Unicode 00A0 – 9999, as well as the ampersand, greater-than and less-than signs) with their HTML entity equivalents, which is simply &#nnn; where nnn is the decimal value we get from charCodeAt.

See it in action here: http://jsfiddle.net/E3EqX/13/ (this example uses jQuery for element selectors used in the example. The base code itself, above, does not use jQuery)

Making these conversions does not solve every problem: make sure you are using UTF-8 character encoding, and make sure your database stores the strings in UTF-8. You may still see cases where the characters do not display correctly, depending on system font configuration and other issues outside your control.
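To tie this back to the <sup> requirement in the question, here is a minimal sketch. It assumes the input is plain text rather than existing HTML, and the helper name wrapSymbolsInSup and the symbol list are hypothetical, chosen only for illustration:

```javascript
// Hypothetical lookup table: extend it with whatever symbols the CMS needs.
var SUP_ENTITIES = {
  '\u00AE': '&reg;',   // ®
  '\u00A9': '&copy;',  // ©
  '\u2122': '&trade;'  // ™
};

function wrapSymbolsInSup(text) {
  // Escape markup-significant characters first, so the <sup> tags
  // added in the second pass are not escaped themselves.
  var escaped = text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  // Then swap each listed symbol for its named entity wrapped in <sup>.
  return escaped.replace(/[\u00AE\u00A9\u2122]/g, function (ch) {
    return '<sup>' + SUP_ENTITIES[ch] + '</sup>';
  });
}

console.log(wrapSymbolsInSup('Acme\u00AE & Co\u2122'));
// Acme<sup>&reg;</sup> &amp; Co<sup>&trade;</sup>
```

The result is a string of markup, so it would be assigned to innerHTML (or rendered server-side), not to textContent.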

Questions:
Answers:

The currently accepted answer has several issues. This post explains them, and offers a more robust solution. The solution suggested in that answer is:

var encodedStr = rawStr.replace(/[\u00A0-\u9999<>\&]/gim, function(i) {
  return '&#' + i.charCodeAt(0) + ';';
});

The i flag is redundant since no Unicode symbol in the range from U+00A0 to U+9999 has an uppercase/lowercase variant that is outside of that same range.

The m flag is redundant because ^ or $ are not used in the regular expression.

Why the range U+00A0 to U+9999? It seems arbitrary.

Anyway, for a solution that correctly encodes all except safe & printable ASCII symbols in the input (including astral symbols!), and implements all named character references (not just those in HTML4), use the he library (disclaimer: This library is mine). From its README:

he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would, has an extensive test suite, and — contrary to many other JavaScript solutions — he handles astral Unicode symbols just fine. An online demo is available.

Also see this relevant Stack Overflow answer.
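To illustrate the point about astral symbols, here is a rough sketch (not the he library's implementation) that encodes by code point instead of by UTF-16 code unit. The function name encodeNonAscii is made up for this example, and it assumes an ES2015+ environment that supports the u regex flag and codePointAt:

```javascript
// Sketch only: encodes everything except printable ASCII, plus < > & " '.
function encodeNonAscii(str) {
  // With the `u` flag the regex walks whole code points, so an astral
  // symbol (a surrogate pair) arrives in the callback as a single match.
  return str.replace(/[^\x20-\x7E]|[<>&"']/gu, function (ch) {
    return '&#x' + ch.codePointAt(0).toString(16).toUpperCase() + ';';
  });
}

console.log(encodeNonAscii('a<b \u00A9 \uD834\uDF06'));
// a&#x3C;b &#xA9; &#x1D306;
```

Using charCodeAt here instead would emit two separate references for the surrogate halves, which is not a valid way to write U+1D306 in HTML.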

Questions:
Answers:

I had the same problem and created two functions: one to convert a string to entities and one to translate them back to normal characters.
The first is added to String.prototype and the second is a static method on String:

/**
 * Convert a string to HTML entities
 */
String.prototype.toHtmlEntities = function() {
    // [\s\S] also matches line breaks, which "." would skip
    return this.replace(/[\s\S]/g, function(s) {
        return "&#" + s.charCodeAt(0) + ";";
    });
};

/**
 * Create string from HTML entities
 */
String.fromHtmlEntities = function(string) {
    return (string+"").replace(/&#\d+;/gm,function(s) {
        return String.fromCharCode(s.match(/\d+/gm)[0]);
    })
};

You can then use it as following:

var str = "Test´†®¥¨©˙∫ø…ˆƒ∆÷∑™ƒ∆æø𣨠ƒ™en tést".toHtmlEntities();
console.log("Entities:", str);
console.log("String:", String.fromHtmlEntities(str));

Output in console:

Entities: &#84;&#101;&#115;&#116;&#180;&#8224;&#174;... (one numeric reference per UTF-16 code unit)
String: Test´†®¥¨©˙∫ø…ˆƒ∆÷∑™ƒ∆æø𣨠ƒ™en tést

Questions:
Answers:

Without any library, if you do not need to support IE < 9, you could create an HTML element and set its content with Node.textContent:

var str = "<this is not a tag>";
var p = document.createElement("p");
p.textContent = str;
var converted = p.innerHTML;

Here is an example: https://jsfiddle.net/1erdhehv/

Questions:
Answers:

You can use this.

var escapeChars = {
  '¢' : 'cent',
  '£' : 'pound',
  '¥' : 'yen',
  '€': 'euro',
  '©' :'copy',
  '®' : 'reg',
  '<' : 'lt',
  '>' : 'gt',
  '"' : 'quot',
  '&' : 'amp',
  '\'' : '#39'
};

var regexString = '[';
for(var key in escapeChars) {
  regexString += key;
}
regexString += ']';

var regex = new RegExp( regexString, 'g');

function escapeHTML(str) {
  return str.replace(regex, function(m) {
    return '&' + escapeChars[m] + ';';
  });
};

https://github.com/epeli/underscore.string/blob/master/escapeHTML.js

var htmlEntities = {
    nbsp: ' ',
    cent: '¢',
    pound: '£',
    yen: '¥',
    euro: '€',
    copy: '©',
    reg: '®',
    lt: '<',
    gt: '>',
    quot: '"',
    amp: '&',
    apos: '\''
};

function unescapeHTML(str) {
    return str.replace(/\&([^;]+);/g, function (entity, entityCode) {
        var match;

        if (entityCode in htmlEntities) {
            return htmlEntities[entityCode];
            /*eslint no-cond-assign: 0*/
        } else if (match = entityCode.match(/^#x([\da-fA-F]+)$/)) {
            return String.fromCharCode(parseInt(match[1], 16));
            /*eslint no-cond-assign: 0*/
        } else if (match = entityCode.match(/^#(\d+)$/)) {
            return String.fromCharCode(~~match[1]);
        } else {
            return entity;
        }
    });
};

Questions:
Answers:

If you’re already using jQuery, try html().

$('<div>').text('<script>alert("gotcha!")</script>').html()
// "&lt;script&gt;alert("gotcha!")&lt;/script&gt;"

An in-memory <div> element is created, its text content is set, and html() is called on it to read back the escaped markup.

It’s ugly, it wastes a bit of memory, and I have no idea if it’s as thorough as something like the he library but if you’re already using jQuery, maybe this is an option for you.

Taken from blog post Encode HTML entities with jQuery by Felix Geisendörfer.

Questions:
Answers:

Sometimes you just want to encode every character. This function uses the regular expression /[^]/, which matches “everything but nothing”, that is, any character at all, including line breaks.

function encode(w) {
  return w.replace(/[^]/g, function(w) {
    return "&#" + w.charCodeAt(0) + ";";
  });
}

test.value=encode(document.body.innerHTML.trim());
<textarea id=test rows=11 cols=55>www.WHAK.com</textarea>

Questions:
Answers:

If you want to avoid encoding HTML entities more than once:

function encodeHTML(str){
    return str.replace(/[\u00A0-\u9999<>&](?!#)/gim, function(i) {
      return '&#' + i.charCodeAt(0) + ';';
    });
}

function decodeHTML(str){
    // \d+ rather than {1,3}: encodeHTML can emit references with up to five digits
    return str.replace(/&#(\d+);/g, function(match, num) {
        return String.fromCharCode(parseInt(num, 10));
    });
}

Example

var text = "<a>Content</a>";

text = encodeHTML(text);
console.log("Encode 1 times: " + text);

// &#60;a&#62;Content&#60;/a&#62;

text = encodeHTML(text);
console.log("Encode 2 times: " + text);

// &#60;a&#62;Content&#60;/a&#62;

text = decodeHTML(text);
console.log("Decoded: " + text);

// <a>Content</a>

Questions:
Answers:

HTML special characters and their escape codes

Reserved characters must be escaped in HTML. A character escape can represent any Unicode character [e.g. & is U+0026] in HTML, XHTML or XML using only ASCII characters. Numeric character references [e.g. &#38; for the ampersand] and named character references [e.g. &amp;] are the two types of character escape used in markup.


Predefined Entities

    Original character    XML entity replacement    XML numeric replacement
    <                     &lt;                      &#60;
    >                     &gt;                      &#62;
    "                     &quot;                    &#34;
    &                     &amp;                     &#38;
    '                     &apos;                    &#39;

To display HTML tags as literal text in a web page, we can wrap them in <pre> or <code> tags for presentation, but the markup characters themselves still have to be escaped. Escaping means replacing every "&" with "&amp;", every "<" with "&lt;" and every ">" with "&gt;".

function escapeCharEntities() {
    var map = {
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
        "\"": "&quot;",
        "'": "&apos;"
    };
    return map;
}

var mapkeys = '', mapvalues = '';
var html = {
    encodeRex : function () {
        return  new RegExp(mapkeys, 'gm');
    }, 
    decodeRex : function () {
        return  new RegExp(mapvalues, 'gm');
    },
    encodeMap : JSON.parse( JSON.stringify( escapeCharEntities () ) ),
    decodeMap : JSON.parse( JSON.stringify( swapJsonKeyValues( escapeCharEntities () ) ) ),
    encode : function ( str ) {
        return str.replace(html.encodeRex(), function(m) { return html.encodeMap[m]; });
    },
    decode : function ( str ) {
        return str.replace(html.decodeRex(), function(m) { return html.decodeMap[m]; });
    }
};

function swapJsonKeyValues ( json ) {
    var count = Object.keys( json ).length;
    var obj = {};
    var keys = '[', val = '(', keysCount = 1;
    for(var key in json) {
        if ( json.hasOwnProperty( key ) ) {
            obj[ json[ key ] ] = key;
            keys += key;
            if( keysCount < count ) {
                val += json[ key ]+'|';
            } else {
                val += json[ key ];
            }
            keysCount++;
        }
    }
    keys += ']';    val  += ')';
    console.log( keys, ' == ', val);
    mapkeys = keys;
    mapvalues = val;
    return obj;
}

console.log('Encode: ', html.encode('<input type="password" name="password" value=""/>') ); 
console.log('Decode: ', html.decode(html.encode('<input type="password" name="password" value=""/>')) );

O/P:
Encode:  &lt;input type=&quot;password&quot; name=&quot;password&quot; value=&quot;&quot;/&gt;
Decode:  <input type="password" name="password" value=""/>

Questions:
Answers:

You can use the charCodeAt() method to check whether a character has a value higher than 127 and, if so, convert it to a hexadecimal numeric character reference (&#x...;) using toString(16).
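A minimal sketch of that idea follows. The function name encodeAbove127 is made up for this example, and because charCodeAt works on UTF-16 code units, astral symbols would come out as two surrogate references, so treat it as suitable for BMP characters only:

```javascript
// Sketch: leave ASCII alone, turn everything above 127 into &#x...;
function encodeAbove127(str) {
  var out = '';
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    // toString(16) yields hex digits, so the reference needs the "x" prefix.
    out += code > 127 ? '&#x' + code.toString(16) + ';' : str.charAt(i);
  }
  return out;
}

console.log(encodeAbove127('caf\u00E9 \u2122'));
// caf&#xe9; &#x2122;
```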