How to remove ^[, and all of the escape sequences in a file using linux shell scripting

Posted by: admin November 30, 2017

Questions:

We want to remove ^[, and all of the escape sequences.

sed is not working and is giving us this error:

$ sed 's/^[//g' oldfile > newfile; mv newfile oldfile;
sed: -e expression #1, char 7: unterminated `s' command

$ sed -i '' -e 's/^[//g' somefile
sed: -e expression #1, char 7: unterminated `s' command
Answers:

Are you looking for ansifilter?


There are two things you can do. The first is to enter a literal escape character (in bash):

Using keyboard entry:

sed 's/Ctrl-vEsc//g'

alternatively

sed 's/Ctrl-vCtrl-[//g'

Or you can use character escapes:

sed 's/\x1b//g'

or for all control characters:

sed 's/[\x01-\x1F\x7F]//g' # NOTE: zaps TAB character too!
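If TAB should survive, one option is to spell out the ranges so that 0x09 (TAB) and 0x0A (newline) are left out. A minimal sketch, assuming a POSIX tr; the function name strip_ctrl is made up for illustration:

```shell
# strip_ctrl: hypothetical helper name, not a standard command.
# Deletes control characters but keeps TAB (\011) and newline (\012).
# Note that carriage return (\015) falls inside the second range and is removed.
strip_ctrl() {
    LC_ALL=C tr -d '\001-\010\013-\037\177'
}

printf 'a\tb\033[31mc\n' | strip_ctrl
```

Like the sed version, this only deletes the ESC byte itself; the rest of the sequence ([31m in the example) stays in the text, so it is not a replacement for matching whole escape sequences.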

Answers:

I managed with the following for my purposes, but this doesn’t include all possible ANSI escapes:

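sed -r 's/\x1b\[[0-9;]*m?//g'

This removes only the m (colour and style) sequences. To remove all sequences that start with ESC [ (as commented by @lethalman), use:

sed -r 's/\x1b\[[^@-~]*[@-~]//g'

The quotes matter: without them the shell strips the backslashes before sed sees the expression.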
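The second pattern can be wrapped in a small filter. This is only a sketch: strip_csi is a hypothetical name, and the ESC byte is produced with printf instead of \x1b so the expression does not rely on GNU sed's \x escapes. It covers sequences of the form ESC [ ... final byte only, not other kinds of escape sequence.

```shell
# strip_csi: hypothetical helper that removes ESC [ ... <final byte> sequences.
strip_csi() {
    esc=$(printf '\033')
    # [^@-~]* skips the parameter bytes; [@-~] matches the final byte (0x40-0x7E).
    # LC_ALL=C makes the bracket range a plain byte range.
    LC_ALL=C sed "s/${esc}\[[^@-~]*[@-~]//g"
}

printf '\033[1;31mred\033[0m and \033[2Kplain\n' | strip_csi
```

For in-place use on a file, the same function can be combined with a temporary file, as in the question: strip_csi < oldfile > newfile && mv newfile oldfile.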

Also see “Python regex to match VT100 escape sequences”.

There is also a table of common escape sequences.

Answers:

The ansi2txt command (part of the kbtin package) seems to do the job perfectly on Ubuntu.

Answers:

commandlinefu gives the correct answer which strips ANSI colours as well as movement commands:

sed "s,\x1B\[[0-9;]*[a-zA-Z],,g"
Answers:

I stumbled upon this post while looking for a way to strip extra formatting from man pages. ansifilter did it, but the output was far from the desired result (for example, all previously bold characters were duplicated, like SSYYNNOOPPSSIISS).

For that task the correct command would be col -bx, for example:

groff -man -Tascii fopen.3 | col -bx > fopen.3.txt

(source)
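The doubled letters have a simple cause: nroff-style output marks bold as a character, a backspace, and the same character again (and underline as _, backspace, character), so a tool that deletes only the backspaces leaves both copies behind. As a rough sketch of what col -b does for this case, a sed loop can drop each character together with the backspace that follows it. The name strip_overstrike is made up for illustration, and this is not a full replacement for col -bx (it does not convert tabs or handle other cursor motions).

```shell
# strip_overstrike: hypothetical helper; repeatedly removes "X<backspace>" pairs
# so that only the last character written to each position remains.
strip_overstrike() {
    bs=$(printf '\b')
    sed -e ':a' -e "s/[^${bs}]${bs}//" -e 'ta'
}

# Bold "SYN" (S\bS Y\bY N\bN) followed by underlined "foo" (_\bf _\bo _\bo).
printf 'S\bSY\bYN\bN _\bf_\bo_\bo\n' | strip_overstrike
```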

Answers:

Just a note; let’s say you have a file like this (such line endings are generated by git remote reports):

echo -e "remote: * 27625a8 (HEAD, master) 1st git commit\x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: Current branch master is up to date.\x1b[K" > chartest.txt

As a hex dump, it looks like this:

$ cat chartest.txt | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 1b  | 1st git commit.|
00000030  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000040  65 6d 6f 74 65 3a 20 1b  5b 4b 0a 72 65 6d 6f 74  |emote: .[K.remot|
00000050  65 3a 20 1b 5b 4b 0a 72  65 6d 6f 74 65 3a 20 1b  |e: .[K.remote: .|
00000060  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000070  65 6d 6f 74 65 3a 20 43  75 72 72 65 6e 74 20 62  |emote: Current b|
00000080  72 61 6e 63 68 20 6d 61  73 74 65 72 20 69 73 20  |ranch master is |
00000090  75 70 20 74 6f 20 64 61  74 65 2e 1b 5b 4b 0a     |up to date..[K.|
0000009f

It is visible that git here adds the sequence 0x1b 0x5b 0x4b before the line ending (0x0a).

Note that – while you can match the 0x1b with a literal format \x1b in sed, you CANNOT do the same for 0x5b, which represents the left square bracket [:

$ cat chartest.txt | sed 's/\x1b\x5b//g' | hexdump -C
sed: -e expression #1, char 13: Invalid regular expression

You might think you can escape the representation with an extra backslash \ – which ends up as \\x5b; but while that “passes” – it doesn’t match anything as intended:

$ cat chartest.txt | sed 's/\x1b\\x5b//g' | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 1b  | 1st git commit.|
00000030  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000040  65 6d 6f 74 65 3a 20 1b  5b 4b 0a 72 65 6d 6f 74  |emote: .[K.remot|
...

So if you want to match this character, apparently you must write it as an escaped left square bracket, that is \[ – the rest of the values can then be entered with escaped \x notation:

$ cat chartest.txt | sed 's/\x1b\[\x4b//g' | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 0a  | 1st git commit.|
00000030  72 65 6d 6f 74 65 3a 20  0a 72 65 6d 6f 74 65 3a  |remote: .remote:|
00000040  20 0a 72 65 6d 6f 74 65  3a 20 0a 72 65 6d 6f 74  | .remote: .remot|
00000050  65 3a 20 0a 72 65 6d 6f  74 65 3a 20 0a 72 65 6d  |e: .remote: .rem|
00000060  6f 74 65 3a 20 43 75 72  72 65 6e 74 20 62 72 61  |ote: Current bra|
00000070  6e 63 68 20 6d 61 73 74  65 72 20 69 73 20 75 70  |nch master is up|
00000080  20 74 6f 20 64 61 74 65  2e 0a                    | to date..|
0000008a
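One way to sidestep the \x5b problem is to keep hex escapes out of the regular expression altogether: generate the ESC byte with printf, write the bracket as \[, and afterwards count the lines that still contain an ESC byte to confirm nothing was missed. A small sketch; strip_el, count_esc and sample are hypothetical names, and sample just prints two lines shaped like the git output above.

```shell
esc=$(printf '\033')

# strip_el: hypothetical helper; removes the sequence ESC [ K.
strip_el() {
    sed "s/${esc}\[K//g"
}

# count_esc: hypothetical helper; prints how many input lines still contain ESC.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_esc() {
    grep -c "${esc}" || true
}

# sample: prints test input similar to the git remote lines.
sample() {
    printf 'remote: \033[K\nremote: done.\033[K\n'
}

sample | count_esc              # lines containing ESC before stripping
sample | strip_el | count_esc   # lines containing ESC after stripping
```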

Answers:

I built vtclean for this. It strips escape sequences using these regular expressions in order (explained in regex.txt):

// handles long-form RGB codes
^\033](\d+);([^\033]+)\033\\

// excludes non-movement/color codes
^\033(\[[^a-zA-Z0-9@\?]+|[\(\)]).

// parses movement and color codes
^\033([\[\]]([\d\?]+)?(;[\d\?]+)*)?(.)

It additionally does basic line-edit emulation, so backspace and other movement characters (like left arrow key) are parsed.