regex – Extra backslash needed in PHP regexp pattern

Posted by: admin July 12, 2020

Questions:

When testing an answer for another user’s question I found something I don’t understand. The problem was to replace every literal \t, \n and \r sequence (a backslash followed by a letter, not the control characters) in a string with a single space.

Now, the first pattern I tried was:

/(?:\\[trn])+/

which, surprisingly, didn’t work. I tried the same pattern in Perl and it worked fine. After some trial and error I found that PHP wants 3 or 4 backslashes for that pattern to match, as in:

/(?:\\\\[trn])+/

or

/(?:\\\[trn])+/

To my surprise, both of these patterns work. Why are the extra backslashes necessary?

Answer:

You need 4 backslashes in the PHP source to match 1 literal backslash, because the pattern is unescaped twice:

  • PHP’s string parser turns each pair of backslashes into one ("\\\\" -> \\)
  • the regex engine then turns the remaining pair into one literal backslash (\\ -> \)

From the PHP documentation:

escaping any other character will result in the backslash being printed too

Hence, for \\\[:

  • the string parser collapses the first two backslashes into one, and the third stays because \[ is not a recognized escape sequence ("\\\[" -> \\[)
  • the regex engine then unescapes the remaining pair into one literal backslash (\\[ -> \[)

Yes, it works, but it is not good practice.
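
As a rough illustration of the string-parsing step only, the sketch below prints what the regex engine actually receives when the pattern is written with four, three and two backslashes in single-quoted PHP literals. The variable names are made up for this example.

```php
<?php
// Single-quoted literals: only \\ and \' are escape sequences.
$four  = '/(?:\\\\[trn])+/';  // four backslashes in the source
$three = '/(?:\\\[trn])+/';   // three backslashes in the source
$two   = '/(?:\\[trn])+/';    // two backslashes in the source

// What reaches the regex engine after PHP has parsed each literal:
echo $four, "\n";   // prints /(?:\\[trn])+/
echo $three, "\n";  // prints /(?:\\[trn])+/
echo $two, "\n";    // prints /(?:\[trn])+/  (an escaped "[", no character class)

var_dump($four === $three);  // bool(true)
```

With only two backslashes the engine sees \[ and looks for a literal "[" followed by "trn]", which is why that version never matches \t, \n or \r.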

Answer:

It works in Perl because you pass it directly as a regex pattern: /(?:\\[trn])+/

But in PHP you pass the pattern as a string, so the backslash itself needs extra escaping:

"/(?:\\\\[trn])+/"

The regex \\ to match a single backslash would become '/\\\\/' as a PHP preg string.

Answer:

The regular expression is just /(?:\\[trn])+/. But since you need to escape the backslashes in string declarations as well, each backslash must be expressed with \\:

"/(?:\\\\[trn])+/"
'/(?:\\\\[trn])+/'

Just three backslashes also work, because PHP doesn’t recognize the escape sequence \[ and leaves it unchanged. So \\ will become \ but \[ will stay \[.
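
A minimal sketch to check this, assuming an input that contains the literal two-character sequences (the sample string and variable names are invented for illustration):

```php
<?php
// Single quotes keep the backslashes, so this holds literal \t, \r and \n sequences.
$input = 'one\ttwo\r\nthree';

$single = '/(?:\\\\[trn])+/';  // four backslashes, single-quoted
$double = "/(?:\\\\[trn])+/";  // four backslashes, double-quoted
$three  = '/(?:\\\[trn])+/';   // three backslashes; relies on \[ being left alone

// All three literals produce the same pattern string.
var_dump($single === $double, $single === $three);  // bool(true) bool(true)

// Each run of literal sequences collapses to a single space.
$result = preg_replace($single, ' ', $input);
echo $result, "\n";  // prints one two three
```

The four-backslash form is the safer choice, since it does not depend on how PHP treats an unrecognized escape.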

Answer:

Use str_replace!

$code = str_replace(array("\t","\n","\r"),'',$code);

That should do the trick.
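
One caveat, sketched below under the assumption that the goal is the one from the question (literal backslash-letter sequences, replaced by a space): in double quotes "\t" is a real tab character, so the search strings have to be single-quoted to hit the two-character sequences. The sample value of $code is invented for illustration.

```php
<?php
// Contains one literal backslash-t sequence and one real tab character.
$code = 'a\tb' . "\t" . 'c';

// Double-quoted search strings match the real control characters.
$controlsReplaced = str_replace(array("\t", "\n", "\r"), ' ', $code);

// Single-quoted search strings match the literal two-character sequences.
$literalsReplaced = str_replace(array('\t', '\n', '\r'), ' ', $code);

var_dump($controlsReplaced === 'a\tb c');   // bool(true): only the real tab changed
var_dump($literalsReplaced === "a b\tc");   // bool(true): only the literal \t changed
```

Unlike the regex with +, str_replace replaces each sequence separately, so consecutive sequences produce several spaces rather than one.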