php – Expected lifespan of ereg, migrating to preg

Posted by: admin July 12, 2020

Questions:

I work on a large PHP application (>1 million lines, 10 yrs old) which makes extensive use of ereg and ereg_replace – currently 1,768 unique regular expressions in 516 classes.

I’m very aware why ereg is being deprecated but clearly migrating to preg could be highly involved.

Does anyone know how long ereg support is likely to be maintained in PHP, and/or have any advice for migrating to preg on this scale? I suspect automated translation from ereg to preg is impossible or impractical.

Answer:

I’m not sure when ereg will be removed but my bet is as of PHP 6.0.

Regarding your second issue, translating ereg to preg doesn’t seem that hard. If your application has more than 1 million lines, surely you have the resources to put someone on this job for a week at most. I would grep all the ereg_ instances in your code and set up some macros in your favorite IDE (simple stuff like adding delimiters, modifiers and so on).

I bet most of the 1768 regexes can be ported using a macro, and the others, well, a good pair of eyes.
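As a rough sketch of the "grep all the ereg_ instances" step, the script below lists ereg-family calls with their line numbers. The function name find_ereg_calls and the sample source are made up for illustration. It is a plain text scan, so it will also report matches inside comments or strings.

```php
<?php
// Hypothetical helper: report each ereg-family call found in PHP source text.
function find_ereg_calls(string $source): array
{
    $hits = [];
    foreach (explode("\n", $source) as $index => $line) {
        // Matches ereg(, eregi(, ereg_replace( and eregi_replace(.
        // The leading \b keeps names such as my_ereg( from matching.
        if (preg_match_all('/\b(eregi?(?:_replace)?)\s*\(/', $line, $m)) {
            foreach ($m[1] as $name) {
                $hits[] = ['line' => $index + 1, 'function' => $name];
            }
        }
    }
    return $hits;
}

// Made-up sample input standing in for one of the application's files.
$sample = <<<'PHP'
<?php
if (ereg('^[a-z]+$', $s)) {
    $t = eregi_replace('x', 'y', $s);
}
PHP;

foreach (find_ereg_calls($sample) as $hit) {
    echo $hit['line'], ': ', $hit['function'], "\n";
}
```

Running something like this over each file gives the list of call sites to hand to a macro or a reviewer.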

Another option might be to write wrappers around the ereg functions if they are not available, implementing the changes as needed:

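if (function_exists('ereg') !== true)
{
    // $regs is optional in the native ereg(), so it gets a default here too.
    function ereg($pattern, $string, &$regs = null)
    {
        // Add ~ delimiters and escape any literal ~ inside the pattern.
        return preg_match('~' . addcslashes($pattern, '~') . '~', $string, $regs);
    }
}

if (function_exists('eregi') !== true)
{
    function eregi($pattern, $string, &$regs = null)
    {
        // Same as above, with the i modifier for case-insensitive matching.
        return preg_match('~' . addcslashes($pattern, '~') . '~i', $string, $regs);
    }
}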

You get the idea. The PEAR package PHP_Compat might also be a viable solution.
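Following the same pattern as the wrappers above, an ereg_replace fallback might look like the sketch below. It is an illustration rather than a drop-in replacement: it assumes the patterns behave the same under PCRE, and preg_replace also treats $1-style sequences in the replacement as backreferences, so replacement strings containing a dollar sign followed by a digit would need checking by hand.

```php
<?php
// Sketch only: define ereg_replace() when the native extension is absent.
if (function_exists('ereg_replace') !== true)
{
    function ereg_replace($pattern, $replacement, $string)
    {
        // Same delimiter handling as the ereg() wrapper.
        // \\1-style backreferences in $replacement work in preg_replace too.
        return preg_replace('~' . addcslashes($pattern, '~') . '~', $replacement, $string);
    }
}

echo ereg_replace('[0-9]+', '#', 'a1b22'), "\n";   // a#b#
echo ereg_replace('(a)(b)', '\\2\\1', 'ab'), "\n"; // ba
```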


Differences from POSIX regex

As of PHP 5.3.0, the POSIX Regex extension is deprecated. There are a number of differences between POSIX regex and PCRE regex. This page lists the most notable ones that are necessary to know when converting to PCRE.

  1. The PCRE functions require that the pattern is enclosed by delimiters.
  2. Unlike POSIX, the PCRE extension does not have dedicated functions for case-insensitive matching. Instead, this is supported using the /i pattern modifier. Other pattern modifiers are also available for changing the matching strategy.
  3. The POSIX functions find the longest of the leftmost match, but PCRE stops on the first valid match. If the string doesn’t match at all it makes no difference, but if it matches it may have dramatic effects on both the resulting match and the matching speed. To illustrate this difference, consider the following example from “Mastering Regular Expressions” by Jeffrey Friedl. Using the pattern one(self)?(selfsufficient)? on the string oneselfsufficient with PCRE will result in matching oneself, but using POSIX the result will be the full string oneselfsufficient. Both (sub)strings match the original string, but POSIX requires that the longest be the result.
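The three points above can be tried out with a few preg_match calls. This is a small sketch. The variable names are arbitrary, and the reordered pattern at the end is just one possible way to get the longer match, not something the manual prescribes.

```php
<?php
// Points 1 and 2: the pattern needs delimiters (here /), and the trailing
// i modifier takes the place of a dedicated function such as eregi().
$found = preg_match('/^php$/i', 'PHP');

// Point 3: PCRE stops at the first valid match instead of the longest one.
preg_match('/one(self)?(selfsufficient)?/', 'oneselfsufficient', $first);
echo $first[0], "\n"; // oneself

// Assumed workaround: list the longer alternative first.
preg_match('/one(selfsufficient|self)?/', 'oneselfsufficient', $longest);
echo $longest[0], "\n"; // oneselfsufficient
```

Patterns that relied on the POSIX longest-match rule are the ones most likely to need this kind of manual attention during a migration.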

Answer:

My intuition says that they are never going to remove ereg on purpose. PHP still supports really old and deprecated stuff like register globals. There are simply too many outdated apps out there. There is, however, a small chance that the extension will have to be removed because someone finds a serious vulnerability and there is nobody to fix it.

In any case, it’s worth noting that:

  1. You are not forced to upgrade your PHP installation. It’s pretty common to keep outdated servers to run legacy apps.

  2. The PHP_Compat PEAR package offers plain PHP versions of some native functions. If ereg disappears, it’s possible that it gets added.


BTW, PHP 6 is in fact dead. The developers realised that making PHP fully Unicode compliant was harder than they thought, and they are rethinking it all. The conclusion is: you can never make perfect predictions.

Answer:

I had this problem on a much smaller scale – an application more like 10,000 lines. In every case, all I needed to do was switch to preg_replace() and put delimiters around the regex pattern.

Anyone should be able to do that – even a non-programmer can be given a list of filenames and line numbers.

Then just run your tests to watch for any failures that can be fixed.
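A minimal before/after sketch of that kind of change. The sample strings and variable names are made up for illustration. The second case shows why the delimiter sometimes has to be something other than a slash.

```php
<?php
$title = 'Hello, World!';
// Before: $slug = ereg_replace('[^A-Za-z0-9]+', '-', $title);
$slug = preg_replace('/[^A-Za-z0-9]+/', '-', $title);
echo $slug, "\n"; // Hello-World-

$url = 'http://example.com/a';
// Before: $rest = ereg_replace('^https?://', '', $url);
// The pattern itself contains /, so # is used as the delimiter instead.
$rest = preg_replace('#^https?://#', '', $url);
echo $rest, "\n"; // example.com/a
```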

ereg functions will be removed from PHP6, by the way – http://jero.net/articles/php6.

Answer:

All ereg functions will be removed as of PHP 6, I believe.