Why does the function levenshtein in PHP have a 255-character limit?

Posted by: admin July 12, 2020

Questions:

Does anybody know why the function levenshtein in PHP has a 255-character limit?

Answers:

This is a complete userland PHP implementation of the algorithm. As you can see, it uses nested loops over the lengths of the two strings, so the work grows with the product of those lengths:

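function lev($s, $t) {
  $m = strlen($s);
  $n = strlen($t);
  $d = [];

  // First column and first row: distance between a prefix and the empty string.
  for ($i = 0; $i <= $m; $i++) $d[$i][0] = $i;
  for ($j = 0; $j <= $n; $j++) $d[0][$j] = $j;

  // Fill the (m+1) x (n+1) matrix; each cell depends on its upper, left, and upper-left neighbors.
  for ($i = 1; $i <= $m; $i++) {
    for ($j = 1; $j <= $n; $j++) {
      $c = ($s[$i-1] === $t[$j-1]) ? 0 : 1;  // substitution cost
      $d[$i][$j] = min($d[$i-1][$j] + 1,     // deletion
                       $d[$i][$j-1] + 1,     // insertion
                       $d[$i-1][$j-1] + $c); // substitution
    }
  }

  return $d[$m][$n];
}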

https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#PHP

The built-in levenshtein() function has been available since PHP 4.0.1. In versions before PHP 8.0 it accepts strings of at most 255 characters and returns -1 if either argument is longer.

I think the limit was introduced to keep the running time within an acceptable range, since the cost grows with the product of the two string lengths.

If you need string comparison for lengths > 255, you could use the implementation above.
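For long strings, the full matrix in the implementation above can get large, because it stores (m+1) × (n+1) cells. The following is a minimal sketch of a common variant that keeps only two rows at a time. The name lev_two_rows is hypothetical and is not part of PHP or of the Wikibooks page; like the version above, it compares bytes, not multibyte characters.

```php
<?php
// Hypothetical helper: Levenshtein distance using two rows instead of a full matrix.
// Works on bytes (strlen / string offsets), so it is not multibyte-aware.
function lev_two_rows(string $s, string $t): int
{
    $m = strlen($s);
    $n = strlen($t);

    // The distance to an empty string is the length of the other string.
    if ($m === 0) {
        return $n;
    }
    if ($n === 0) {
        return $m;
    }

    // $prev holds row i-1 of the matrix; it starts as row 0: [0, 1, ..., n].
    $prev = range(0, $n);

    for ($i = 1; $i <= $m; $i++) {
        // The first cell of row i is the cost of deleting i characters from $s.
        $curr = [$i];

        for ($j = 1; $j <= $n; $j++) {
            $cost = ($s[$i - 1] === $t[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,         // deletion
                $curr[$j - 1] + 1,     // insertion
                $prev[$j - 1] + $cost  // substitution
            );
        }

        // Row i becomes the "previous" row for the next iteration.
        $prev = $curr;
    }

    return $prev[$n];
}
```

For example, lev_two_rows('kitten', 'sitting') returns 3. The running time is still proportional to m × n; only the memory use drops, to roughly two rows of n + 1 integers.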

Answer:

PHP’s levenshtein() function can only handle strings of up to 255 characters, which is not realistic for user input (the first paragraph of this post alone has 285 characters). If you use a custom function that can handle more than 255 characters, efficiency becomes an important issue.

I use this function, which is specific to my case but much faster:

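function ucase_percent ($str) {
    $str2 = strtolower ($str);

    $l = strlen ($str);
    if ($l === 0) {
        return 0.0; // avoid division by zero on an empty string
    }
    $ucase = 0;

    // Count the bytes that strtolower() changed, i.e. the uppercase letters.
    for ($i = 0; $i < $l; $i++) {
        if ($str[$i] != $str2[$i]) { // the $str{$i} syntax was removed in PHP 8.0
            $ucase++;
        }
    }

    return $ucase / $l * 100.0;
}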