Regex to validate 3 repeating characters

Posted by: admin December 28, 2021

Questions:

I’m trying to validate a password that must not contain the same character more than 3 times, regardless of where the repeats appear in the string.

For example :

121121 – not accepted, since 1 appears more than 3 times.

121212 – accepted, since 1 and 2 each appear only 3 times.

I tried this

([0-9]){2,}

But it only validates consecutive repeated digits.

Answers:

I don’t recommend using regular expressions for something like this, as it would be easier to just collect the password into a Map where a count of each character is maintained. Then, you can just check if there exists any character which has a count of more than 3:

password.chars()
        .boxed()
        .collect(Collectors.groupingBy(i -> i, Collectors.counting()))
        .values()
        .stream()
        .anyMatch(i -> i > 3);

This returns true if there exists some character in password that appears more than 3 times, and false otherwise.
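For completeness, here is a minimal self-contained sketch of the same idea wrapped in a method. The class name RepeatCounter, the method name hasCharRepeatedMoreThan and the maxCount parameter are made up for illustration and are not part of the original answer.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class RepeatCounter {
    // Hypothetical helper: true if any character occurs more than maxCount times.
    // Note that chars() iterates over UTF-16 code units, which is fine for digits and ASCII.
    static boolean hasCharRepeatedMoreThan(String password, int maxCount) {
        Map<Integer, Long> counts = password.chars()
                .boxed()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.values().stream().anyMatch(n -> n > maxCount);
    }

    public static void main(String[] args) {
        System.out.println(hasCharRepeatedMoreThan("121121", 3)); // true: '1' occurs 4 times
        System.out.println(hasCharRepeatedMoreThan("121212", 3)); // false: each digit occurs 3 times
    }
}
```

Since the method answers "is there a character that repeats too often?", a password is acceptable when it returns false.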

###

The regex solution for this is very inefficient. Please treat this answer as being of purely academic interest.

The pattern that fails strings having 4 or more occurrences of the same char is

^(?!.*(.).*\1.*\1.*\1).*

The last .* may be replaced with a more restrictive pattern if you need to make this pattern more precise.

See the regex demo.

The main part here is the (?!.*(.).*\1.*\1.*\1) negative lookahead. It matches any 0+ chars (if Pattern.DOTALL is used, any char including newlines), as many as possible, then it matches and captures (with (.)) any char into Group 1, and then, three times over, matches any 0+ chars followed by that same char (\1). If the pattern is found (matched), the whole string match fails.

Why is it inefficient? The pattern relies heavily on backtracking. .* grabs all chars to the end of the string, then the engine backtracks, trying to accommodate some text for the subsequent subpatterns. You may see the backtracking steps here. The more .* there are, the more resource-consuming the pattern is.

Why is the lazy variant not any better? The ^(?!.*?(.).*?\1.*?\1.*?\1).* pattern looks faster with some strings, and it will be faster if the repeating chars appear close to each other and near the start of the string. If they are at the end of the string, the efficiency degrades. So, if the previous regex matches 121212 in 77 steps, the current one takes the same number of steps. However, if you test it against 1212124444, you will see that the lazy variant fails after 139 steps, while the greedy variant fails after 58 steps. And vice versa, 4444121212 causes the lazy regex to fail quicker: 14 steps vs. 211 steps with the greedy variant.

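In Java, the backslashes must be doubled inside the string literal, and because matches() has to consume the whole string, the lookahead needs a trailing .* after it:

s.matches("(?!.*(.).*\\1.*\\1.*\\1).*")

or

s.matches("(?!.*?(.).*?\\1.*?\\1.*?\\1).*")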
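If you want to try both variants side by side, a small runnable sketch might look like this. The class name LookaheadRepeatCheck and the method isAccepted are hypothetical, and precompiling with Pattern.DOTALL is my own choice so that . also covers newlines.

```java
import java.util.regex.Pattern;

class LookaheadRepeatCheck {
    // "\\1" in a Java string literal is the regex backreference \1.
    // The trailing .* is needed because matches() must consume the whole input.
    static final Pattern GREEDY =
            Pattern.compile("(?!.*(.).*\\1.*\\1.*\\1).*", Pattern.DOTALL);
    static final Pattern LAZY =
            Pattern.compile("(?!.*?(.).*?\\1.*?\\1.*?\\1).*", Pattern.DOTALL);

    // Hypothetical helper: true if no character occurs 4 or more times in s.
    static boolean isAccepted(Pattern pattern, String s) {
        return pattern.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"121121", "121212", "1212124444", "4444121212"};
        for (String s : samples) {
            // Both variants give the same answer; they differ only in how much backtracking they do.
            System.out.println(s + " greedy=" + isAccepted(GREEDY, s)
                    + " lazy=" + isAccepted(LAZY, s));
        }
    }
}
```

Only 121212 is accepted by either pattern. The other three samples each contain a character that occurs 4 times.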

Use Jacob’s solution in production.

###

Use a regex with a negative look ahead with back reference:

boolean ok = str.matches("((.)(?!(.*\\2){3}))+");

See live demo.

In English, this regex says “no character may appear 3 or more times after itself”.
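A quick way to check this pattern against the examples from the question is sketched below. The class name BackrefRepeatCheck and the method isAccepted are made up for illustration. One thing worth knowing is that the + quantifier needs at least one character, so an empty string is rejected.

```java
class BackrefRepeatCheck {
    // Group 2 is the current character; the lookahead rejects it
    // if the same character occurs 3 more times later in the string.
    static boolean isAccepted(String str) {
        return str.matches("((.)(?!(.*\\2){3}))+");
    }

    public static void main(String[] args) {
        System.out.println(isAccepted("121121")); // false: '1' occurs 4 times
        System.out.println(isAccepted("121212")); // true: each digit occurs 3 times
        System.out.println(isAccepted(""));       // false: + requires at least one character
    }
}
```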

###

Could you use a map instead?

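import java.util.HashMap;

public class Main
{
    public static void main(String[] args) {
        System.out.println(validate("121121")); // false: '1' appears 4 times
        System.out.println(validate("121212")); // true: no character appears more than 3 times
    }

    // Returns true if no character occurs more than 3 times in s.
    static boolean validate(String s)
    {
        HashMap<Character, Integer> map = new HashMap<>();
        // Count the occurrences of each character.
        for (Character c : s.toCharArray())
        {
            if (map.containsKey(c))
            {
                map.put(c, map.get(c) + 1 );
            }
            else
            {
                map.put(c , 1);
            }
        }

        // Reject the string if any character was seen more than 3 times.
        for (Integer count : map.values())
        {
            if (count > 3)
                return false;
        }
        return true;
    }
}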
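As a possible variation on the map approach (not the code from the answer above), the count can be checked while it is being updated, so the loop stops at the first character that exceeds the limit. The class name EarlyExitValidator and the maxCount parameter are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class EarlyExitValidator {
    // Hypothetical variant: returns false as soon as some character
    // has been seen more than maxCount times.
    static boolean validate(String s, int maxCount) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray()) {
            // merge stores 1 for a new key, or adds 1 to the existing count,
            // and returns the updated value.
            if (counts.merge(c, 1, Integer::sum) > maxCount) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(validate("121121", 3)); // false
        System.out.println(validate("121212", 3)); // true
    }
}
```

This avoids the second pass over the map values, which probably only matters for long inputs.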