Regular expressions are used regularly to identify patterns. In this post we’ll see how to verify that a text string – say, a password – contains certain specific characters.

As a concrete example, we’ll work with the case of password validation. Most passwords are required to satisfy certain conditions before being accepted. These conditions include:

• A certain minimum length.
• Containing at least one special character (for example, a digit or an uppercase letter).
• And possibly, not containing some characters like certain symbols.

Let’s see how to verify each one of these conditions using regular expressions.

### Minimum length

Let’s say the minimum length is 9 characters. At this point, we’ll only check that the password is 9 or more characters long. We won’t worry about whether any of the characters are invalid.

We can check this condition with:

`^.{9,}$`

Let’s review what each part of the regex represents:

• `^` – specifies the start of the string.
• `.` – A dot (`.`) is a special character in regex that matches any single character.
• `{9,}` – matches the element to its left 9 or more times.
• `$` – specifies the end of the string.

In other words, with this pattern, we’re asking for any string that contains 9 or more characters from beginning to end.
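
As a quick check, the length pattern can be tried out in code. The following is a minimal sketch that assumes Python and its standard `re` module (the post itself is not tied to any language); the helper name `is_long_enough` and the sample passwords are made up for illustration.

```python
import re

# Length pattern from the text: 9 or more characters from start to end.
MIN_LENGTH = re.compile(r"^.{9,}$")


def is_long_enough(password: str) -> bool:
    """Return True if the password is at least 9 characters long."""
    return MIN_LENGTH.search(password) is not None


print(is_long_enough("abcdefgh"))   # 8 characters -> False
print(is_long_enough("abcdefghi"))  # 9 characters -> True
```

One caveat for this particular engine: in Python, `.` does not match a newline by default, so a password containing a line break would be rejected by this pattern.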

### Containing special characters (the wrong way)

We’ll demand our password to contain at least one uppercase letter and at least one digit.

Matching uppercase letters or digits is easily done with the following:

```
[A-Z]+
[0-9]+
```

We create a set of characters using square brackets `[ ... ]`. We match uppercase letters with `[A-Z]` and digits with `[0-9]`. The plus sign `+` is used as a quantifier. It requires one or more occurrences of the element that comes before it.
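
To see what these two sets pick out on their own, here is a small sketch, again assuming Python's `re` module. The sample string is arbitrary and only serves as an illustration.

```python
import re

# Arbitrary sample string, chosen only for illustration.
sample = "abCDe1F23"

# Each run of one or more uppercase letters is returned as one match.
uppercase_runs = re.findall(r"[A-Z]+", sample)
print(uppercase_runs)  # ['CD', 'F']

# Each run of one or more digits is returned as one match.
digit_runs = re.findall(r"[0-9]+", sample)
print(digit_runs)  # ['1', '23']
```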

Putting it all together we get:

`^[A-Z]+[0-9]+.{9,}$`

However, this pattern will only match passwords that specifically:

• Start with one or more uppercase letters: `^[A-Z]+`.
• Followed by one or more digits: `[0-9]+`.
• And ending with 9 or more characters: `.{9,}$`.

The problem is that we have defined a rigid pattern. Since regular expressions are used mostly for pattern matching, they will scan strings from left to right matching each character or set of characters with the specified pattern, part by part.
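
The rigidity is easy to observe in practice. The sketch below assumes Python's `re` module; the helper `matches_rigid` and both passwords are hypothetical. The two passwords contain exactly the same characters, only in a different order.

```python
import re

# The "rigid" pattern from the text: uppercase letters, then digits,
# then 9 or more characters of any kind.
RIGID = re.compile(r"^[A-Z]+[0-9]+.{9,}$")


def matches_rigid(password: str) -> bool:
    """Return True if the password follows the rigid layout exactly."""
    return RIGID.search(password) is not None


# Uppercase first, digit second, then 9 more characters: accepted.
print(matches_rigid("A1bcdefghij"))  # True

# Same characters, but the uppercase letter and digit come last: rejected.
print(matches_rigid("bcdefghijA1"))  # False
```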

While we could try using a combination of “rigid patterns” to cover all of the possible combinations of patterns that could satisfy our password requirements, it would be cumbersome and unclear. Let’s see a better way of doing it.

### Containing special characters (using look-ahead assertions)

For our particular purposes, validating a password, we don’t really care about the pattern that characters follow. It’s not important if the digits come before the uppercase character or if the password starts with a lowercase instead of a digit.

For this purpose, we’ll use a different regex feature: look-ahead assertions. As their name implies, these are expressions that assert whether a condition is satisfied before continuing with the actual pattern matching.

Look-ahead assertions are expressed as `(?=<pattern>)`. The regex will verify that the specified `<pattern>` is satisfied ahead of the current location before moving on.
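
A useful detail is that a look-ahead only checks; it does not consume any characters. A minimal sketch of this, assuming Python's `re` module and made-up sample strings:

```python
import re

# The look-ahead succeeds at position 0 because a digit exists further ahead,
# but the match itself is empty: the current location has not moved.
match = re.search(r"(?=.*[0-9])", "abc1")
print(match.span())  # (0, 0)

# With no digit anywhere, the assertion fails at every position.
no_match = re.search(r"(?=.*[0-9])", "abcd")
print(no_match)  # None
```

Because the location stays where it was, several look-aheads can be placed one after another and each of them inspects the string from the same starting point.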

Let’s see how we would use it for our password validator:

`^(?=.*[A-Z])(?=.*[0-9]).{9,}$`

Notice that inside each look-ahead we have preceded the character set with `.*`, as in `.*[A-Z]`. Remember that `.` is a special symbol that matches any character, and `*` is a quantifier specifying that the element to its left occurs zero or more times. With this addition, the patterns `.*[A-Z]` and `.*[0-9]` match any string containing at least one uppercase letter and at least one digit, respectively, no matter where in the string they appear.

The pattern can be expressed with words as:

`^`: Start at the beginning of the string. From this location, verify that the following conditions are met: 1) `(?=.*[A-Z])` there’s an uppercase letter ahead, and 2) `(?=.*[0-9])` there’s a digit ahead. If those conditions are met, verify that the string is at least 9 characters long up to its end: `.{9,}$`.
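
The walkthrough above can be tried out with a short sketch. It assumes Python's `re` module; the helper `has_upper_and_digit` and the sample passwords are hypothetical.

```python
import re

# Look-ahead version from the text: an uppercase letter and a digit may
# appear anywhere, and the whole string must be 9 or more characters long.
LOOKAHEAD = re.compile(r"^(?=.*[A-Z])(?=.*[0-9]).{9,}$")


def has_upper_and_digit(password: str) -> bool:
    """Return True if the password meets the length, uppercase and digit rules."""
    return LOOKAHEAD.search(password) is not None


print(has_upper_and_digit("bcdefghijA1"))  # True: order no longer matters
print(has_upper_and_digit("abcdefghij1"))  # False: no uppercase letter
print(has_upper_and_digit("Abcdefghijk"))  # False: no digit
print(has_upper_and_digit("Ab1"))          # False: too short
```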

### Forbidden characters

We can verify that a character or set of characters isn’t present in the password with a negative look-ahead assertion: `(?!<pattern>)`. It works similarly to look-ahead assertions, only this time the matching will fail if the negative look-ahead pattern is satisfied.

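For our password, let’s imagine that we don’t want any of the characters `#` or `!`. We can group them between brackets inside the negative look-ahead assertion, again preceded by `.*` so that the check covers the whole string and not just the first character: `(?!.*[#!])`.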

Finally, the complete regular expression will be:

`^(?!.*[#!])(?=.*[A-Z])(?=.*[0-9]).{9,}$`

This will match any password containing at least one uppercase letter and one digit, not containing any of the characters # and ! and being at least 9 characters long.
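
Putting everything together, a complete validator could look like the sketch below. It assumes Python's `re` module; the function name `validate_password` and the sample passwords are hypothetical, and the rules are exactly the ones discussed in this post.

```python
import re

# Complete pattern from the text:
#   (?!.*[#!])   no '#' or '!' anywhere in the string
#   (?=.*[A-Z])  at least one uppercase letter
#   (?=.*[0-9])  at least one digit
#   .{9,}        9 or more characters in total
PASSWORD = re.compile(r"^(?!.*[#!])(?=.*[A-Z])(?=.*[0-9]).{9,}$")


def validate_password(password: str) -> bool:
    """Return True if the password satisfies all of the rules above."""
    return PASSWORD.search(password) is not None


print(validate_password("Passw0rdOK"))   # True: all rules satisfied
print(validate_password("Passw0rd#OK"))  # False: contains '#'
print(validate_password("Passw0rd!"))    # False: contains '!'
print(validate_password("password1"))    # False: no uppercase letter
```

A single pattern keeps the check compact, but it only answers yes or no. If the user should be told which rule failed, testing each condition with its own smaller pattern is a reasonable alternative.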
