Regular Expressions in JavaScript
Regular expressions, also known as regex, are a way to describe text patterns. They are useful in many situations, for example when you need to look for errors in a large file or identify the browser (user agent) a visitor is using. They can also be used for form validation, since a regex can specify the valid pattern for a field entry such as an email address or phone number.
Table of contents
- Basic
- Specifying multiple options for words
- Character options
- Groups
- {min,max}
- Naming the group
- Searching and Replacing strings
- Email Validation
Basic
Let’s start off easy. The regex pattern is specified between two slashes. This is a valid regular expression:
/JavaScript/
We can use JavaScript’s built-in match() method for this. It returns the result (if there is one) as an array containing the matched substring, the index at which the match starts, and the input string. If there is no match, it returns null.
let text = 'This is JavaScript'
console.log(text.match(/JavaScript/))
Output
[
  'JavaScript',
  index: 8,
  input: 'This is JavaScript',
  groups: undefined
]
let text = 'This is javascript in small letters'
console.log(text.match(/JavaScript/))
Output
null
This logs null because matching is case-sensitive by default, so javascript does not match JavaScript.
If you want the match to be case-insensitive, you can specify this using an i after the closing slash. In this case-insensitive example, the expression will match the previous string:
let text = 'This is javascript in small letters'
console.log(text.match(/JavaScript/i))
Output
[ 'javascript', index: 8, input: 'This is javascript in small letters', groups: undefined ]
The result is an array with a few extra properties: it contains the match that was found, the index it started on, and the input that was searched.
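Because match() returns null when nothing matches, it is worth guarding before reading the result. Here is a minimal sketch of that idea; the helper name describeMatch is made up for illustration and is not part of JavaScript:

```javascript
// Hypothetical helper: summarize the result of String.prototype.match().
function describeMatch(text, pattern) {
  const result = text.match(pattern);
  if (result === null) {
    return 'no match';
  }
  // result[0] is the matched substring; result.index is where it starts.
  return `found "${result[0]}" at index ${result.index}`;
}

console.log(describeMatch('This is JavaScript', /JavaScript/));
console.log(describeMatch('This is javascript in small letters', /JavaScript/));
console.log(describeMatch('This is javascript in small letters', /JavaScript/i));
```

The second call reports no match because the pattern is case-sensitive; adding the i modifier in the third call finds the lowercase word.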
Specifying multiple options for words
To match any one of several alternatives, we separate them with the pipe character, |:
let text = 'David Silver Alpha Go Deep Mind'
console.log(text.match(/JavaScript|David|Silver/i))
Here, the expression matches either javascript, David, or Silver. At this point, match() stops at the first match it encounters, so it is not going to find two or more matches.
Output
[ 'David', index: 0, input: 'David Silver Alpha Go Deep Mind', groups: undefined ]
If we wanted to find all matches, we could specify the global modifier, g. It is very similar to what we did for case-insensitive searches. In this example, we are checking for all matches, and it is case-insensitive.
let text = 'David Silver Alpha Go Deep Mind'
console.log(text.match(/JavaScript|David|Silver/gi))
Output
[ 'David', 'Silver' ]
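Note that with the g modifier, match() returns only the matched strings and drops the index information. If the positions are needed as well, one option is the built-in matchAll() method, sketched here on the same text (the variable names are only for illustration):

```javascript
const text = 'David Silver Alpha Go Deep Mind';

// matchAll() requires the g modifier and yields one full match object per hit.
const hits = [...text.matchAll(/JavaScript|David|Silver/gi)].map(
  (m) => ({ word: m[0], index: m.index })
);

console.log(hits);
```

Each entry keeps both the matched word and the index where it was found.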
Character options
Say we want to match a single character equal to a, b, or c. We list the allowed characters between square brackets:
let text = "d";
console.log(text.match(/[abc]/))
console.log(text.match(/[abcd]/))
Output
null
[ 'd', index: 0, input: 'd', groups: undefined ]
To match a range of characters, we put a hyphen between the first and the last character of the range:
let text = "d";
console.log(text.match(/[a-d]/))
let text2 = "famous"
console.log(text2.match(/[a-d]/))
console.log(text2.match(/[a-z]/))
Output
[ 'd', index: 0, input: 'd', groups: undefined ]
[ 'a', index: 1, input: 'famous', groups: undefined ]
[ 'f', index: 0, input: 'famous', groups: undefined ]
let text2 = "famous"
console.log(text2.match(/[a-z]/g))
Output
[ 'f', 'a', 'm', 'o', 'u', 's' ]
And if we wanted any letter, lowercase or uppercase, we would write this:
let text2 = "I am Famous"
console.log(text2.match(/[a-zA-Z]/g))
Output
[ 'I', 'a', 'm', 'F', 'a', 'm', 'o', 'u', 's' ]
We could actually also use the case-insensitive modifier to achieve the same thing, but this would apply to the regex pattern as a whole, and you might only need it to apply for the specific character:
let text2 = "I am Famous"
console.log(text2.match(/[a-z]/ig))
Output
[ 'I', 'a', 'm', 'F', 'a', 'm', 'o', 'u', 's' ]
If we wanted to include numbers as well, we would write:
let text2 = "I am Famous in area 85"
console.log(text2.match(/[a-z0-9]/ig))
Output
[ 'I', 'a', 'm', 'F', 'a', 'm', 'o', 'u', 's', 'i', 'n', 'a', 'r', 'e', 'a', '8', '5' ]
The dot is a special character in regex, so to match a literal dot we need to escape it with a backslash:
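As a small recap of character classes, the sketch below wraps match() in a helper that always returns an array, so that a missing match does not have to be handled as null. The name charsIn is hypothetical and chosen only for this example:

```javascript
// Hypothetical helper: collect every character that belongs to a character class.
function charsIn(text, charClass) {
  // With the g modifier, match() returns null when nothing matches, so fall back to [].
  return text.match(charClass) ?? [];
}

console.log(charsIn('I am Famous in area 85', /[0-9]/g)); // digits only
console.log(charsIn('I am Famous in area 85', /[A-Z]/g)); // uppercase letters only
console.log(charsIn('I am Famous in area 85', /[xyz]/g)); // no character matches
```

The last call logs an empty array rather than null, which is often easier to work with.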
let text2 = "I am Famous in area 85."
console.log(text2.match(/\./))
Output
[ '.', index: 22, input: 'I am Famous in area 85.', groups: undefined ]
If we escape the d, \d, it matches any digit.
let text2 = "I am Famous in area 85."
console.log(text2.match(/\d/));
console.log(text2.match(/\d/g));
Output
[ '8', index: 20, input: 'I am Famous in area 85.', groups: undefined ]
[ '8', '5' ]
We can also escape the s, \s, which matches any whitespace character:
let text = "Is this funny!"
console.log(text.match(/\s/));
console.log(text.match(/\s/g));
Output
[ ' ', index: 2, input: 'Is this funny!', groups: undefined ]
[ ' ', ' ' ]
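To show how \s and \d might be put to work, here is a short sketch. It uses split(), a string method not covered above, purely as an illustration; the sample text is made up:

```javascript
const text = 'Call 555 0199 now';

// split() accepts a regex: cut the text at every whitespace character.
const words = text.split(/\s/);

// Collect every digit and glue them back together into one string.
const digits = (text.match(/\d/g) ?? []).join('');

console.log(words);
console.log(digits);
```

Here words holds the four space-separated pieces and digits holds the seven digits in order.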
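A very useful one is \b, which matches a word boundary: the position between a word character and a non-word character (or the start or end of the string). Placed in front of a pattern, it makes the pattern match only at the beginning of a word.
So, in the following example, it is not going to match the instances of in inside “beginning”. The In at the start of the sentence is skipped as well, because the pattern is case-sensitive: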
let text = "In the end or in the beginning!"
console.log(text.match(/\bin/));
Output
[ 'in', index: 14, input: 'In the end or in the beginning!', groups: undefined ]
Even though you can check whether characters are digits, the match() method belongs to the string object, so you cannot call it on numeric variables. For example, try the following:
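Since \b marks a word boundary, it can be placed at the end of a pattern as well, or on both sides to match a whole word only. A small sketch with a made-up sample string:

```javascript
const text = 'in inside begin';

const anywhere = text.match(/in/g);      // every occurrence of "in"
const wordStart = text.match(/\bin/g);   // "in" at the start of a word
const wordEnd = text.match(/in\b/g);     // "in" at the end of a word
const wholeWord = text.match(/\bin\b/g); // "in" as a complete word

console.log(anywhere.length, wordStart.length, wordEnd.length, wholeWord.length);
```

This logs 3 2 2 1: three occurrences overall, two at the start of a word (in, inside), two at the end of a word (in, begin), and one that is a word on its own.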
let text = "354";
console.log(text.match(/3/g))
let num = 354;
console.log(num.match(/3/g));
Output
[ '3' ]
TypeError: num.match is not a function
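One way around this error is to convert the number to a string first, for example with String() or toString(). A minimal sketch:

```javascript
const num = 354;

// Convert the number to a string before calling match().
const asText = String(num);
const threes = asText.match(/3/g);
const allDigits = num.toString().match(/\d/g);

console.log(threes);
console.log(allDigits);
```

Both calls now work, because match() is called on a string rather than on the number itself.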
Groups
There are many reasons to group your regex. Whenever you want to match a group of characters, you can surround them with parentheses. Have a look at this example:
let text = "I love dinosaurs.";
console.log(text.match(/(love|hate)\s(dinosaurs|tiger)/));
let text2 = "I hate spiders."
console.log(text2.match(/(love|hate)\s(dinosaurs|tiger)/));
let text3 = "I hate dinosaurs"
console.log(text3.match(/(love|hate)\s(dinosaurs|tiger)/))
Output
[ 'love dinosaurs', 'love', 'dinosaurs', index: 2, input: 'I love dinosaurs.', groups: undefined ]
null
[ 'hate dinosaurs', 'hate', 'dinosaurs', index: 2, input: 'I hate dinosaurs', groups: undefined ]
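As the output shows, each pair of parentheses adds one extra entry to the result: index 0 holds the whole match, index 1 the first group, and index 2 the second. The sketch below picks the groups apart with array destructuring; parseOpinion is a hypothetical helper name:

```javascript
const pattern = /(love|hate)\s(dinosaurs|tiger)/;

// Hypothetical helper: return the captured groups as named fields, or null.
function parseOpinion(text) {
  const result = text.match(pattern);
  if (result === null) {
    return null;
  }
  // result[0] is the whole match, result[1] and result[2] are the two groups.
  const [whole, feeling, animal] = result;
  return { whole, feeling, animal };
}

console.log(parseOpinion('I love dinosaurs.'));
console.log(parseOpinion('I hate spiders.'));
```

The first call returns the feeling and the animal separately; the second returns null because spiders is not one of the alternatives in the second group.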
Groups are very powerful when we know how to repeat them. Let’s see how to do that. Very often, you’ll find yourself in need of repeating a certain regex piece. We have several options for this. For example, if we want to match any four alphanumeric characters in a sequence, we could just write this:
let text = "I love dinosaurs.";
console.log(text.match(/[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]/));
console.log(text.match(/[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]/g));
Output
[ 'love', index: 2, input: 'I love dinosaurs.', groups: undefined ]
[ 'love', 'dino', 'saur' ]
The question mark, ?, makes the preceding item optional. The following code looks for a g character that may or may not be preceded by an n:
let text = "You are doing great.";
console.log(text.match(/n?g/gi));
Output
[ 'ng', 'g' ]
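A typical use of the optional marker is accepting two spellings of the same word. A short sketch with a made-up sentence:

```javascript
const text = 'color and colour are both fine, colr is not';

// The u is optional, so both spellings match; "colr" lacks the second o.
const spellings = text.match(/colou?r/g);

console.log(spellings);
```

This finds color and colour but skips colr.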
If you want something at least once, but optionally more often, you can use the plus sign: +. Here is an example:
let text = "12312341234123";
console.log(text.match(/(123)+/));
Output
[ '123123', '123', index: 0, input: '12312341234123', groups: undefined ]
There are also situations where you want a certain piece of regex to match any number of times, including zero, which can be indicated with the asterisk: *. The following pattern matches an a preceded by zero or more repetitions of 123:
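The plus sign works on single characters and character classes too, not only on groups. For instance, \d+ matches a whole run of digits instead of one digit at a time. A sketch with a made-up sentence:

```javascript
const text = 'Order 66 shipped 1200 units';

const singleDigits = text.match(/\d/g); // one digit per match
const wholeNumbers = text.match(/\d+/g); // one run of digits per match

console.log(singleDigits);
console.log(wholeNumbers);
```

The first call returns six separate digits, while the second returns the two numbers 66 and 1200.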
let text = "12312341234123";
console.log(text.match(/(123)*a/));
let text2 = "123a";
console.log(text2.match(/(123)*a/));
let text3 = "12312a";
console.log(text3.match(/(123)*a/));
let text4 = "123123a123a";
console.log(text4.match(/(123)*a/));
let text5 = "a12";
console.log(text5.match(/(123)*a/));
let text6 = "bba";
console.log(text6.match(/(123)*a/));
Output
null
[ '123a', '123', index: 0, input: '123a', groups: undefined ]
[ 'a', undefined, index: 5, input: '12312a', groups: undefined ]
[ '123123a', '123', index: 0, input: '123123a123a', groups: undefined ]
[ 'a', undefined, index: 0, input: 'a12', groups: undefined ]
[ 'a', undefined, index: 2, input: 'bba', groups: undefined ]
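The difference between * (zero or more) and + (one or more) is easiest to see side by side. The sketch below uses the built-in test() method, which returns true or false instead of a match array; the sample strings are made up:

```javascript
const samples = ['ac', 'abc', 'abbbc'];

// b* allows zero b characters, b+ requires at least one.
const withStar = samples.map((s) => /ab*c/.test(s));
const withPlus = samples.map((s) => /ab+c/.test(s));

console.log(withStar);
console.log(withPlus);
```

Only the first sample differs: ac has no b at all, so it passes with * and fails with +.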
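{min,max}
Curly braces set the minimum and maximum number of times a piece of regex may repeat. For example, (aab){1,2} matches aab one or two times in a row: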
let text = "aab123";
console.log(text.match(/(aab){1,2}/));
console.log(text.match(/(aab){1,3}/));
let text1 = "aabaab123";
console.log(text1.match(/(aab){1,2}/));
console.log(text1.match(/(aab){1,3}/));
let text2 = "aabaabaab123";
console.log(text2.match(/(aab){1,2}/));
console.log(text2.match(/(aab){1,3}/));
Output
[ 'aab', 'aab', index: 0, input: 'aab123', groups: undefined ]
[ 'aab', 'aab', index: 0, input: 'aab123', groups: undefined ]
[ 'aabaab', 'aab', index: 0, input: 'aabaab123', groups: undefined ]
[ 'aabaab', 'aab', index: 0, input: 'aabaab123', groups: undefined ]
[ 'aabaab', 'aab', index: 0, input: 'aabaabaab123', groups: undefined ]
[ 'aabaabaab', 'aab', index: 0, input: 'aabaabaab123', groups: undefined ]
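The braces also accept a single number, {n}, for an exact count, and an open-ended form, {n,}, for at least n repetitions. That gives a shorter way to write the four-character example from the Groups section, sketched here:

```javascript
const text = 'I love dinosaurs.';

// Repeating the character class by hand, as in the Groups section.
const longForm = text.match(/[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]/g);
// {4} means exactly four repetitions of the preceding item.
const shortForm = text.match(/[a-zA-Z0-9]{4}/g);
// {5,} means five or more repetitions.
const atLeastFive = text.match(/[a-zA-Z0-9]{5,}/g);

console.log(longForm);
console.log(shortForm);
console.log(atLeastFive);
```

The long and short forms return the same three matches, and {5,} only finds dinosaurs, the one run of five or more alphanumeric characters.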
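Naming the group
A group can be given a name with the syntax (?<name>...). The captured text is then also available under that name in the groups property of the result: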
let text = "This is jazz not blues";
console.log(text.match(/(jazz)/))
console.log(text.match(/(?<music>jazz)/));
Output
[ 'jazz', 'jazz', index: 8, input: 'This is jazz not blues', groups: undefined ]
[ 'jazz', 'jazz', index: 8, input: 'This is jazz not blues', groups: [Object: null prototype] { music: 'jazz' } ]
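Named groups pay off when a pattern has several groups, because the parts can be read by name instead of by position. A sketch with a made-up date string; the group names year, month, and day are chosen only for this example:

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

const result = 'Released on 2021-03-15.'.match(datePattern);

// Read the captured parts by name from the groups property.
const { year, month, day } = result.groups;

console.log(year, month, day);
```

The same values are still available by position as result[1], result[2], and result[3].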
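Searching and Replacing strings
The search() method returns the index of the first match, and replace() substitutes a match with a new string. Without the g modifier, replace() only changes the first match.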
let text = "search me";
console.log(text.search(/me/i))
Output
7
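When there is no match, search() returns -1 rather than null, so the result needs a different check than match(). A minimal sketch; indexOfPattern is a hypothetical helper name:

```javascript
// Hypothetical helper: return the index of the first match, or null if there is none.
function indexOfPattern(text, pattern) {
  const index = text.search(pattern);
  // search() signals "not found" with -1.
  return index === -1 ? null : index;
}

console.log(indexOfPattern('search me', /me/i));
console.log(indexOfPattern('search me', /ME/));
```

The second call finds nothing, because without the i modifier the uppercase ME does not occur in the text.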
let text = "This tiger is scary, Replace tiger with lion";
console.log(text.replace("tiger","lion"));
console.log(text.replace(/tiger/,"lion"));
console.log(text.replace(/tiger/g,"lion"));
Output
This lion is scary, Replace tiger with lion
This lion is scary, Replace tiger with lion
This lion is scary, Replace lion with lion
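The replacement does not have to be a fixed string. It can refer back to captured groups with $1, $2, and so on, or it can be a function that receives the matched text. A sketch of both, using a made-up name for the first example:

```javascript
// $1 and $2 refer to the first and second captured group.
const swapped = 'Ada Lovelace'.replace(/([a-zA-Z]+)\s([a-zA-Z]+)/, '$2, $1');

// A function as replacement is called once per match with the matched text.
const text = 'This tiger is scary, Replace tiger with lion';
const shouted = text.replace(/tiger/g, (match) => match.toUpperCase());

console.log(swapped);
console.log(shouted);
```

The first call swaps the two words around a comma, and the second uppercases every tiger in the sentence.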
Email Validation
In order to create a regex pattern, we need to be able to describe the pattern with words first. Email addresses consist of five parts, in the form of [name]@[domain].[extension]. Here are the five parts explained:
- name: One or more alphanumerical characters, underscores, dashes, or dots
- @: Literal character
- domain: One or more alphanumerical characters, underscores, dashes, or dots
- .: Literal dot
- extension: One or more alphanumerical characters, underscores, dashes, or dots
So, let’s do the steps for regex:
- [a-zA-Z0-9._-]+
- @
- [a-zA-Z0-9._-]+
- \. (remember, the dot is a special character in regex, so we need to escape it)
- [a-zA-Z0-9._-]+
Putting it all together:
/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/g
let emailPattern = /([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/g
let email1 = "lisp@hotmail.com";
let email2 = "hotmail.com@lisp";
let email3 = "lisp me@hotmail.com";
let email4 = "lisp_me@hotmail.com";
console.log(email1.match(emailPattern));
console.log(email2.match(emailPattern));
console.log(email3.match(emailPattern));
console.log(email4.match(emailPattern));
Output
[ 'lisp@hotmail.com' ]
null
[ 'me@hotmail.com' ]
[ 'lisp_me@hotmail.com' ]
We can see that this email pattern reads “lisp me@hotmail.com” as me@hotmail.com. The remedy is the ^ character, which makes sure that the match starts at the beginning of the input string.
let emailPattern2 = /^([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/g
let email1 = "lisp@hotmail.com";
let email2 = "hotmail.com@lisp";
let email3 = "lisp me@hotmail.com";
let email4 = "lisp_me@hotmail.com";
console.log(email1.match(emailPattern2));
console.log(email2.match(emailPattern2));
console.log(email3.match(emailPattern2));
console.log(email4.match(emailPattern2));
Output
[ 'lisp@hotmail.com' ]
null
null
[ 'lisp_me@hotmail.com' ]
We can shorten the email validation pattern as follows:
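The ^ character only pins down the start of the match, so extra text after the address would still be accepted. Its counterpart, $, pins the match to the end of the input. A sketch comparing the two, with a made-up input string:

```javascript
const startOnly = /^([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/;
const startAndEnd = /^([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)$/;

const input = 'lisp@hotmail.com and more text';

// Without $, the address at the start is enough for a match.
console.log(startOnly.test(input));
// With $, the trailing text makes the whole input invalid.
console.log(startAndEnd.test(input));
console.log(startAndEnd.test('lisp@hotmail.com'));
```

This logs true, false, and true: only the version with both anchors rejects the input with trailing text.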
let emailPattern2 = /^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$/g
let email1 = "lisp@hotmail.com";
let email2 = "hotmail.com@lisp";
let email3 = "lisp me@hotmail.com";
let email4 = "lisp_me@hotmail.com";
console.log(email1.match(emailPattern2));
console.log(email2.match(emailPattern2));
console.log(email3.match(emailPattern2));
console.log(email4.match(emailPattern2));
Output
[ 'lisp@hotmail.com' ]
null
null
[ 'lisp_me@hotmail.com' ]
\w is the same as [a-zA-Z0-9_].
[\w-]{2,4} indicates that the top-level domain must be at least 2 and at most 4 characters long.
The $ at the end makes sure that the match runs all the way to the end of the input.
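For a plain yes-or-no validation, the built-in test() method is a natural fit. One thing to watch for: a regex with the g modifier remembers where its last match ended, so calling test() on it repeatedly can give alternating results. The sketch below therefore drops the g modifier; isValidEmail is a hypothetical helper name:

```javascript
// Same pattern as above, without the g modifier.
const emailPattern = /^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$/;

// Hypothetical helper: true if the whole input looks like an email address.
function isValidEmail(email) {
  return emailPattern.test(email);
}

console.log(isValidEmail('lisp@hotmail.com'));
console.log(isValidEmail('lisp me@hotmail.com'));

// With g, the second call starts searching where the first match ended.
const withG = /^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$/g;
const firstCall = withG.test('lisp@hotmail.com');
const secondCall = withG.test('lisp@hotmail.com');
console.log(firstCall, secondCall);
```

The last line logs true false, even though the input is the same both times, which is why the helper uses the pattern without g.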