Regular Expressions in JavaScript

Regular expressions, also known as regex, are a way to describe text patterns. Regex is useful in many situations, for example when you need to look for errors in a large file or identify a browser from its user-agent string. It can also be used for form validation, because a regex lets you specify valid patterns for field entries such as email addresses or phone numbers.

Basic

Let’s start off easy. A regex pattern is written between two slashes. This is a valid regular expression:

/JavaScript/

We can use JavaScript’s built-in match() method to apply it to a string. If there is a match, this method returns an array containing the matched substring, together with the index at which the match starts and the input string. If there is no match, it returns null.

let text = 'This is JavaScript'
console.log(text.match(/JavaScript/))

Output

[
  'JavaScript',
  index: 8,
  input: 'This is JavaScript',
  groups: undefined
]

let text = 'This is javascript in small letters'
console.log(text.match(/JavaScript/))

Output

null

This logs null because regex matching is case-sensitive by default, so javascript does not match JavaScript.

If you want the match to be case-insensitive, add an i after the closing slash. With this case-insensitive expression, the previous string matches:

let text = 'This is javascript in small letters'
console.log(text.match(/JavaScript/i))

Output

[ 'javascript', index: 8, input: 'This is javascript in small letters', groups: undefined ]

The result is an array with a few extra properties: it contains the matched text, the index at which the match starts, and the input that was searched.
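As a minimal sketch of how these pieces can be read from the result, the helper below checks for null before using the matched text and its index. The name describeMatch is hypothetical and only used for illustration; it is not a built-in function.

```javascript
// Hypothetical helper (not built in): report where a pattern first occurs.
function describeMatch(text, pattern) {
  const result = text.match(pattern);
  // match() returns null when nothing matches, so check before reading properties.
  if (result === null) {
    return 'no match';
  }
  // result[0] is the matched text, result.index is where it starts.
  return `found "${result[0]}" at index ${result.index}`;
}

const sample = 'This is javascript in small letters';
console.log(describeMatch(sample, /JavaScript/i)); // found "javascript" at index 8
console.log(describeMatch(sample, /JavaScript/)); // no match
```

Checking for null first avoids a TypeError when the pattern is not found.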

Specifying multiple options for words

To specify several alternatives, separate them with the pipe character, |:

let text = 'David Silver Alpha Go Deep Mind'
console.log(text.match(/JavaScript|David|Silver/i))

Here, the expression matches either javascript, David, or Silver. At this point, we only find the first occurrence and then stop, so this will not return two or more matches.

Output

[ 'David', index: 0, input: 'David Silver Alpha Go Deep Mind', groups: undefined ]

If we wanted to find all matches, we could specify the global modifier, g. It is very similar to what we did for case-insensitive searches. In this example, we are checking for all matches, and it is case-insensitive.

let text = 'David Silver Alpha Go Deep Mind'
console.log(text.match(/JavaScript|David|Silver/gi))

Output

[ 'David', 'Silver' ]
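Note that with the g modifier, match() returns only the matched strings, without index or input. If the positions are needed as well, one option is the matchAll() method, which yields a full result for every match. The following is a small sketch under that assumption; the variable names are illustrative.

```javascript
const text = 'David Silver Alpha Go Deep Mind';

// matchAll() requires the g flag and returns an iterator of match results.
const matches = [...text.matchAll(/JavaScript|David|Silver/gi)];

// Keep just the matched word and its starting index.
const positions = matches.map((m) => ({ word: m[0], index: m.index }));

console.log(positions);
// [ { word: 'David', index: 0 }, { word: 'Silver', index: 6 } ]
```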

Character options

Say we want to match a single character equal to a, b, or c. We list the options between square brackets, like this:

let text = "d";
console.log(text.match(/[abc]/))
console.log(text.match(/[abcd]/))

Output

null

[ 'd', index: 0, input: 'd', groups: undefined ]

For a range of characters, use a hyphen inside the brackets:

let text = "d";
console.log(text.match(/[a-d]/))
let text2 = "famous"
console.log(text2.match(/[a-d]/))
console.log(text2.match(/[a-z]/))

Output

[ 'd', index: 0, input: 'd', groups: undefined ]

[ 'a', index: 1, input: 'famous', groups: undefined ]

[ 'f', index: 0, input: 'famous', groups: undefined ]

let text2 = "famous"
console.log(text2.match(/[a-z]/g))

Output

[ 'f', 'a', 'm', 'o', 'u', 's' ]

And if we wanted any letter, lowercase or uppercase, we would write this:

let text2 = "I am Famous"
console.log(text2.match(/[a-zA-Z]/g))

Output

[ 'I', 'a', 'm', 'F', 'a', 'm', 'o', 'u', 's' ]

We could also use the case-insensitive modifier to achieve the same thing, but this would apply to the regex pattern as a whole, and you might only need it to apply to one specific character:

let text2 = "I am Famous"
console.log(text2.match(/[a-z]/ig))

Output

[ 'I', 'a', 'm', 'F', 'a', 'm', 'o', 'u', 's' ]
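To make the difference concrete, here is a small sketch with a made-up input string. In the first pattern only the first character may be uppercase; in the second, the i modifier applies to the whole pattern.

```javascript
const text = 'Cat CAT cat';

// Only the first letter may be upper- or lowercase; "at" must be lowercase.
const firstLetterOnly = text.match(/[a-zA-Z]at/g);

// The i modifier makes every character in the pattern case-insensitive.
const wholePattern = text.match(/[a-z]at/gi);

console.log(firstLetterOnly); // [ 'Cat', 'cat' ]
console.log(wholePattern); // [ 'Cat', 'CAT', 'cat' ]
```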

If we wanted to include digits as well, we would write:

let text2 = "I am Famous in area 85"
console.log(text2.match(/[a-z0-9]/ig))

Output

[ 'I', 'a', 'm', 'F', 'a', 'm', 'o', 'u', 's', 'i', 'n', 'a', 'r', 'e', 'a', '8', '5' ]

To find a literal dot, we need to escape it with a backslash, because an unescaped dot is a special character that matches almost any character:

let text2 = "I am Famous in area 85."
console.log(text2.match(/\./))

Output

[ '.', index: 22, input: 'I am Famous in area 85.', groups: undefined ]

Some letters get a special meaning when they are escaped. For example, \d matches any digit:

let text2 = "I am Famous in area 85."
console.log(text2.match(/\d/));
console.log(text2.match(/\d/g));

Output

[ '8', index: 20, input: 'I am Famous in area 85.', groups: undefined ]

[ '8', '5' ]

Similarly, \s matches any whitespace character:

let text = "Is this funny!"
console.log(text.match(/\s/));
console.log(text.match(/\s/g));

Output

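[ ' ', index: 2, input: 'Is this funny!', groups: undefined ]

[ ' ', ' ' ]

A very useful one is \b, which matches a word boundary: the position at the start or the end of a word. Placed before a pattern, it matches that pattern only at the beginning of a word.

So, in the following example, it is not going to match the instances of in inside “beginning”. The capitalized In at the start is skipped too, because the pattern is case-sensitive: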

let text = "In the end or in the beginning!"
console.log(text.match(/\bin/));

Output

[ 'in', index: 14, input: 'In the end or in the beginning!', groups: undefined ]

Even though you can check for digit characters, the match() method belongs to the string object, so you cannot call it on numeric variables. For example, try the following:
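Because \b marks both ends of a word, it can also be placed on each side of a pattern to match whole words only. The following sketch reuses the sentence from the example and compares a plain search with a whole-word search:

```javascript
const text = 'In the end or in the beginning!';

// Without boundaries, "in" is also found twice inside "beginning".
const anywhere = text.match(/in/gi);

// With \b on both sides, only the standalone word matches.
const wholeWords = text.match(/\bin\b/gi);

console.log(anywhere); // [ 'In', 'in', 'in', 'in' ]
console.log(wholeWords); // [ 'In', 'in' ]
```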

let text = "354";
console.log(text.match(/3/g))
let num = 354;
console.log(num.match(/3/g));

Output

[ '3' ]

TypeError: num.match is not a function
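One way around this error, sketched here, is to convert the number to a string first, for example with String() or the number's own toString() method:

```javascript
const num = 354;

// Convert the number to a string so that match() is available.
const viaString = String(num).match(/3/g);
const viaToString = num.toString().match(/\d/g);

console.log(viaString); // [ '3' ]
console.log(viaToString); // [ '3', '5', '4' ]
```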

Groups

There are many reasons to group your regex. Whenever you want to match a group of characters, you can surround them with parentheses. Have a look at this example:

let text = "I love dinosaurs.";
console.log(text.match(/(love|hate)\s(dinosaurs|tiger)/));
let text2 = "I hate spiders."
console.log(text2.match(/(love|hate)\s(dinosaurs|tiger)/));
let text3 = "I hate dinosaurs"
console.log(text3.match(/(love|hate)\s(dinosaurs|tiger)/))

Output

[ 'love dinosaurs', 'love', 'dinosaurs', index: 2, input: 'I love dinosaurs.', groups: undefined ]

null

[ 'hate dinosaurs', 'hate', 'dinosaurs', index: 2, input: 'I hate dinosaurs', groups: undefined ]
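In the results above, the first element is the whole match and the following elements are the captured groups, in order. A short sketch of reading them with array destructuring (the variable names are illustrative):

```javascript
const text = 'I hate dinosaurs';
const result = text.match(/(love|hate)\s(dinosaurs|tiger)/);

// Fall back to an empty array so destructuring is safe when there is no match.
const [whole, feeling, animal] = result || [];

console.log(whole); // hate dinosaurs
console.log(feeling); // hate
console.log(animal); // dinosaurs
```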

Groups are very powerful when we know how to repeat them. Let’s see how to do that. Very often, you’ll find yourself in need of repeating a certain regex piece. We have several options for this. For example, if we want to match any four alphanumeric characters in a sequence, we could just write this:

let text = "I love dinosaurs.";
console.log(text.match(/[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]/));
console.log(text.match(/[a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9][a-zA-Z0-9]/g));

Output

[ 'love', index: 2, input: 'I love dinosaurs.', groups: undefined ]

[ 'love', 'dino', 'saur' ]

The question mark, ?, makes the preceding item optional. The following code looks for a g character that may or may not be preceded by an n:

let text = "You are doing great.";
console.log(text.match(/n?g/gi));

Output

[ 'ng', 'g' ]
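Another common use of an optional character, shown here as a small sketch with a made-up sentence, is accepting two spellings of the same word:

```javascript
const text = 'The color red and the colour blue';

// The u is optional, so both spellings match.
const spellings = text.match(/colou?r/g);

console.log(spellings); // [ 'color', 'colour' ]
```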

If you want something at least once, but optionally more often, you can use the plus sign: +. Here is an example:

let text = "12312341234123";
console.log(text.match(/(123)+/));

Output

[ '123123', '123', index: 0, input: '12312341234123', groups: undefined ]

There are also situations where you want a certain piece of regex to match any number of times, including zero, which is indicated with the asterisk: *. The following pattern matches an a preceded by zero or more repetitions of 123:

let text = "12312341234123";
console.log(text.match(/(123)*a/));
let text2 = "123a";
console.log(text2.match(/(123)*a/));
let text3 = "12312a";
console.log(text3.match(/(123)*a/));
let text4 = "123123a123a";
console.log(text4.match(/(123)*a/));
let text5 = "a12";
console.log(text5.match(/(123)*a/));
let text6 = "bba";
console.log(text6.match(/(123)*a/));

Output

null

[ '123a', '123', index: 0, input: '123a', groups: undefined ]

[ 'a', undefined, index: 5, input: '12312a', groups: undefined ]

[ '123123a', '123', index: 0, input: '123123a123a', groups: undefined ]

[ 'a', undefined, index: 0, input: 'a12', groups: undefined ]

[ 'a', undefined, index: 2, input: 'bba', groups: undefined ]

To repeat a piece between a minimum and a maximum number of times, use curly braces: {min,max}.
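The difference between the asterisk and the plus sign shows up when the group is absent. A short comparison on the last input from the example:

```javascript
const text = 'bba';

// * allows zero repetitions of the group, so a bare "a" still matches.
const zeroOrMore = text.match(/(123)*a/);

// + requires at least one "123" before the "a", so nothing matches here.
const oneOrMore = text.match(/(123)+a/);

console.log(zeroOrMore[0]); // a
console.log(oneOrMore); // null
```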

let text = "aab123";
console.log(text.match(/(aab){1,2}/));
console.log(text.match(/(aab){1,3}/));

let text1 = "aabaab123";
console.log(text1.match(/(aab){1,2}/));
console.log(text1.match(/(aab){1,3}/));


let text2 = "aabaabaab123";
console.log(text2.match(/(aab){1,2}/));
console.log(text2.match(/(aab){1,3}/));

Output

[ 'aab', 'aab', index: 0, input: 'aab123', groups: undefined ]

[ 'aab', 'aab', index: 0, input: 'aab123', groups: undefined ]

[ 'aabaab', 'aab', index: 0, input: 'aabaab123', groups: undefined ]

[ 'aabaab', 'aab', index: 0, input: 'aabaab123', groups: undefined ]

[ 'aabaab', 'aab', index: 0, input: 'aabaabaab123', groups: undefined ]

[ 'aabaabaab', 'aab', index: 0, input: 'aabaabaab123', groups: undefined ]
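The curly braces also accept a single number for an exact count, and an open-ended form with only a minimum. As a sketch, the four-character pattern written out by hand earlier can be shortened like this:

```javascript
const text = 'I love dinosaurs.';

// {4} means exactly four repetitions of the character class.
const fourChars = text.match(/[a-zA-Z0-9]{4}/g);

// {5,} means five or more repetitions, with no upper limit.
const fiveOrMore = text.match(/[a-zA-Z0-9]{5,}/g);

console.log(fourChars); // [ 'love', 'dino', 'saur' ]
console.log(fiveOrMore); // [ 'dinosaurs' ]
```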

Naming the group

let text = "This is jazz not blues";
console.log(text.match(/(jazz)/))
console.log(text.match(/(?<music>jazz)/));

Output

[ 'jazz', 'jazz', index: 8, input: 'This is jazz not blues', groups: undefined ]

[ 'jazz', 'jazz', index: 8, input: 'This is jazz not blues', groups: [Object: null prototype] { music: 'jazz' } ]
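Named groups are easiest to appreciate when a pattern has several parts. The sketch below uses a made-up date string and reads each part by name from the groups property:

```javascript
// Sample input invented for this illustration.
const text = 'Released on 2024-03-15';
const result = text.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);

// Each named group becomes a property on result.groups.
const { year, month, day } = result.groups;

console.log(year); // 2024
console.log(month); // 03
console.log(day); // 15
```

Reading parts by name keeps the code working even if another group is later added to the pattern and the numeric positions shift.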

Searching and Replacing strings

let text = "search me";
console.log(text.search(/me/i))

Output

7

let text = "This tiger is scary, Replace tiger with lion";
console.log(text.replace("tiger","lion"));
console.log(text.replace(/tiger/,"lion"));
console.log(text.replace(/tiger/g,"lion"));

Output

This lion is scary, Replace tiger with lion

This lion is scary, Replace tiger with lion

This lion is scary, Replace lion with lion
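The replace() method can also work together with groups. In the replacement string, $1 and $2 refer to the first and second captured group, and instead of a string you can pass a function that computes the replacement. A brief sketch of both:

```javascript
const text = 'I love dinosaurs.';

// $1 and $2 insert the text captured by the first and second group.
const swapped = text.replace(/(love|hate)\s(dinosaurs|tiger)/, '$2 $1');

// A function receives each match and returns its replacement.
const shouted = text.replace(/dinosaurs|tiger/g, (match) => match.toUpperCase());

console.log(swapped); // I dinosaurs love.
console.log(shouted); // I love DINOSAURS.
```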

Email Validation

In order to create a regex pattern, we first need to be able to describe the pattern in words. For this example, email addresses consist of five parts, in the form [name]@[domain].[extension]. Here are the five parts explained:

  • name: One or more alphanumeric characters, underscores, dashes, or dots
  • @: Literal character
  • domain: One or more alphanumeric characters, underscores, dashes, or dots
  • .: Literal dot
  • extension: One or more alphanumeric characters, underscores, dashes, or dots

So, let’s translate each part into regex:

  • [a-zA-Z0-9._-]+
  • @
  • [a-zA-Z0-9._-]+
  • \. (remember, the dot is a special character in regex, so we need to escape it)
  • [a-zA-Z0-9._-]+

Putting it all together:

/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/g

let emailPattern = /([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/g
let email1 = "lisp@hotmail.com";
let email2 = "hotmail.com@lisp";
let email3 = "lisp me@hotmail.com";
let email4 = "lisp_me@hotmail.com";
console.log(email1.match(emailPattern));
console.log(email2.match(emailPattern));
console.log(email3.match(emailPattern));
console.log(email4.match(emailPattern));

Output

[ 'lisp@hotmail.com' ]

null

[ 'me@hotmail.com' ]

[ 'lisp_me@hotmail.com' ]

We can see that this email pattern accepts “lisp me@hotmail.com” by matching only me@hotmail.com. The remedy is the caret, ^, which makes sure that the match starts at the beginning of the input string.

let emailPattern2 = /^([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/g
let email1 = "lisp@hotmail.com";
let email2 = "hotmail.com@lisp";
let email3 = "lisp me@hotmail.com";
let email4 = "lisp_me@hotmail.com";
console.log(email1.match(emailPattern2));
console.log(email2.match(emailPattern2));
console.log(email3.match(emailPattern2));
console.log(email4.match(emailPattern2));

Output

[ 'lisp@hotmail.com' ]

null

null

[ 'lisp_me@hotmail.com' ]

We can shorten the regular expression for email validation to:

let emailPattern2 = /^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$/g
let email1 = "lisp@hotmail.com";
let email2 = "hotmail.com@lisp";
let email3 = "lisp me@hotmail.com";
let email4 = "lisp_me@hotmail.com";
console.log(email1.match(emailPattern2));
console.log(email2.match(emailPattern2));
console.log(email3.match(emailPattern2));
console.log(email4.match(emailPattern2));

Output

[ 'lisp@hotmail.com' ]

null

null

[ 'lisp_me@hotmail.com' ]

\w : matches any word character (same as [a-zA-Z0-9_])

[\w-]{2,4} : indicates that the top-level domain must be at least 2 and at most 4 characters long

$ : makes sure that the match ends at the end of the input string
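For form validation we usually only need a yes or no answer, which is what the test() method of a regex gives us. The following is a minimal sketch, not a complete email validator: the function name isValidEmail is hypothetical, the character class is written as [\w.-], which is equivalent to [\w-\.], and the g modifier is left out because a global regex remembers its position between test() calls.

```javascript
// Shortened pattern from above, without the g modifier.
const emailPattern = /^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$/;

// Hypothetical helper: true when the whole string looks like an email address.
function isValidEmail(candidate) {
  return emailPattern.test(candidate);
}

console.log(isValidEmail('lisp@hotmail.com')); // true
console.log(isValidEmail('hotmail.com@lisp')); // false
console.log(isValidEmail('lisp me@hotmail.com')); // false
console.log(isValidEmail('lisp_me@hotmail.com')); // true
```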