This Java Regex tutorial explains what a regular expression is in Java, why we need it, and how to use it, with the help of regular expression examples.
A regular expression, abbreviated as “regex”, is an expression that defines a search pattern for strings.
The search pattern can be a single character, a fixed substring, or a more complex expression that describes a whole family of strings to search for.
Further, the pattern may match the string at one place, at several places, or not at all.
Regular Expression: Why We Need It
A regular expression is mainly used to search for a pattern in a string. Why do we search for a pattern in a string? We might want to find a particular pattern in a string and then manipulate it or edit it.
Applications need this kind of text processing all the time, for example to validate input, extract fields, or replace text, and a regex gives us a compact way to describe the pattern to search for.
Now, given a pattern to search for, how exactly does the regex work?
When we analyze or alter text using a regex, we say that ‘we have applied the regex to the string or text’. The pattern is applied to the text from left to right, and the source string is matched against the pattern.
For example, consider the string “ababababab” and the regex ‘aba’. Applying the regex from left to right, it matches the string at only two places, “aba_aba___”, that is, at indexes 0 to 2 and 4 to 6.
This is because once a source character has been used in a match, it cannot be reused. After the first match aba, the third character ‘a’ (index 2) is not reused to start a second, overlapping match.
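The left-to-right, non-overlapping behaviour described above can be checked with a short program. This is only a minimal sketch; the class name NonOverlappingDemo and the helper matchStarts are made up for illustration and use the Pattern and Matcher classes introduced in the next section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonOverlappingDemo {
    // Hypothetical helper: collects the start index of every match that find() reports.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // The 'a' at index 2 belongs to the first match, so the next match starts at 4.
        System.out.println(matchStarts("aba", "ababababab")); // prints [0, 4]
    }
}
```

An overlapping occurrence starting at index 2 is not reported, because the search resumes after the end of the previous match.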
java.util.regex
The Java language has no special syntax for regular expressions (there are no regex literals), but the standard library supports them through the “java.util.regex” package, which we import to work with regex.
The package java.util.regex provides one interface and three classes as shown below:
Pattern Class: The Pattern class represents a compiled regex. It does not have any public constructors; instead, it provides static compile() methods that return Pattern objects.
Matcher Class: A Matcher object matches the regex pattern against an input string. Like the Pattern class, this class does not provide any public constructors. A Matcher is obtained by calling the matcher() method on a Pattern object.
PatternSyntaxException: This class defines an unchecked exception. A PatternSyntaxException is thrown to indicate a syntax error in a regex pattern.
MatchResult Interface: The MatchResult interface represents the result of a match operation, such as the matched text and its start and end positions.
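To see how these four types fit together, here is a small sketch. The class RegexTypesDemo and its two helper methods are hypothetical names chosen for this illustration; the library calls themselves (Pattern.compile, Matcher.find, Matcher.toMatchResult) are part of java.util.regex.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexTypesDemo {
    // Returns true if the regex compiles, false if it contains a syntax error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Returns a snapshot of the first match, or null if the input has no match.
    static MatchResult firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.toMatchResult() : null;
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[a-z]+")); // true
        System.out.println(isValidRegex("[a-z"));   // false: unclosed character class

        MatchResult result = firstMatch("[0-9]+", "order 42 shipped");
        // prints: 42 at 6-8 (the end index is exclusive)
        System.out.println(result.group() + " at " + result.start() + "-" + result.end());
    }
}
```

The MatchResult returned by toMatchResult() is not affected by later operations on the matcher, which makes it convenient to pass around.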
Java Regex Example
Let’s implement a simple example of regex in Java. In the program below, we compile a simple string as a pattern and then search for it in an input string. The output prints the start and end positions in the string where the pattern is found.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            // define a pattern to be searched; the unescaped '.' matches any single character
            Pattern pattern = Pattern.compile("Help.");
            // search for the above pattern in "softwareTestingHelp.com"
            Matcher m = pattern.matcher("softwareTestingHelp.com");
            // print the start and end position of each match; end() is exclusive, hence the -1
            while (m.find()) {
                System.out.println("Pattern found from position " + m.start() + " to " + (m.end() - 1));
            }
        }
    }

Output:
Pattern found from position 15 to 19
Regex Matcher In Java
The Matcher class implements the MatchResult interface. A Matcher acts as the regex engine: it interprets a Pattern and performs match operations on a character sequence.
Given below are the commonly used methods of the Matcher class. The class has more methods, but only the important ones are listed here.
No | Method | Description |
---|---|---|
1 | boolean matches() | Checks whether the entire input sequence matches the pattern. |
2 | Pattern pattern() | Returns the pattern that the matcher interprets. |
3 | boolean find() | Finds the next subsequence of the input that matches the pattern. |
4 | boolean find(int start) | Same as find(), but resets the matcher and starts searching at the given index. |
5 | String group() | Returns the input subsequence matched by the previous match. |
6 | String group(String name) | Returns the input subsequence captured by the named capturing group during the previous match operation. |
7 | int start() | Returns the start index of the previous match. |
8 | int end() | Returns the index just after the last character of the matched subsequence. |
9 | int groupCount() | Returns the number of capturing groups in the matcher's pattern. |
10 | String replaceAll(String replacement) | Replaces every subsequence of the input sequence that matches the pattern with the given replacement string. |
11 | String replaceFirst(String replacement) | Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string. |
12 | String toString() | Returns the string representation of the current matcher. |
Regular Expression Implementation Example
Let’s see an example of the usage of some of these methods.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class MatcherDemo {
        public static void main(String[] args) {
            String inputString = "She sells sea shells on the sea shore with shells";
            // obtain a Pattern object
            Pattern pattern = Pattern.compile("shells");
            // obtain a matcher object
            System.out.println("input string: " + inputString);
            Matcher matcher = pattern.matcher(inputString);
            // use replaceFirst method to replace the first occurrence of pattern
            inputString = matcher.replaceFirst("pearls");
            System.out.println("\nreplaceFirst method:" + inputString);
            // use replaceAll method to replace all occurrences of pattern
            // (the matcher still operates on the original input string)
            inputString = matcher.replaceAll("pearls");
            System.out.println("\nreplaceAll method:" + inputString);
        }
    }

Output:
input string: She sells sea shells on the sea shore with shells
replaceFirst method:She sells sea pearls on the sea shore with shells
replaceAll method:She sells sea pearls on the sea shore with pearls
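The example above covers only the replace methods. As a complementary sketch, the following program exercises matches(), find(), group(String name), and groupCount(). The key=value pattern and the names MatcherMethodsDemo, PAIR, and firstValue are assumptions made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // Hypothetical pattern: a key=value pair captured in two named groups.
    static final Pattern PAIR = Pattern.compile("(?<key>\\w+)=(?<value>\\w+)");

    // Returns the text captured by the "value" group of the first pair, or null.
    static String firstValue(String input) {
        Matcher m = PAIR.matcher(input);
        return m.find() ? m.group("value") : null;
    }

    public static void main(String[] args) {
        Matcher m = PAIR.matcher("mode=fast");
        System.out.println(m.matches());    // true: the whole input is one pair
        System.out.println(m.groupCount()); // 2: capturing groups in the pattern

        // matches() needs the entire input to match, find() only a subsequence.
        System.out.println(PAIR.matcher("set mode=fast now").matches()); // false
        System.out.println(firstValue("set mode=fast now"));             // fast
    }
}
```

Note that groupCount() reports how many capturing groups the pattern defines, not how many matches were found in the input.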
Regex Pattern Class In Java
The Pattern class holds the compiled form of a regex, which the regex engine then uses to match against an input string.
The following table shows the commonly used methods provided by the Pattern class.
No | Method | Description |
---|---|---|
1 | static Pattern compile(String regex) | Compiles the given regex and returns its compiled representation. |
2 | static Pattern compile(String regex, int flags) | Compiles the given regex using the specified flags (for example, Pattern.CASE_INSENSITIVE) and returns the pattern. |
3 | Matcher matcher(CharSequence input) | Creates a matcher that will match the given input against this pattern. |
4 | static boolean matches(String regex, CharSequence input) | Compiles the given regex and checks whether the entire input matches it. |
5 | int flags() | Returns the match flags with which this pattern was compiled. |
6 | String[] split(CharSequence input) | Splits the input around matches of this pattern. |
7 | String[] split(CharSequence input, int limit) | Splits the input around matches of this pattern; a positive limit caps the number of elements in the resulting array. |
8 | String pattern() | Returns the regular expression from which this pattern was compiled. |
9 | static String quote(String s) | Returns a literal pattern String for the given String, so that its metacharacters are matched literally. |
10 | String toString() | Returns the string representation of the pattern. |
The below example uses some of the above methods of Pattern class.

    import java.util.regex.*;

    public class Main {
        public static void main(String[] args) {
            // define a REGEX String
            String REGEX = "Test";
            // string to be searched for given pattern
            String actualString = "Welcome to SoftwareTestingHelp portal";
            // generate a pattern for given regex using compile method
            Pattern pattern = Pattern.compile(REGEX);
            // set limit to 2
            int limit = 2;
            // use split method to split the string
            String[] array = pattern.split(actualString, limit);
            // print the generated array
            for (int i = 0; i < array.length; i++) {
                System.out.println("array[" + i + "]=" + array[i]);
            }
        }
    }

Output:
array[0]=Welcome to Software
array[1]=ingHelp portal
In the above program, we use the compile method to generate a pattern. Then we split the input string around this pattern with a limit of 2, so the result contains at most two elements, and store it in an array. Finally, we display the array that was generated as a result of splitting the input string.
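The remaining Pattern methods from the table can be sketched in the same way. The following program is an illustration only: the class PatternMethodsDemo and the helper containsIgnoreCase are hypothetical, while compile with flags, quote, and both split overloads are the library methods listed above.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternMethodsDemo {
    // Case-insensitive search for a literal text; quote() neutralises its metacharacters.
    static boolean containsIgnoreCase(String text, String literal) {
        Pattern p = Pattern.compile(Pattern.quote(literal), Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("SoftwareTestingHelp.com", "help.COM")); // true

        // Unquoted, "." matches any character, so the regex 1.5 also matches "125".
        System.out.println(Pattern.matches("1.5", "125"));                // true
        System.out.println(Pattern.matches(Pattern.quote("1.5"), "125")); // false

        Pattern comma = Pattern.compile(",");
        // Without a limit every comma splits; with limit 2 the remainder stays intact.
        System.out.println(Arrays.toString(comma.split("a,b,c")));    // [a, b, c]
        System.out.println(Arrays.toString(comma.split("a,b,c", 2))); // [a, b,c]
    }
}
```

Quoting is worth considering whenever part of a regex comes from user input, since otherwise characters such as . or * in that input change the meaning of the pattern.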
Regex String Matches Method
We have seen the String.contains() method in our string tutorials. This method returns a boolean value, true or false, depending on whether the string contains the specified sequence of characters.
Similarly, the String class has a method “matches()” to check whether the string matches a regular expression. It returns true only if the entire string matches the specified regex; otherwise, it returns false.
The general syntax of the matches() method:
public boolean matches(String regex)
If the regex specified is not valid, then the “PatternSyntaxException” is thrown.
Let’s implement a program to demonstrate the usage of the matches() method.

    public class MatchesExample {
        public static void main(String[] args) {
            String str = "Java Series Tutorials";
            System.out.println("Input String: " + str);
            // use matches() method to check if a particular regex matches the whole input
            System.out.print("Regex: (.*)Java(.*) matches string? ");
            System.out.println(str.matches("(.*)Java(.*)"));
            System.out.print("Regex: (.*)Series(.*) matches string? ");
            System.out.println(str.matches("(.*)Series(.*)"));
            System.out.print("Regex: (.*)String(.*) matches string? ");
            System.out.println(str.matches("(.*)String(.*)"));
            System.out.print("Regex: (.*)Tutorials matches string? ");
            System.out.println(str.matches("(.*)Tutorials"));
        }
    }

Output:
Input String: Java Series Tutorials
Regex: (.*)Java(.*) matches string? true
Regex: (.*)Series(.*) matches string? true
Regex: (.*)String(.*) matches string? false
Regex: (.*)Tutorials matches string? true
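The (.*) wrappers in the example are needed because matches() tests the whole string. A short sketch of the difference between a whole-string match and a substring search follows; WholeStringMatchDemo and containsMatch are hypothetical names used only here.

```java
import java.util.regex.Pattern;

class WholeStringMatchDemo {
    // True if the regex matches anywhere inside the text (substring search).
    static boolean containsMatch(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String str = "Java Series Tutorials";
        System.out.println(str.matches("Java"));          // false: the regex must cover the whole string
        System.out.println(str.matches("Java.*"));        // true
        System.out.println(containsMatch(str, "Series")); // true: find() accepts a partial match
    }
}
```

When the goal is only to check whether a pattern occurs somewhere in the text, Matcher.find() avoids having to pad the regex with .* on both sides.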
We use many special characters and metacharacters with regular expressions in Java, as well as character classes for pattern matching. The following sections provide tables of the character classes, quantifiers, and metacharacters that can be used with regex.
Regex Character Classes
No | Character class | Description |
---|---|---|
1 | [pqr] | p, q, or r |
2 | [^pqr] | Negation: any character other than p, q, or r |
3 | [a-zA-Z] | Range: a through z or A through Z, inclusive |
4 | [a-d[m-p]] | Union: a through d, or m through p: [a-dm-p] |
5 | [a-z&&[def]] | Intersection: d, e, or f |
6 | [a-z&&[^bc]] | Subtraction: a through z, except for b and c: [ad-z] |
7 | [a-z&&[^m-p]] | Subtraction: a through z, and not m through p: [a-lq-z] |
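The character classes in the table above can be tried out with a few one-character inputs. This is a minimal sketch; CharacterClassDemo and inClass are hypothetical names introduced for the illustration.

```java
class CharacterClassDemo {
    // Hypothetical helper: true if the single character c belongs to the given class.
    static boolean inClass(String charClass, char c) {
        return String.valueOf(c).matches(charClass);
    }

    public static void main(String[] args) {
        System.out.println(inClass("[pqr]", 'q'));         // true
        System.out.println(inClass("[^pqr]", 'q'));        // false: negation
        System.out.println(inClass("[a-d[m-p]]", 'n'));    // true: union
        System.out.println(inClass("[a-z&&[def]]", 'e'));  // true: intersection
        System.out.println(inClass("[a-z&&[^bc]]", 'b'));  // false: b is subtracted
        System.out.println(inClass("[a-z&&[^m-p]]", 'q')); // true: q is outside m-p
    }
}
```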
Regex Quantifiers
Quantifiers specify how many times the preceding character, group, or character class must occur for the pattern to match.
The following table shows the common regex quantifiers used in Java.
No | Regex quantifier | Description |
---|---|---|
1 | x? | x appears once or not at all |
2 | x+ | x appears one or more times |
3 | x* | x occurs zero or more times |
4 | x{n} | x occurs exactly n times |
5 | x{n,} | x occurs n or more times |
6 | x{y,z} | x occurs at least y times but not more than z times |
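A quick way to get a feel for the quantifiers in the table is to test them against strings of repeated x characters. The sketch below assumes a hypothetical helper fullMatch that simply wraps String.matches().

```java
class QuantifierDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("x?", ""));          // true: zero occurrences are allowed
        System.out.println(fullMatch("x?", "xx"));        // false: at most one x
        System.out.println(fullMatch("x+", ""));          // false: at least one x is required
        System.out.println(fullMatch("x*", ""));          // true
        System.out.println(fullMatch("x{2}", "xxx"));     // false: exactly two
        System.out.println(fullMatch("x{2,}", "xxx"));    // true: two or more
        System.out.println(fullMatch("x{2,4}", "xxxx"));  // true: the upper bound is inclusive
        System.out.println(fullMatch("x{2,4}", "xxxxx")); // false
    }
}
```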
Regex Meta Characters
The metacharacters in regex work as shorthand codes. They include codes for whitespace and non-whitespace characters, digits, word characters, and word boundaries.
The following table lists the regex metacharacters.
No | Meta Characters | Description |
---|---|---|
1 | . | Any character (matches line terminators only when the DOTALL flag is set) |
2 | \d | Any digit, [0-9] |
3 | \D | Any non-digit, [^0-9] |
4 | \s | Any whitespace character, [ \t\n\x0B\f\r] |
5 | \S | Any non-whitespace character, [^\s] |
6 | \w | Any word character, [a-zA-Z_0-9] |
7 | \W | Any non-word character, [^\w] |
8 | \b | A word boundary |
9 | \B | A non-word boundary |
Given below is a Java program that uses the above special characters in the Regex.

    import java.util.regex.Pattern;

    public class RegexExample {
        public static void main(String[] args) {
            // true only if the string is exactly "Jim" (matching is case-sensitive)
            System.out.println("Jim (jim):" + Pattern.matches("Jim", "jim"));
            // true if the input string is Peter or peter
            System.out.println("[Pp]eter(Peter) :" + Pattern.matches("[Pp]eter", "Peter"));
            // true if the string contains abc
            System.out.println(".*abc.*(pqabcqp) :" + Pattern.matches(".*abc.*", "pqabcqp"));
            // true if the string doesn't start with a digit
            System.out.println("^[^\\d].*(abc123):" + Pattern.matches("^[^\\d].*", "abc123"));
            // true if the string consists of exactly three letters
            System.out.println("[a-zA-Z][a-zA-Z][a-zA-Z] (aQz):" + Pattern.matches("[a-zA-Z][a-zA-Z][a-zA-Z]", "aQz"));
            // false: the input string has length 4 and contains digits
            System.out.println("[a-zA-Z][a-zA-Z][a-zA-Z], a10z:" + Pattern.matches("[a-zA-Z][a-zA-Z][a-zA-Z]", "a10z"));
            // true if the string consists of 0 or more non-digits
            System.out.println("\\D*, abcde:" + Pattern.matches("\\D*", "abcde"));
            // true only if the line is exactly the word This; ^ = start of the line, $ = end of the line
            System.out.println("^This$, This is Java:" + Pattern.matches("^This$", "This is Java"));
            System.out.println("^This$, This:" + Pattern.matches("^This$", "This"));
            System.out.println("^This$, Is This Java?:" + Pattern.matches("^This$", "Is This Java?"));
        }
    }

Output:
Jim (jim):false
[Pp]eter(Peter) :true
.*abc.*(pqabcqp) :true
^[^\d].*(abc123):true
[a-zA-Z][a-zA-Z][a-zA-Z] (aQz):true
[a-zA-Z][a-zA-Z][a-zA-Z], a10z:false
\D*, abcde:true
^This$, This is Java:false
^This$, This:true
^This$, Is This Java?:false
In the above program, we have provided various regexes that are matched with the input string. Readers are advised to read the comments in the program for each regex to better understand the concept.
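The shorthand codes \d, \s, \w, \W, and \b from the metacharacter table are not exercised much by the program above, so here is a small additional sketch. MetaCharacterDemo and countWord are hypothetical names; note that each backslash has to be doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaCharacterDemo {
    // Counts whole-word occurrences of a word; \b prevents matches inside longer words.
    static int countWord(String text, String word) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("2024".matches("\\d+"));      // true: digits only
        System.out.println("a b".matches("\\w\\s\\w"));  // true: word char, whitespace, word char
        System.out.println("a-b".matches("\\w\\W\\w"));  // true: '-' is a non-word character
        // "concatenate" contains "cat", but not as a whole word.
        System.out.println(countWord("cat concatenate cat.", "cat")); // 2
    }
}
```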
Regex Logical or (|) Operator
We can use the logical or operator (|) in a regex to match either of its operands. The operands can be single characters or whole strings. For example, if we want to match either of the words ‘test’ and ‘Test’, we combine them with the or operator as Test|test.
Let’s see the following example to understand this operator.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexOR {
        public static void main(String[] args) {
            // Regex string to search for patterns Test or test
            String regex = "(Test|test)";
            // Compiles the pattern and obtains the matcher object from input string.
            Pattern pattern = Pattern.compile(regex);
            String input = "Software Testing Help";
            Matcher matcher = pattern.matcher(input);
            // print every match
            while (matcher.find()) {
                System.out.format("Text \"%s\" found at %d to %d.%n",
                        matcher.group(), matcher.start(), matcher.end());
            }
            // define another input string and obtain the matcher object
            input = "SoftwaretestingHelp";
            matcher = pattern.matcher(input);
            // print every match
            while (matcher.find()) {
                System.out.format("Text \"%s\" found at %d to %d.%n",
                        matcher.group(), matcher.start(), matcher.end());
            }
        }
    }

Output:
Text "Test" found at 9 to 13.
Text "test" found at 8 to 12.
In this program, we have provided the regex “(Test|test)”. Then first we give the input string as “Software Testing Help” and match the pattern. We see that the match is found and the position is printed.
Next, we give the input string as “SoftwaretestingHelp”. This time, too, a match is found, because the regex uses the or operator and the pattern on either side of | can match the string.
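For alternatives that differ in only one character, a character class or a case-insensitive flag can express the same idea as the or operator. The sketch below compares the three forms; AlternationDemo and findAll are hypothetical names, and (?i) is the embedded form of the CASE_INSENSITIVE flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Hypothetical helper: returns every match of the regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "Test, test and TEST";
        System.out.println(findAll("Test|test", input)); // [Test, test]
        System.out.println(findAll("[Tt]est", input));   // [Test, test]
        System.out.println(findAll("(?i)test", input));  // [Test, test, TEST]

        // | has the lowest precedence, so parentheses are needed to limit its scope.
        System.out.println("grey".matches("gr(a|e)y")); // true
        System.out.println("gra".matches("gra|ey"));    // true: means "gra" or "ey"
    }
}
```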
Email Validation Using Regex
We can also validate an email id (address) with regex using the String.matches() method, which yields exactly the same result as java.util.regex.Pattern.matches(regex, input). It matches the given email id against the regex and returns true if the email id conforms to the pattern.
The following program demonstrates the validation of email using regex.

    public class EmailDemo {
        static boolean isValidemail(String email) {
            // regex to validate email
            String regex = "^[\\w-_\\.+]*[\\w-_\\.]\\@([\\w]+\\.)+[\\w]+[\\w]$";
            // match email id with regex and return the value
            return email.matches(regex);
        }

        public static void main(String[] args) {
            String email = "ssthva@gmail.com";
            System.out.println("The Email ID is: " + email);
            System.out.println("Email ID valid? " + isValidemail(email));
            email = "@sth@gmail.com";
            System.out.println("The Email ID is: " + email);
            System.out.println("Email ID valid? " + isValidemail(email));
        }
    }

Output:
The Email ID is: ssthva@gmail.com
Email ID valid? true
The Email ID is: @sth@gmail.com
Email ID valid? false
As we can see from the above output, the first email id is valid. The second id starts directly with @, so it does not match the regex and is reported as invalid.
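String.matches() compiles the regex again on every call. If many addresses are validated, one possible variation is to compile the pattern once and reuse it. The following is only a sketch under that assumption; the class EmailValidator, the constant EMAIL, and the null handling are choices made for this illustration, and the regex is the same one used in the example above.

```java
import java.util.regex.Pattern;

class EmailValidator {
    // Same regex as in the example above, compiled once and reused for every call.
    private static final Pattern EMAIL =
            Pattern.compile("^[\\w-_\\.+]*[\\w-_\\.]\\@([\\w]+\\.)+[\\w]+[\\w]$");

    static boolean isValidEmail(String email) {
        // Assumption: null is treated as invalid instead of causing an exception.
        return email != null && EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidEmail("ssthva@gmail.com")); // true
        System.out.println(isValidEmail("@sth@gmail.com"));   // false: nothing before the first @
        System.out.println(isValidEmail("user@localhost"));   // false: this regex requires a dot in the domain
    }
}
```

As the last line shows, the regex accepts only a subset of possible addresses, so it is best seen as a basic format check.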
Frequently Asked Questions
Q #1) What is a Regular Expression?
Answer: A Regular Expression, commonly called regex, is a pattern or a sequence of characters (normal, special, or metacharacters) that is used to search, match, or validate an input string.
Q #2) What is the significance of the Matcher class for a regular expression in Java?
Answer: The Matcher class (java.util.regex.Matcher) acts as a regex engine. It performs the matching operations by interpreting the Pattern.
Q #3) What is a Pattern in Java?
Answer: The package java.util.regex provides a Pattern class that is used to compile a regex into a pattern, which is the compiled representation of the regex. This pattern is then used to create Matcher objects that match it against input strings.
Q #4) What is \b in a regular expression?
Answer: \b is an anchor that matches a position called a word boundary, that is, a position between a word character and a non-word character (or the start or end of the input). Its counterpart \B matches any position that is not a word boundary. Similarly, the start of the line is denoted by a caret (^) and the end of the line by a dollar ($) sign.
Q #5) Is Pattern thread-safe in Java?
Answer: Yes. Instances of the Pattern class are immutable and safe for use by multiple concurrent threads. Instances of the Matcher class, however, are not thread-safe.
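In practice this usually means sharing one compiled Pattern and creating a fresh Matcher wherever a match is needed. A minimal sketch of that idea, using a parallel stream to stand in for multiple threads (the names SharedPatternDemo, DIGITS, and isNumber are made up for this example):

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SharedPatternDemo {
    // One immutable Pattern shared by all threads.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Each call creates its own Matcher, so no matcher state is shared between threads.
    static boolean isNumber(String s) {
        return DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("123", "abc", "42", "4x2");
        List<String> numbers = inputs.parallelStream()
                .filter(SharedPatternDemo::isNumber)
                .collect(Collectors.toList());
        System.out.println(numbers); // [123, 42]
    }
}
```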
Conclusion
In this tutorial, we have discussed Regular Expressions in Java. A regular expression, also known as ‘regex’, is used to search, match, and validate input strings in Java. Java provides the ‘java.util.regex’ package, whose classes such as Pattern and Matcher help to define a pattern and match it against an input string.
We have also seen various special character classes and Metacharacters that we can use in the regex that give shorthand codes for pattern matching. We also explored email validation using regex.