Regex Modifiers in PHP
PHP Regular Expressions – Part V
Foreword: It is possible to make a case-insensitive match. For this you need what is called a modifier. There are a good number of modifiers, and each has its own purpose. We shall learn some of them in this part of the series.
By: Chrysanthus Date Published: 11 Aug 2012
Introduction
The i Modifier
By default, matching is case sensitive. To make it case insensitive, you have to use what is called the i modifier.
So if we have the regex,
/send/
and then we also have
$subject = "Click the Send button.";
the following code will not produce a match:
<?php
$subject = "Click the Send button.";
if (preg_match("/send/", $subject))
echo "Matched" . "<br />";
else
echo "Not Matched" . "<br />";
?>
The regex did not match the subject string because the regex has “send”, with a lowercase s, while the subject string has “Send”, with an uppercase S. If you want this matching to be case insensitive, then your regex will have to be
/send/i
Note the i just after the second forward slash. It is the i modifier. The following code will produce a match.
<?php
$subject = "Click the Send button.";
if (preg_match("/send/i", $subject))
echo "Matched" . "<br />";
else
echo "Not Matched" . "<br />";
?>
A match occurs because the i modifier has made the regex case insensitive.
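With the i modifier only the comparison ignores case; the text reported as the match keeps the case it has in the subject. The following is a minimal sketch with a made-up, all-uppercase subject. It passes preg_match() its optional third argument, here called $match, which receives the matched text in $match[0]:

```php
<?php
// Made-up subject: the word appears in upper case only.
$subject = "PRESS SEND TO CONTINUE.";

// Without the i modifier, lowercase "send" does not match "SEND".
$plain = preg_match("/send/", $subject);

// With the i modifier it matches; $match[0] receives the matched text.
$insensitive = preg_match("/send/i", $subject, $match);

echo $plain . "<br />";        // 0
echo $insensitive . "<br />";  // 1
echo $match[0] . "<br />";     // SEND (the case in the subject is kept)
?>
```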
A subject string may contain more than one substring that matches the regex. preg_match() stops after the first match, so only the first such substring is matched. To match all of them, you have to use the function preg_match_all(). This is the syntax:
int preg_match_all ( string $pattern , string $subject , array &$matches [, int $flags])
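The first argument is the regex. The second is the subject. The third is an array that receives all the matches; here it is a two-dimensional array (with the preg_match() function it is one-dimensional). The fourth argument is an optional flag; if it is left out, PREG_PATTERN_ORDER is assumed. We shall talk about only this one flag. The function returns the number of full matches found (0 when nothing matches, or false if an error occurs), so it can be used directly in an if condition.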
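The difference between the two functions can be seen in a minimal sketch with a made-up subject that contains “send” twice. preg_match() stops at the first match and fills a one-dimensional array; preg_match_all() returns the number of matches and fills a two-dimensional array. The flags argument is left out, so PREG_PATTERN_ORDER applies:

```php
<?php
// Made-up subject with two substrings that match /send/i.
$subject = "Send it now, then send it again.";

// preg_match() returns 1 (found) or 0 (not found) and stops at the first match.
$one = preg_match("/send/i", $subject, $first);

// preg_match_all() returns how many full matches it found.
$count = preg_match_all("/send/i", $subject, $all);

echo $one . "<br />";        // 1
echo $first[0] . "<br />";   // Send
echo $count . "<br />";      // 2
echo $all[0][0] . "<br />";  // Send
echo $all[0][1] . "<br />";  // send
?>
```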
Consider the following subject string:
$subject = "A cat is an animal. A rat is an animal. A bat is a creature.";
In the above subject, you have the substrings cat, rat and bat, in that order. Each of these substrings matches the following regex:
/[cbr]at/
With preg_match(), this pattern matches only the first substring, “cat”. If you want “cat”, “rat” and “bat” all to be matched, you have to use the preg_match_all() function, as the following code illustrates:
<?php
$subject = "A cat is an animal. A rat is an animal. A bat is a creature.";
if (preg_match_all("/[cbr]at/", $subject, $matches, PREG_PATTERN_ORDER))
echo "Matched" . "<br />";
else
echo "Not Matched" . "<br />";
echo "<br />";
echo $matches[0][0] . "<br />";
echo $matches[0][1] . "<br />";
echo $matches[0][2] . "<br />";
?>
The last argument of preg_match_all() is a flag; we shall come back to it shortly. The first, second and third elements of the first inner array, $matches[0], are “cat”, “rat” and “bat”. So the output of the above code is:
Matched
cat
rat
bat
Here the two-dimensional array holds only one inner array, $matches[0], because the regex has no parenthesized subpattern. $matches[0] receives the matched substrings in the order in which they occur in the subject.
Matching every occurrence in this way is called global matching.
The flag used above is PREG_PATTERN_ORDER. With this flag, $matches[0] is an array of full pattern matches, $matches[1] is an array of the strings matched by the first parenthesized subpattern, and so on.
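To see what $matches[1] holds, here is a minimal sketch that modifies the earlier regex by putting the first letter in parentheses. The pattern /([cbr])at/ is our own variation for illustration, not part of the original example:

```php
<?php
$subject = "A cat is an animal. A rat is an animal. A bat is a creature.";

// The first letter is now inside parentheses, so it is captured separately.
preg_match_all("/([cbr])at/", $subject, $matches, PREG_PATTERN_ORDER);

// $matches[0]: the full pattern matches, in subject order.
echo implode(", ", $matches[0]) . "<br />";  // cat, rat, bat
// $matches[1]: what the first parenthesized subpattern captured.
echo implode(", ", $matches[1]) . "<br />";  // c, r, b
?>
```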
The s and m Modifiers
The s modifier stands for single-line mode and the m modifier for multiline mode. Usually, without these modifiers, we get what we want. Sometimes, however, we need to take the \n (newline) character into account: a file on the hard disk, for example, may be made up of many lines of text, each ending with \n. By default, the ^ and $ metacharacters anchor at the beginning and at the end of the subject string; we can make them anchor at the beginning and end of each line instead. Between them, the s and m modifiers affect the interpretation of ^, $ and the dot metacharacter. Here is a fuller description of the s and m modifiers:
- No modifiers: there is no modifier after the second forward slash. '.' matches any character except "\n". ^ matches only at the start of the subject string, and $ matches only at the end of the subject string or just before a "\n" that is its last character. This is the default behavior.
- s modifier: the subject string behaves like one long line, regardless of any newline characters it contains. '.' matches any character, even "\n". ^ still matches only at the start of the subject string, and $ still matches only at its end or just before a final "\n".
- m modifier: the subject string behaves like a set of lines, where consecutive lines are separated by the "\n" character. The dot is not affected: '.' still matches any character except "\n". ^ matches at the start of the subject string or just after any "\n" within it (but not after a "\n" that ends the string), and $ matches at the end of the subject string or just before any "\n". In this way ^ and $ can anchor at the start or end of any line within the subject string.
We shall use examples to illustrate the above three conditions. We start by looking at the first condition.
Read the first point above again. Consider the following multiline subject string:
$subject = "The first sentence.\n The second sentence.\n The third sentence.\n";
The subject string has three lines. The following expression produces a match.
preg_match("/second/", $subject)
The substring “second”, in the second line (sentence), is matched. Consider the following pattern:
/^.*$/
Under normal circumstances, this pattern (regex) is expected to match the whole string. Let us see if it does so with the above multiline subject string. Consider the following code:
<?php
$subject = "The first sentence.\n The second sentence.\n The third sentence.\n";
if (preg_match("/^.*$/", $subject))
echo "Matched" . "<br />";
else
echo "Not Matched" . "<br />";
?>
If you run this code, no match will occur. ^ can match only at the start of the subject, and by default the dot does not match "\n", so .* stops at the end of the first sentence. There $ cannot match, because that "\n" is not the last character of the subject string.
I hope you now appreciate what the first point above is talking about.
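The first point also says that, without modifiers, $ can match just before a \n that is the last character of the subject. A minimal sketch with a made-up, one-line subject shows this: the same /^.*$/ pattern now matches, and the final newline is not part of the matched text:

```php
<?php
// Made-up subject: one line whose only "\n" is the last character.
$subject = "The first sentence.\n";

// No modifiers: $ may match just before that final "\n".
$found = preg_match("/^.*$/", $subject, $match);

echo $found . "<br />";  // 1
// The newline is not included in the matched text.
echo ($match[0] === "The first sentence." ? "newline excluded" : "newline included") . "<br />";  // newline excluded
?>
```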
Read the second point above again. We shall repeat what we did before. Consider the following subject string:
$subject = "The first sentence.\n The second sentence.\n The third sentence.\n";
The subject string has three lines. The following expression produces a match.
preg_match("/second/s", $subject)
Note that the s modifier has been used. The substring “second”, in the second line (sentence), is matched. Consider the following pattern:
/^.*$/s
This pattern (regex) is supposed to match the whole string. Let us see if it does so with the above multi-line subject string. Consider the following code:
<?php
$subject = "The first sentence.\n The second sentence.\n The third sentence.\n";
if (preg_match("/^.*$/s", $subject))
echo "Matched" . "<br />";
else
echo "Not Matched" . "<br />";
?>
A match is produced. This is because, with the s modifier, the dot metacharacter also matches the newline character.
I hope you now appreciate what the second point above is talking about.
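To see what the s modifier actually matched, the following minimal sketch passes preg_match() a third argument, $match, and compares the matched text with the subject. Because the dot now matches \n, the greedy .* runs to the very end, so the match is the whole subject, newlines included:

```php
<?php
$subject = "The first sentence.\n The second sentence.\n The third sentence.\n";

// With s, the dot also matches "\n", so the greedy .* consumes everything.
preg_match("/^.*$/s", $subject, $match);

// The match is the entire subject, newlines included.
echo ($match[0] === $subject ? "whole subject" : "part of subject") . "<br />";  // whole subject
echo substr_count($match[0], "\n") . "<br />";  // 3
?>
```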
Read the third point above again. Here we look at the effect of the m modifier. Consider the following subject string:
$subject = "The first sentence.\n The second sentence.\n The third sentence.\n";
The subject string has three lines. The following expression produces a match.
preg_match("/second/m", $subject)
Note that the m modifier has been used. The substring “second”, in the second line, is matched. Consider the following pattern:
/^.*$/m
With the m modifier, this pattern (regex) should match only one line. Let us see if it does so with the above multi-line subject string. Consider the following code:
<?php
$subject = "The first sentence.\n The second sentence.\n The third sentence.\n";
if (preg_match("/^.*$/m", $subject))
echo "Matched" . "<br />";
else
echo "Not Matched" . "<br />";
?>
Only the first sentence is matched.
So preg_match() matched only the first line. You can match and capture all three sentences: put the pattern in parentheses (a subpattern), use preg_match_all() instead of preg_match(), and pass the flag PREG_PATTERN_ORDER. The array that receives the captured substrings is two-dimensional and here holds two inner arrays. The following code illustrates this:
<?php
$subject = "The first sentence.\n The second sentence.\n The third sentence.\n";
if (preg_match_all("/(^.*$)/m", $subject, $matches, PREG_PATTERN_ORDER))
echo "Matched" . "<br />";
else
echo "Not Matched" . "<br />";
echo "<br />";
echo $matches[0][0] . "<br />";
echo $matches[0][1] . "<br />";
echo $matches[0][2] . "<br />";
echo "<br />";
echo $matches[1][0] . "<br />";
echo $matches[1][1] . "<br />";
echo $matches[1][2] . "<br />";
?>
The output is:
Matched
The first sentence.
The second sentence.
The third sentence.
The first sentence.
The second sentence.
The third sentence.
In this case the first and second inner arrays have the same content, because the parenthesized subpattern encloses the whole regex.
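The m modifier matters most when ^ or $ sits inside a longer pattern. As a minimal sketch (the pattern below is made up for illustration), this code captures the word after “The” at the start of each line. The optional space “ ?” is needed because the second and third lines of this subject start with a space:

```php
<?php
$subject = "The first sentence.\n The second sentence.\n The third sentence.\n";

// With m, ^ also matches just after each "\n" inside the subject.
$withM = preg_match_all('/^ ?The (\w+)/m', $subject, $matches);

// Without m, ^ matches only at the very start of the subject.
$withoutM = preg_match_all('/^ ?The (\w+)/', $subject, $firstOnly);

echo $withM . "<br />";                      // 3
echo implode(", ", $matches[1]) . "<br />";  // first, second, third
echo $withoutM . "<br />";                   // 1
?>
```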
The x Modifier
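If you want to include comments in your regex, you can use the x modifier. With the x modifier, whitespace in the regex is ignored (except inside a character class or when escaped), and a # character starts a comment that runs to the end of the line. We shall see an example later.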
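In the meantime, here is a minimal sketch of the x modifier that reuses the [cbr]at regex from earlier and spreads it over several lines with a comment on each part. The layout and comments are our own. The comments must not contain the / delimiter, because PHP looks for the closing delimiter before PCRE reads the comments:

```php
<?php
$subject = "A cat is an animal. A rat is an animal. A bat is a creature.";

// With x, spaces and line breaks in the pattern are ignored,
// and everything from # to the end of a line is a comment.
$pattern = '/
    [cbr]   # one of the letters c, b or r
    at      # followed by the letters "at"
/x';

$count = preg_match_all($pattern, $subject, $matches);

echo $count . "<br />";                      // 3
echo implode(", ", $matches[0]) . "<br />";  // cat, rat, bat
?>
```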
Using More Than One Modifier
We shall soon take a break. Before we do, note that a regex can have more than one modifier, as in:
/send/im
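Note that m only changes how ^ and $ behave, so in /send/im by itself it has no visible effect. The minimal sketch below therefore adds ^ to the pattern and uses a made-up, three-line subject: i lets both “Send” and “send” match, and m lets ^ anchor at the start of every line:

```php
<?php
// Made-up subject: three lines, only two of which begin with the word.
$subject = "Send the form.\nsend a copy.\nDo not send it twice.";

// i: ignore case; m: ^ can match at the start of every line.
$count = preg_match_all("/^send/im", $subject, $matches);

echo $count . "<br />";                      // 2
echo implode(", ", $matches[0]) . "<br />";  // Send, send
?>
```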
Well, it is time. Let us take a break. See you in the next part of the series.
Chrys