Groups in PHP Regular Expressions
PHP Regular Expressions with Security Considerations - Part 6
Foreword: In this part of the series, I talk about groups in PHP regular expressions.
By: Chrysanthus Date Published: 18 Jan 2019
Introduction
The grouping metacharacters, ( ), that is, parentheses, also serve another, completely different purpose: they capture the substrings of the subject that matched. A pattern (regex) is usually not an exact word or an exact phrase. After a match has occurred, can you know the exact word or phrase that was matched? Yes, you can, and it is thanks to grouping (parentheses).
Consider the following subject string:
"This is one and that is two."
The pattern /(one).*(two)/ matches the substring “one and that is two” in the subject. When you call the preg_match() function with an array variable as its third argument, the whole substring matched is captured in that array. The portions of the subject matched by each group are also captured in the array. In this case, “one” and “two” are captured.
The following code illustrates this:
<?php
preg_match("/(one).*(two)/", "This is one and that is two.", $matches);
echo $matches[0], '<br>';
echo $matches[1], '<br>';
echo $matches[2];
?>
For the output, the first echo construct displays:
one and that is two
The second echo construct displays:
one
The third echo construct displays:
two
Now let us look at the code. The first statement is:
preg_match("/(one).*(two)/", "This is one and that is two.", $matches);
First of all, note that we have used the preg_match() function. There are two groups in the regex. After the matching process, the substring in the subject that matches the whole regex goes into $matches[0]. The substring that matches the first group goes into $matches[1], and the substring that matches the second group goes into $matches[2]. Note: the array acquires these substrings only if there is a match. If there is no match, preg_match() returns 0 and $matches is an empty array.
After the first statement in the script, the three echo statements display the three array values in order. From the explanation given above, there are exactly three elements in the array. This explains the output.
So, to capture the whole regex including any of its groups, you need the preg_match() function and an array, passed as the third argument, to hold the matched substrings. The whole regex gives rise to the value of the first element of the array. The first group in the regex (from left to right) gives rise to the second element, the second group gives rise to the third element, and the rest of the groups, if any, fill the array in that order. The length of the array is the number of substrings captured: the substring that corresponds to the whole regex plus the substrings that correspond to the groups. Remember, if there is no match, the array is empty and the return value of preg_match() is 0.
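The return value and the state of the array can be checked directly. The following is a minimal sketch; the variable names ($foundYes, $matchesYes, $foundNo, $matchesNo) and the second pattern are made up for the illustration. Note that preg_match() can also return false if an error occurs, which is not shown here.

```php
<?php
$subject = "This is one and that is two.";

// The pattern matches: 1 is returned and the array holds three elements.
$foundYes = preg_match("/(one).*(two)/", $subject, $matchesYes);
echo $foundYes, ' ', count($matchesYes), '<br>';   // 1 3

// The pattern does not match: 0 is returned and the array is empty.
$foundNo = preg_match("/(three).*(four)/", $subject, $matchesNo);
echo $foundNo, ' ', count($matchesNo), '<br>';     // 0 0
?>
```

Comparing the return value with === 1, as done later in this article, avoids confusing 0 (no match) with false (error).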
You group a part of a regex using parentheses. You can capture (remember) what is in a group as illustrated above.
It is possible to group part of a regex to have the benefit of its entity behavior without capturing (remembering) it. The syntax for this is:
(?:x)
If you do not want the group (one) above to be captured, use:
(?:one)
The following code illustrates this:
<?php
$subject = "This is one and that is two.";
preg_match("/(?:one).*(two)/", $subject, $matches);
echo $matches[0], '<br>';
echo $matches[1], '<br>';
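// Only one group captures here, so $matches[2] does not exist; display 'null' in that case.
echo $matches[2] ?? 'null', '<br>';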
?>
The output is:
one and that is two
two
null
As you can see from the output, the group (?:one) is not captured, because of the “?:” that precedes “one”. The only capturing group, (two), therefore goes into $matches[1], and there is no $matches[2].
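To see the effect on the array, the capturing and non-capturing patterns can be run side by side. This is only a sketch; the variable names $capturing and $nonCapturing are chosen here for the illustration.

```php
<?php
$subject = "This is one and that is two.";

// Both groups capture: whole match, "one", "two".
preg_match("/(one).*(two)/", $subject, $capturing);

// The first group does not capture: whole match, then "two" as group 1.
preg_match("/(?:one).*(two)/", $subject, $nonCapturing);

echo count($capturing), '<br>';      // 3
echo count($nonCapturing), '<br>';   // 2
echo $nonCapturing[1], '<br>';       // two
?>
```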
Substring Followed by Another
The official spelling of the language name is “ECMAScript”: the word “ECMA” followed immediately by “Script”. In PHP, it is possible to match one substring only if it is immediately followed by another. This is known as a positive lookahead. The syntax is:
x(?=y)
Here, ‘x’ and ‘y’ are substrings. This matches ‘x’ only if it is immediately followed by ‘y’; ‘y’ itself is not part of the match. So, you can match “ECMA” if it is immediately followed by “Script”. Consider the following subject:
$subject = "I like the ECMAScript language.";
The regex to match “ECMA” immediately followed by “Script” is
/ECMA(?=Script)/
The following code illustrates this and produces a match.
<?php
$subject = "I like the ECMAScript language.";
if (preg_match("/ECMA(?=Script)/", $subject) === 1)
echo('Matched');
else
echo('Not Matched');
?>
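The following sketch shows that the substring inside (?=...) is required but is not included in what is matched. The variable names $lookahead and $plain, and the second pattern without the lookahead, are added here only for comparison.

```php
<?php
$subject = "I like the ECMAScript language.";

// With the lookahead, "Script" must follow but is not part of the match.
preg_match("/ECMA(?=Script)/", $subject, $lookahead);
echo $lookahead[0], '<br>';   // ECMA

// Without the lookahead, "Script" is consumed and becomes part of the match.
preg_match("/ECMAScript/", $subject, $plain);
echo $plain[0], '<br>';       // ECMAScript
?>
```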
Now, assume that ECMA is a computer language; ECMAScript is also a computer language. You may be looking for ECMA and not ECMAScript. The syntax for this is:
x(?!y)
Here, ‘x’ and ‘y’ are substrings. This matches ‘x’ only if it is not immediately followed by ‘y’. This is known as a negative lookahead. So, you can match “ECMA” if it is not immediately followed by “Script”. Consider the following subject:
$subject = "I like the ECMA language. It is a Scripting language.";
The regex to match “ECMA” not immediately followed by “Script” is
/ECMA(?!Script)/
The following code illustrates this and produces a match:
<?php
$subject = "I like the ECMA language. It is a Scripting language.";
if (preg_match("/ECMA(?!Script)/", $subject) === 1)
echo('Matched');
else
echo('Not Matched');
?>
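When the subject contains both forms, the negative lookahead skips the “ECMA” that is followed by “Script” and matches the other one. The sketch below uses the PREG_OFFSET_CAPTURE flag of preg_match() to show where the match was found; the subject strings and the variable names ($both, $onlyScript, $m, $found) are invented for this illustration.

```php
<?php
$both = "I like ECMAScript and I like ECMA.";

// With PREG_OFFSET_CAPTURE, each element is [matched text, byte offset].
preg_match("/ECMA(?!Script)/", $both, $m, PREG_OFFSET_CAPTURE);
echo $m[0][0], ' at offset ', $m[0][1], '<br>';   // ECMA at offset 29

// Every "ECMA" here is followed by "Script", so there is no match.
$onlyScript = "I like the ECMAScript language.";
$found = preg_match("/ECMA(?!Script)/", $onlyScript);
echo $found, '<br>';                              // 0
?>
```

Offset 29 is the position of the second “ECMA” in the subject, not the one at the start of “ECMAScript”.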
That is it for this part of the series. We stop here and continue in the next part.
Chrys