Capturing Matches in PHP Regular Expressions
Advanced PHP Regular Expressions - Part 2
Foreword: In this part of the series I explain how to capture matches in PHP regular expression operations; the word 'capture' here means keeping a copy of the substring matched in the subject.
By: Chrysanthus Date Published: 11 Jul 2019
Introduction
Grouping
When you match a subject, you may be interested in only a particular part of the overall substring that is matched. To target that part, place parentheses around the corresponding subpattern in the regex. A subpattern within parentheses is called a group. After a successful match, the part of the subject matched by the group is available separately from the overall match. Read and test the following code, which illustrates this:
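<?php
$subject = "one two three four five";
// The parentheses around thre. form a group.
$regex = '/tw. (thre.) fou./';
// preg_match() fills $matches with the overall match and the group's capture.
preg_match($regex, $subject, $matches);
echo $matches[0], '<br>'; // overall matched substring
echo $matches[1];         // substring captured by the group
?>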
The variable $matches is an array that preg_match() fills with the overall matched substring and any captured substrings of it.
The overall matched substring is “two three four”. The part of it captured by the group, (thre.), is “three”. The first element of the $matches array, $matches[0], holds the overall matched substring. The next element, $matches[1], holds the substring matched by the group. The output of the code is:
two three four
three
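preg_match() returns 1 when the pattern matches, 0 when it does not, and false on error, so it can be worth checking that return value before reading $matches. The following is only a minimal sketch of that idea: firstGroup() is a hypothetical helper written for this illustration, not a PHP built-in, and it assumes PHP 7.1 or later for the nullable return type.

```php
<?php
// firstGroup() is a hypothetical helper for illustration; it is not part of PHP.
// It returns the text captured by group 1, or null when there is no match.
function firstGroup(string $regex, string $subject): ?string
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match($regex, $subject, $matches) === 1) {
        // Fall back to null if the pattern has no group 1.
        return $matches[1] ?? null;
    }
    return null;
}

$subject = "one two three four five";

// The group (thre.) matches, so the capture is returned.
$found = firstGroup('/tw. (thre.) fou./', $subject);
echo $found ?? 'no match', '<br>';

// "sixt." does not occur in the subject, so there is nothing to capture.
$missing = firstGroup('/tw. (sixt.) fou./', $subject);
echo $missing ?? 'no match';
```

With the subject above, the first call should give three and the second should fall through to the "no match" text, because $matches is never read when the pattern fails.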
It is possible to have more than one group in the overall pattern. In this case, the first element of the array holds the overall matched substring, the next element holds the substring matched by the first group, the element after that holds the substring matched by the second group, and so on, with the groups counted from left to right. Read and try the following code, which illustrates this:
<?php
$subject = "The numbers are: one, two, three, and so on.";
$regex = '/(on.), (tw.), (thre.), and/';
preg_match($regex, $subject, $matches);
echo $matches[0], '<br>';
echo $matches[1], '<br>';
echo $matches[2], '<br>';
echo $matches[3], '<br>';
?>
The output is:
one, two, three, and
one
two
three
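Because the captures sit at fixed positions in $matches, they can also be copied into separate variables in one step. This is a minimal sketch, assuming PHP 7.1 or later for the short list syntax; the variable names $whole, $first, $second and $third are chosen only for this illustration.

```php
<?php
$subject = "The numbers are: one, two, three, and so on.";
$regex = '/(on.), (tw.), (thre.), and/';

// Only read the captures when the pattern actually matched.
if (preg_match($regex, $subject, $matches) === 1) {
    // Element 0 is the overall match; elements 1 to 3 are the groups,
    // numbered from left to right by their opening parenthesis.
    [$whole, $first, $second, $third] = $matches;

    echo $whole, '<br>';
    echo $first, ' / ', $second, ' / ', $third;
}
```

Here $matches has four elements, one for the overall match and one for each of the three groups, so the destructuring on the left needs four variables.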
Alternative Capture within a Group
Here, alternative means Or. Consider the USA date, 8:5:19, written as month, day and year. The month can be written as 8 or 08; the day of the month can be written as 5 or 05; the year can be written as 19 or 2019. There are several ways in which this date can be written because of the different alternatives for each of the figures. A subject for the date may be "8:05:2019"; another subject may instead be "08:5:19", which is the same date written in a different way. A regex to match the whole date and capture the different possible figures is,
/(\d|\d\d):(\d|\d\d):(\d\d|\d\d\d\d)/
where \d represents a digit and | means Or. In PHP the regex is written as a quoted string, and so we would have a statement like,
$regex = '/(\d|\d\d):(\d|\d\d):(\d\d|\d\d\d\d)/';
In the array filled by the preg_match() function, the first element will have the overall substring for the whole regex; the second element will have the match for the first (\d|\d\d); the third element will have the match for the second (\d|\d\d); and the fourth element will have the match for (\d\d|\d\d\d\d). Try the following code:
<?php
$subject = '08:5:2019';
$regex = '/(\d|\d\d):(\d|\d\d):(\d\d|\d\d\d\d)/';
preg_match($regex, $subject, $matches);
for($i=0; $i<count($matches); ++$i)
{
echo $matches[$i], '<br>';
}
?>
The output is:
08:5:20
08
5
20
Now, for the year, maybe you were expecting 2019, but only the first two digits, 20, have been captured. This is because \d\d was typed before \d\d\d\d in the alternative: the alternatives are tried from left to right, and since nothing after the group forces the regex to try again, the first alternative that matches is kept. If you want 2019 to be returned instead of 20, then type (\d\d\d\d|\d\d) for the year in the regex.
That is it for this part of the series. We stop here and continue in the next part.
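The effect of the ordering can be checked by running both versions of the year group against the same subject. The following is a minimal sketch; the variable names $shortFirst, $longFirst, $a, $b and $c are used only for this illustration.

```php
<?php
// Year group with the two-digit alternative first, as in the code above.
$shortFirst = '/(\d|\d\d):(\d|\d\d):(\d\d|\d\d\d\d)/';
// Year group with the four-digit alternative first.
$longFirst  = '/(\d|\d\d):(\d|\d\d):(\d\d\d\d|\d\d)/';

// Two-digit alternative is tried first and succeeds, so matching stops there.
preg_match($shortFirst, '08:5:2019', $a);
echo $a[3], '<br>'; // 20

// Four-digit alternative is tried first and succeeds.
preg_match($longFirst, '08:5:2019', $b);
echo $b[3], '<br>'; // 2019

// With a two-digit year, the four-digit alternative fails
// and the regex falls back to the two-digit one.
preg_match($longFirst, '8:05:19', $c);
echo $c[3]; // 19
```

The month and day groups do not show the same problem even though \d comes first in them: each is followed by a colon in the regex, so when a single digit is followed by another digit rather than a colon, the regex goes back and tries the \d\d alternative.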
Chrys