Backreferences in PHP Regular Expressions
Advanced PHP Regular Expressions - Part 6
Foreword: In this part of the series, I explain how a group (subpattern) in a regex can be referred to, later in the same regex, by a backreference.
By: Chrysanthus Date Published: 11 Jul 2019
Introduction
Backreference
Normally, when a writer types two consecutive words that are the same, it is a mistake. You may want to identify such a sequence in a subject string. Consider the following subject:
$subject = "He has one one of the books";
Here, the accidentally typed substring “one one” begins with “one”, followed by 1 or more whitespace characters and then “one” again. You may want to identify this substring. The pattern for the first word of interest is \b\w\w\w\b . The pattern for 1 or more whitespace characters is \s+ . The second word of interest must be the same word as the first. Typing the subpattern \b\w\w\w\b a second time would not express that, because it would accept any three-character word in the second position. Instead, put the first subpattern in a group and refer back to whatever that group matched. A suitable regex is,
/(\b\w\w\w\b)\s+\g{-1}/
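In this expression, \g{-1} is a relative backreference: it stands for the text matched by the nearest preceding group, (\b\w\w\w\b). It does not re-apply the subpattern, so the regex is not equivalent to,
/(\b\w\w\w\b)\s+(\b\w\w\w\b)/
which accepts any two three-character words in a row and would match “has one” in the subject above. The regex,
/(\b\w\w\w\b)\s+\g{-1}/
requires the second word to be identical to the first, so it matches the phrase “one one”. It would match any three-character word that repeats, e.g. “the the”, “him him”, “man man”, etc. Note that \w also matches digits and the underscore, and that, with no \b after \g{-1}, the regex also matches when the repeat is only the start of a longer word, as in “one ones”.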
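The difference between repeating a subpattern and using a backreference can be checked directly. The following is a minimal sketch using the subject from above; the variable names $repeated and $backref are only illustrative.

```php
<?php
$subject = "He has one one of the books";

// Repeated subpattern: the second group may capture a different word.
preg_match('/(\b\w\w\w\b)\s+(\b\w\w\w\b)/', $subject, $repeated);

// Backreference: the second word must equal the text captured by the group.
preg_match('/(\b\w\w\w\b)\s+\g{-1}/', $subject, $backref);

echo $repeated[0], "\n"; // has one
echo $backref[0], "\n";  // one one
```

The first regex stops at the earliest pair of three-character words, while the second skips “has one” because “one” is not the text captured by the group.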
You can use this same scheme to match a word made up of the same segment written twice. So the following code segment produces a match for “beriberi”:
<?php
$subject = "What does beriberi mean?";
$regex = '/(beri)\g{-1}/';
preg_match($regex, $subject, $matches);
for ($i=0;$i<count($matches);++$i)
{
echo $matches[$i], '<br>';
}
?>
The output is:
beriberi
beri
The first line is the text matched by the complete regex ($matches[0]). The second is the text captured by the group ($matches[1]).
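It can help to check the return value of preg_match() before reading $matches. The sketch below assumes a second, hypothetical subject containing “berimbau”, chosen only because it starts with “beri” but does not repeat it.

```php
<?php
$regex = '/(beri)\g{-1}/';

// preg_match() returns 1 on a match and 0 on no match (false on error).
$found = preg_match($regex, "What does beriberi mean?", $matches);
// Here $matches[0] is the whole match and $matches[1] is the group.

$notFound = preg_match($regex, "What does berimbau mean?", $noMatches);
// No match: $noMatches is left as an empty array.

echo $found, ' ', count($matches), "\n";      // 1 2
echo $notFound, ' ', count($noMatches), "\n"; // 0 0
```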
What about the situation where you have two or more previous subpatterns (groups) spread out in the regex and you want to refer to them later in the same regex? Relative backreferences count groups backwards from the point where the reference appears: \g{-1} is the nearest group to the left, \g{-2} is the group before that, \g{-3} is the one before that, and so on. Consider the following code that produces a match:
$subject = "Listen: A boy and a girl! Which boy and which girl?";
$regex = '/(boy).+(girl).+\g{-2}.+\g{-1}/';
The phrase matched is “boy and a girl! Which boy and which girl”, where in the regex, (boy) matches “boy”, (girl) matches “girl”, then \g{-2} refers back to (boy) and \g{-1} refers back to (girl).
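Relative references can also be written as absolute ones. The sketch below assumes the PCRE form \g{n}, which counts groups from the left of the regex, so \g{1} is (boy) and \g{2} is (girl); the variable names are only illustrative.

```php
<?php
$subject = "Listen: A boy and a girl! Which boy and which girl?";

// Relative: counted backwards from where the reference appears.
$relative = '/(boy).+(girl).+\g{-2}.+\g{-1}/';
// Absolute: counted from the left of the regex.
$absolute = '/(boy).+(girl).+\g{1}.+\g{2}/';

preg_match($relative, $subject, $relMatches);
preg_match($absolute, $subject, $absMatches);

var_dump($relMatches === $absMatches); // bool(true)
```

Relative references keep working if a new group is later added at the start of the regex, whereas absolute numbers would then have to be updated.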
Read and test the following code that uses the above expressions:
<?php
$subject = "He has one one of the books";
$regex = '/(\b\w\w\w\b)\s+\g{-1}/';
preg_match($regex, $subject, $matches);
for ($i=0;$i<count($matches);++$i)
{
echo $matches[$i], '<br>';
}
echo '<br>';
$subject = "What does beriberi mean?";
$regex = '/(beri)\g{-1}/';
preg_match($regex, $subject, $matches);
for ($i=0;$i<count($matches);++$i)
{
echo $matches[$i], '<br>';
}
echo '<br>';
$subject = "Listen: A boy and a girl! Which boy and which girl?";
$regex = '/(boy).+(girl).+\g{-2}.+\g{-1}/';
preg_match($regex, $subject, $matches);
for ($i=0;$i<count($matches);++$i)
{
echo $matches[$i], '<br>';
}
?>
The output is:
one one
one
beriberi
beri
boy and a girl! Which boy and which girl
boy
girl
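The three-character pattern can be generalized to repeated words of any length. The following is a sketch under assumptions that go beyond the text above: it uses \w+ instead of \w\w\w, adds a closing \b so the repeat cannot be the start of a longer word, and uses preg_match_all() with a modified subject containing two accidental repeats.

```php
<?php
// Hypothetical subject with two accidental repeats.
$subject = "He has one one of the the books";

// \w+ allows any word length; the final \b stops "one ones" from matching.
$regex = '/\b(\w+)\s+\g{-1}\b/';

$count = preg_match_all($regex, $subject, $matches);

echo $count, "\n";                      // 2
echo implode(', ', $matches[0]), "\n";  // one one, the the
echo implode(', ', $matches[1]), "\n";  // one, the
```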
That is it for this part of the series. We stop here and continue in the next part.
Chrys