Backreferences in Perl Regular Expressions
Advanced Perl Regular Expressions – Part 7
Foreword: In this part of the series, I explain how a group in a regex can be referred to again, by number, later in the same regex.
By: Chrysanthus Date Published: 2 Apr 2016
Introduction
Back Reference
Normally, when a writer types two consecutive words that are the same, it is a mistake. You may want to identify such a sequence in a subject string. Consider the following subject:
my $subject = "He has one one of the books";
Here the sub-string “one one”, typed by accident, consists of “one”, then one or more whitespace characters, and then “one” again. You may want to identify this sub-string. The pattern for the first word of interest is \b\w\w\w\b . The pattern for one or more whitespace characters is \s+ . The pattern for the next word of interest is again \b\w\w\w\b . Note that the two words of interest have the same pattern (sub-pattern). If you want to match the sub-string with the repeated word, you do not have to type the pattern for the word twice. A better regex to use is,
/(\b\w\w\w\b)\s+\g{-1}/
In this expression, \g{-1} is a backreference to the previous group, (\b\w\w\w\b), within the regex. It matches the same text that the group actually matched, not merely the same pattern. So,
/(\b\w\w\w\b)\s+\g{-1}/
is stricter than,
/(\b\w\w\w\b)\s+(\b\w\w\w\b)/
The regex with two groups matches any two consecutive three-letter words, and so would find “has one” in the subject above. The regex with \g{-1} requires the second word to repeat the first, and so matches “one one”. It would match any repeated three-letter word, e.g. “the the”, “him him”, “man man”, etc. You can use the same scheme to match a word made up of two identical parts, and you can use more than one backreference in the same regex. So each of the following binding operations will produce a match:
"What does beriberi mean?" =~ /(beri)\g{-1}/
"Listen: A boy and a girl! Which boy and which girl?" =~ /((boy).+(girl).+\g{-2}.+\g{-1})/;
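For the second operation, the phrase matched is “boy and a girl! Which boy and which girl”, where in the regex, (boy) is for “boy”, (girl) is for “girl”, then \g{-2} is for (boy) and \g{-1} is for (girl). A negative number counts capture groups backwards from the position of the backreference: \g{-1} is the group immediately before it, \g{-2} is the group before that, and so on.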
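The relative numbers can be compared with absolute ones. The following is a minimal sketch, not part of the original code; the variable names $text, @relative and @absolute are chosen here for illustration. Because the whole pattern sits inside an outer group, (boy) is group 2 and (girl) is group 3 when counted from the left.

```perl
use strict;
use warnings;

my $text = "Listen: A boy and a girl! Which boy and which girl?";

# Relative form: groups are counted backwards from the backreference.
my @relative = $text =~ /((boy).+(girl).+\g{-2}.+\g{-1})/;

# Absolute form: the outer group is 1, so (boy) is 2 and (girl) is 3.
my @absolute = $text =~ /((boy).+(girl).+\g{2}.+\g{3})/;

print "relative: $relative[0]\n";
print "absolute: $absolute[0]\n";
print "same match\n" if $relative[0] eq $absolute[0];
```

If the outer group were removed, the absolute numbers would have to change to \g{1} and \g{2}, while the relative ones could stay as they are.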
Read and try the following code that uses the above expressions:
use strict;
use warnings;

my $subject = "He has one one of the books";
$subject =~ /((\b\w\w\w\b)\s+\g{-1})/;
print $1, "\n";    # one one

"What does beriberi mean?" =~ /((beri)\g{-1})/;
print $1, "\n";    # beriberi

"Listen: A boy and a girl! Which boy and which girl?" =~ /((boy).+(girl).+\g{-2}.+\g{-1})/;
print $1, "\n";    # boy and a girl! Which boy and which girl
print $2, ', ', $3, "\n";    # boy, girl (this regex has only three groups, so there is no $4)
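To see what the backreference adds, the first expression in the code above can be compared with a version that simply repeats the word pattern. This is a small sketch for illustration only; the variable names $two_groups and $backref are not from the original code.

```perl
use strict;
use warnings;

my $subject = "He has one one of the books";

# Two independent groups: each one matches any three-letter word.
my ($two_groups) = $subject =~ /((\b\w\w\w\b)\s+(\b\w\w\w\b))/;

# Group plus backreference: the second word must repeat the first.
my ($backref) = $subject =~ /((\b\w\w\w\b)\s+\g{-1})/;

print "two groups:    $two_groups\n";    # has one
print "backreference: $backref\n";       # one one
```

In list context the match returns the captured groups, so the first element assigned is the text of the outer group.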
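In each binding operation of the code, the overall pattern has been placed in a larger group so that the whole match is captured in $1, with the inner groups in $2, $3, etc. Remember, after each successful match, the variables $1, $2, $3, etc. are set afresh from that match. If a match fails (returns false), these variables are not reset; they keep the values from the last successful match. So it is wise to check that a match succeeded before you use them.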
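The behaviour of the capture variables after a failed match can be checked with a short sketch like the following. The strings "kiwi" and "bonbon" and the variable names are made up here for illustration.

```perl
use strict;
use warnings;

"beriberi" =~ /((beri)\g{-1})/;    # succeeds and sets $1 and $2
my $after_success = $1;

"kiwi" =~ /((beri)\g{-1})/;        # fails, so $1 is left as it was
my $after_failure = $1;

print "after success: $after_success\n";    # beriberi
print "after failure: $after_failure\n";    # beriberi

# Safer: read $1 only when the match itself returned true.
my $doubled = '';
if ("bonbon" =~ /((\w\w\w)\g{-1})/) {
    $doubled = $1;
    print "doubled: $doubled\n";             # bonbon
}
```

Testing the result of the binding operation in an if-condition avoids reading a stale value left over from an earlier match.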
That is it for this part of the series. We stop here and continue in the next part.
Chrys