First Occurrence in Perl Regex Matching
Advanced Perl Regular Expressions – Part 4
Foreword: When Perl evaluates the binding operation, the regex matches the first (leftmost) occurrence of the sub-string in the subject. That is what this part of the series covers.
By: Chrysanthus Date Published: 2 Apr 2016
Introduction
Illustration
Read and try the following script:
use strict;
my $subject = "I am a man. You are a man. He is a man.";
if ($subject =~ /man/)
{
print "Matched";
}
In the subject string, the word “man” appears in three places. The regex is /man/. Only the first occurrence of the sub-string “man” is matched; the other occurrences of “man” in the subject are ignored. The first occurrence can also be called the leftmost occurrence. If you want all the occurrences of the sub-string to be matched, you have to use the global modifier, g. The following program illustrates the g modifier; it prints cat, rat and bat, each on its own line:
use strict;
my $subject = "This is a cat. That is a rat. Here is a bat.";
my @arr = $subject =~ /[brc]at/g;
print $arr[0], "\n";
print $arr[1], "\n";
print $arr[2], "\n";
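As a minimal sketch of what “leftmost” means, the start offsets of the matches can be printed with Perl's built-in @- array, whose first element holds the start offset of the last successful match. The subject is the one with “man” from the first script; the variable names $first_offset and @offsets are only illustrative.

```perl
use strict;
use warnings;

my $subject = "I am a man. You are a man. He is a man.";

# Without g: one match, at the leftmost position where /man/ fits.
my $first_offset;
if ($subject =~ /man/) {
    $first_offset = $-[0];    # start offset of the match
}

# With g in a while loop: each iteration resumes after the previous match.
my @offsets;
while ($subject =~ /man/g) {
    push @offsets, $-[0];
}

print "First match starts at offset $first_offset\n";
print "All matches start at offsets @offsets\n";
```

The first line should report offset 7, and the second should report 7, 22 and 35, so the match without g is the same one that the g loop finds first.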
Without the g modifier, only the first occurrence is matched, as in the following code:
use strict;
my $subject = "This is a cat. That is a rat. Here is a bat.";
$subject =~ /([brc]at)/;
print $1;
In this code, the regex matches just “cat”, which is the first occurrence of the possible matches in the subject, so the output is cat. The other occurrences, “rat” and “bat”, are not matched.
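The claim that the other occurrences are not matched can be checked with a short sketch like the one below. It assumes the same subject, runs the same regex twice without the g modifier, and collects the captures in list context; the array names @first and @second are only illustrative.

```perl
use strict;
use warnings;

my $subject = "This is a cat. That is a rat. Here is a bat.";

# List context without g: the list holds only the captures of one match.
my @first  = $subject =~ /([brc]at)/;
# Matching again without g starts from the left of the subject again.
my @second = $subject =~ /([brc]at)/;

print "First attempt:  @first (", scalar(@first), " item)\n";
print "Second attempt: @second (", scalar(@second), " item)\n";
```

Both attempts should give a single item, “cat”; repeating a match without g does not move on to “rat” or “bat”.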
Same thing with all Alternatives
A character class such as [brc] forms a set of alternatives in the regex. With any form of alternatives in the regex, even with the g modifier present, it is the first occurrence in the subject that is matched first. Read and try the following code, which uses the alternation operator, |, in the regex:
use strict;
my $subject = "This is a child. That is a man. Here is a woman.";
my @arr = $subject =~ /(woman|man|child)/;
print @arr;
The output is just “child”, which is the leftmost (first) occurring sub-string in the subject that corresponds to the regex, even though “child” is written last among the alternatives in the regex.
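To illustrate that the position in the subject decides the match, and not the order in which the alternatives are written, here is a small sketch that tries three orderings of the same alternatives. The list @orders and the array @results are assumptions made for the illustration.

```perl
use strict;
use warnings;

my $subject = "This is a child. That is a man. Here is a woman.";

# The same three alternatives, written in three different orders.
my @orders = ('woman|man|child', 'man|child|woman', 'child|woman|man');

my @results;
for my $alt (@orders) {
    # $alt is interpolated into the regex; no g modifier, so one match only.
    if ($subject =~ /($alt)/) {
        push @results, $1;
    }
}

print join(", ", @results), "\n";
```

Every ordering should capture “child”, because “child” occurs before “man” and “woman” in the subject.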
With nested groups, it is still the first occurring sub-string in the subject that matches first; it does not matter what nests what in the regex. The whole regex is matched at the leftmost position where it can match, and the groups capture parts of that one match. Read and try the following code, which illustrates this. The g modifier is present, but because the match is evaluated neither in list context nor in a loop, only one match is performed:
use strict;
"keepers, bookkeepers, bookkeeper and book go together." =~ /book(keeper(s|)|)/g;
print $1, "\n";
print $2, "\n";
The output is,
keepers
s
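$1 displays “keepers” and $2 displays “s”. The “keepers” at the very beginning of the subject is not matched, because the regex requires “book” first. The leftmost sub-string that the whole regex can match is “bookkeepers”. Within that single match, group 1 (the outer parentheses) captures “keepers”, and group 2 (the nested parentheses) captures “s”. Groups are numbered by the order of their opening parentheses. With the g modifier, matching progresses from left to right through the subject: first matching sub-string, then second, then third, and so on.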
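The left-to-right progression can be sketched with the g modifier in a while loop, where each iteration performs the next match. This is only an illustration under the same subject and regex; the array @rows and the row format are assumptions. A group that took no part in a match leaves its variable undefined, so $2 is checked with defined before it is printed.

```perl
use strict;
use warnings;

my $subject = "keepers, bookkeepers, bookkeeper and book go together.";

my @rows;
# In scalar context, each evaluation of m//g finds the next match.
while ($subject =~ /book(keeper(s|)|)/g) {
    my $g2 = defined $2 ? $2 : 'undef';   # group 2 may not take part
    push @rows, "whole='$&' 1='$1' 2='$g2'";
}

print "$_\n" for @rows;
```

The loop should report three matches in this order: “bookkeepers”, then “bookkeeper”, then “book”. For the last one, group 1 captures an empty string and group 2 does not take part at all.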
Note: if you are capturing the matches into an array, what is matched first goes first into the array.
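The note can be illustrated with a regex that has two groups and the g modifier, evaluated in list context. The regex and the array name @pairs below are made up for this sketch; they do not come from the scripts above.

```perl
use strict;
use warnings;

my $subject = "This is a cat. That is a rat. Here is a bat.";

# List context with g: the captures of every match are returned,
# match by match from left to right, group 1 before group 2.
my @pairs = $subject =~ /(This|That|Here) is a ([brc]at)/g;

print join(" ", @pairs), "\n";
```

The array should hold This, cat, That, rat, Here, bat in that order: the captures of the first match come first, followed by those of the second and the third.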
That is it for this part of the series. We stop here and continue in the next part.
Chrys