Embedded comments and Modifiers in Perl
Advanced Perl Regular Expressions – Part 9
Foreword: In this part of the series I talk about embedding comments and modifiers in a regex.
By: Chrysanthus Date Published: 2 Apr 2016
Introduction
The syntax to embed anything in a regex is
(?char)
where char is a character that indicates what is embedded. After char, you can optionally have some datum.
Comments
Just as you can comment ordinary code, you can comment within a regex. A regex can have more than one comment. There are two ways of embedding a comment in a regex: you can use the above syntax, or you can use the x modifier. Either way, you can type a comment next to a sub-pattern; a regex can consist of several sub-patterns.
Comments Using the Embedding Syntax
With the above syntax, a comment is written as,
(?#text)
where ? means embedding, # means comment and text is the actual comment. Note that the embedding structure begins with an opening parenthesis and ends with a closing parenthesis. Perl ends the comment at the first closing parenthesis it sees, so there must be no closing parenthesis within the text. You can type the comment before a sub-pattern, as in,
/(?# on the head)[hc]a[tp]/
You can type the comment after a sub-pattern as in,
/[hc]a[tp](?# on the head)/
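The restriction on closing parentheses can be checked with a small sketch. The helper name compiles() and the second pattern string are made up for illustration; the patterns are built from strings so that a compile error can be caught with eval instead of stopping the script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the pattern string compiles, 0 otherwise.
sub compiles
{
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return defined $re ? 1 : 0;
}

# The comment ends at the first ")", so this one is fine.
my $good = '(?# on the head)[hc]a[tp]';

# Here the comment already ends at the ")" after "top",
# which leaves a stray ")" in the rest of the pattern.
my $bad = '(?# on the (top) head)[hc]a[tp]';

print "good pattern compiles: ", compiles($good), "\n";
print "bad pattern compiles:  ", compiles($bad), "\n";
```

The first pattern should compile and the second should not, because the leftover closing parenthesis has no partner.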
The comment group does not match anything in the subject string, so the comment can be spread over more than one line by pressing the Enter key, as in,
/[hc]a[tp](?# on
the head)/
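Note: with this syntax you cannot break the part of the pattern that matches into lines by pressing the Enter key, because a line break outside a comment is treated as a literal newline that the subject must contain.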
Read and try the following code:
use strict;
if ("A hat and a cap" =~ /(?# on the head)[hc]a[tp]/)
{
    print "Matched";
}
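The difference between a line break inside a comment and a line break inside the matching part of the pattern can be seen in a minimal sketch like the following. The variable names are only illustrative; the subject string is the one used above.

```perl
use strict;
use warnings;

my $subject = "A hat and a cap";

# The line break sits inside the (?#...) group, so it is part of the comment.
my $in_comment = $subject =~ /[hc]a[tp](?# on
the head)/ ? 1 : 0;

# Here the line break sits in the pattern itself. Without the x modifier
# it is a literal newline, which the subject does not contain.
my $in_pattern = $subject =~ /[hc]a
[tp]/ ? 1 : 0;

print "Break inside comment: ", ($in_comment ? "matched" : "no match"), "\n";
print "Break inside pattern: ", ($in_pattern ? "matched" : "no match"), "\n";
```

Only the first regex should match, since the second one looks for "ha" or "ca" followed by a newline.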
This style of commenting has been largely superseded by the raw, freeform commenting that is allowed with the //x modifier.
Comments Using the x Modifier
With the x modifier you still have the comments embedded, but not with the embedding syntax. The x modifier goes at the end of the complete regex, after the closing forward slash. It tells Perl to ignore unescaped whitespace in the pattern, so you can optionally type the pattern as sub-patterns on different lines by pressing the Enter key. To the right of each sub-pattern you can type a comment beginning with #. The comment runs to the end of its line, so it has to stay on one line in order not to conflict with the next sub-pattern.
use strict;
if ("A hat and a cap" =~ /#Talking about the head!
    # Yes talking about it (head).
    [hc]  # A sub pattern
    a
    [tp]  #comment on the right
    /x
   )
{
    print "Matched";
}
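Because the x modifier gives whitespace and # a special meaning, matching them literally takes a little extra care. The following is a minimal sketch, with a made-up subject string and variable names, of two common ways to do that: a space inside a character class and an escaped hash sign.

```perl
use strict;
use warnings;

# Hypothetical subject string, not taken from the article.
my $subject = "issue #42 fixed";

my $re = qr/
    issue    # the literal word
    [ ]      # a literal space, written as a character class
    \#       # a literal hash sign, escaped so it does not start a comment
    (\d+)    # capture the number
/x;

my ($number) = $subject =~ $re;
print "Issue number: $number\n";
```

Note that the comments themselves avoid the forward slash, since Perl looks for the closing delimiter of the regex before it looks at the comments.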
I prefer to comment using the //x modifier.
Embedding Modifiers
The modifiers i, m, s and x are also written as //i, //m, //s and //x. These particular modifiers can be embedded in a regex using the embedding syntax; in the simple form shown here, nothing follows the modifier letters, so there is no optional datum. I use //i, which makes matching independent of case, to illustrate the embedding of modifiers. The syntax for embedding the //i modifier is:
(?i)
If you place this modifier at the beginning of a regex (just after the first forward slash), it is the same as placing it at the end: the whole regex becomes case insensitive. So,
/(?i)Augustine/
is the same as,
/Augustine/i
If you embed the modifier at the beginning, there is no need to repeat the same modifier at the end of the regex; doing so is redundant.
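The equivalence of the two forms can be checked with a short sketch. The list of subjects below is invented for the purpose, and includes one name that should fail with both forms.

```perl
use strict;
use warnings;

# Hypothetical test subjects, not taken from the article.
my @subjects = ("Augustine", "AUGUSTINE", "augustine", "Augustus");

my @rows;
foreach my $subject (@subjects)
{
    # Same pattern, once with the embedded modifier and once with the trailing one.
    my $embedded = $subject =~ /(?i)Augustine/ ? "yes" : "no";
    my $trailing = $subject =~ /Augustine/i    ? "yes" : "no";
    push @rows, [$subject, $embedded, $trailing];
    print "$subject: embedded=$embedded trailing=$trailing\n";
}
```

For every subject the two columns should agree: the first three subjects match with both forms and "Augustus" matches with neither.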
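Now, if you embed the modifier within the regex, it acts from the point of embedding to the end of the regex (or to the end of the enclosing group, if it is embedded inside parentheses), unless it is switched off earlier. So,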
/Augus(?i)tine/
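will match the subject string, "AugusTINE". A minus sign in front of the modifier letter switches the modifier off again from that point on. So,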
/Au(?i)gus(?-i)tine/
will match "AuGUStine" but will not match "AuGUSTINE".
You can have a composite embedded modifier by placing more than one modifier letter inside the embedding parentheses, as in
(?si)
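As a rough illustration of a composite modifier, the sketch below uses a made-up two-line subject in which both letters are needed: i for the difference in case, and s so that the dot also matches the newline.

```perl
use strict;
use warnings;

# Hypothetical two-line subject, chosen so that both modifiers matter.
my $subject = "HELLO\nworld";

# No modifier: the case differs and the dot does not match the newline.
my $plain  = $subject =~ /hello.world/      ? 1 : 0;

# Only i: the case is ignored, but the dot still stops at the newline.
my $only_i = $subject =~ /(?i)hello.world/  ? 1 : 0;

# Both s and i: the dot matches the newline and the case is ignored.
my $both   = $subject =~ /(?si)hello.world/ ? 1 : 0;

print "no modifier: $plain\n";
print "(?i) only:   $only_i\n";
print "(?si):       $both\n";
```

Only the last regex should match.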
Read and try the following script:
use strict;
my @arr = "I am Augustine, You are AuGUStine. He is not AuGUSTINE" =~
    /Au(?i)gus(?-i)tine/g;
foreach my $var (@arr)
{
    print $var, "\n";
}
The output is:
Augustine
AuGUStine
The third name in the subject, "AuGUSTINE", did not match; that is expected, because (?-i) makes "tine" case sensitive again.
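Switching the modifier off with (?-i) is not the only way to limit its reach. Perl also accepts the form (?i:...), which applies the modifier only to the sub-pattern inside the parentheses, and a plain (?i) placed inside a group stops acting where that group closes. These two forms are standard Perl but are not used in the script above; the sketch below, with illustrative variable names, compares all three on the same subject.

```perl
use strict;
use warnings;

my $subject = "I am Augustine, You are AuGUStine. He is not AuGUSTINE";

# Switch on, then switch off explicitly (the form used in the article).
my @switched = $subject =~ /Au(?i)gus(?-i)tine/g;

# Modifier applied to one sub-pattern only.
my @scoped   = $subject =~ /Au(?i:gus)tine/g;

# Modifier inside a non-capturing group; it ends where the group ends.
my @grouped  = $subject =~ /Au(?:(?i)gus)tine/g;

print "switched: @switched\n";
print "scoped:   @scoped\n";
print "grouped:  @grouped\n";
```

All three lists should hold the same two names, Augustine and AuGUStine.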
That is it for this part of the series. We stop here and continue in the next part.
Chrys