Non-Capturing Groups and Better Splits in Perl
Advanced Perl Regular Expressions – Part 6
Foreword: In this part of the series, I explain how to code a non-capturing group in Perl.
By: Chrysanthus Date Published: 2 Apr 2016
Introduction
Reason for Non-Capturing Group
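Must every group (with parentheses) in a Perl regex be captured? No. Sometimes you need a group only for convenience, for example to apply alternation or a quantifier to part of the pattern, without wanting its match saved. Because Perl does not have to store the matched text, such a group can also be slightly cheaper to evaluate than a capturing group. The syntax for a non-capturing group is: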
(?:pattern)
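The ? immediately after the opening parenthesis tells Perl that this is a special (extended) construct rather than an ordinary group. The : must be typed directly after the ?, with no space, and marks the group as non-capturing. The pattern is of your choice.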
Example of Non-Capturing Group
In the following script, the group (one) is not captured while the group (two) is captured. Read and try it:
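use strict;
use warnings;
# In list context the match returns only the captured groups.
my @arr = ("This is one and that is two." =~ /(?:one).*(two)/);
# Print each captured sub-string on its own line.
for (my $i=0; $i<@arr; ++$i)
{
    print $arr[$i], "\n";
}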
The output is just “two”, meaning that the group (one), typed as (?:one), was not captured because it is a non-capturing group.
Perl has a built-in function called split (also known as the split operator). The syntax is:
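A group used "for convenience" often holds alternatives. The following is a minimal sketch of that case; the sample text and the variable names are made up for illustration and do not come from the original series. It shows how making the group non-capturing changes which sub-string ends up in $1:

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only for illustration.
my $text = "Dear Ms Jones, welcome.";

# (?:Mr|Ms|Dr) groups the alternatives without capturing them,
# so the surname is the first (and only) captured group.
my ($surname) = $text =~ /(?:Mr|Ms|Dr) (\w+)/;

# With ordinary parentheses the title is captured first and the surname second.
my ($title, $name) = $text =~ /(Mr|Ms|Dr) (\w+)/;

print "$surname\n";        # Jones
print "$title $name\n";    # Ms Jones
```

With the non-capturing form, adding or removing such helper groups does not shift the numbering of the groups you actually care about.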
split /pattern/, string
The split operator splits a string into a list of sub-strings and returns that list, which you normally assign to an array. The pattern describes the separator, e.g. a comma, and the separator is normally not part of the returned list. You may place parentheses around the two arguments, as in split(/pattern/, string). Consider the following subject:
my $subject = "one, two, three";
You may want to separate this into the sub-strings “one”, “two” and “three”. The separator is /, /, that is, a comma followed by a space. The following code does the split, but the returned list contains unwanted extra elements:
use strict;
use warnings;
my $subject = "one, two, three";
# The parentheses in the pattern form a capturing group.
my @words = split(/(, )/, $subject);
for (my $i=0; $i<@words; ++$i)
{
    print $words[$i], "\n";
}
Note that the regex is made up of the capturing group, (, ). The output is:
one
,
two
,
three
The problem here is that, because of the capturing group, the separators have also been returned as sub-strings. Without the parentheses, the separators are not returned. However, you cannot always just drop the parentheses: a more complex separator may need a group, for example to hold alternatives.
The solution to the above problem is just to make the group non-capturing. So the following program, which you should read and try, does not have the separator as sub-strings in the returned list (array):
use strict;
use warnings;
my $subject = "one, two, three";
# (?: ) groups the separator without capturing it.
my @words = split(/(?:, )/, $subject);
for (my $i=0; $i<@words; ++$i)
{
    print $words[$i], "\n";
}
The output is:
one
two
three
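In the example above the group could simply have been dropped. The sketch below illustrates a separator where a group is genuinely needed, because the alternatives must sit between two \s* parts. The input string and the choice of separators (comma, semicolon, or the word "and") are assumptions made for illustration only:

```perl
use strict;
use warnings;

# Hypothetical input that mixes three kinds of separator.
my $list = "one, two; three and four";

# The group limits the | alternation to the three separators, so that
# \s* applies on both sides of whichever one matches. The ?: keeps the
# matched separator out of the returned list.
my @items = split /\s*(?:,|;|\band\b)\s*/, $list;

print scalar(@items), "\n";       # 4
print join("|", @items), "\n";    # one|two|three|four
```

If the group were written as a capturing group, (,|;|\band\b), each matched separator would appear as an extra element between the words.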
Note: if the pattern does not match anywhere in the string, split returns a list whose only element is the whole string.
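A short sketch of the no-match case follows; the subject string here is a made-up example that contains no comma:

```perl
use strict;
use warnings;

# Hypothetical subject without the ", " separator.
my $subject = "one two three";

# The separator never matches, so the list has exactly one element:
# the original string, unchanged.
my @parts = split /(?:, )/, $subject;

print scalar(@parts), "\n";    # 1
print $parts[0], "\n";         # one two three
```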
That is it for this part of the series. We stop here and continue in the next part.
Chrys