Perl Functions Related to Regular Expressions
Perl Regular Expressions – Part 8
Perl Course
Foreword: In this part of the series, I talk about Perl functions that are related to regular expression features.
By: Chrysanthus Date Published: 6 Oct 2015
Introduction
ucfirst
This function changes the first letter of a string to uppercase. If the letter is already in uppercase, it remains unchanged. The function returns a new string, leaving the original string unchanged. Try the following code:
use strict;
my $subject = "the big boys are in town.";
my $newSub = ucfirst($subject);
print $newSub;
The output is:
The big boys are in town.
uc
This function changes all the letters in the string to uppercase. It returns a new string, leaving the original string unchanged. Try the following code:
use strict;
my $subject = "the big boys are in town.";
my $newSub = uc($subject);
print $newSub;
The output is:
THE BIG BOYS ARE IN TOWN.
lcfirst
This function changes the first letter of a string to lowercase. If the letter is already in lowercase, it remains unchanged. The function returns a new string, leaving the original string unchanged. Try the following code:
use strict;
my $subject = "THE BIG BOYS ARE IN TOWN.";
my $newSub = lcfirst($subject);
print $newSub;
The output is:
tHE BIG BOYS ARE IN TOWN.
lc
This function changes all the letters of a string to lowercase. The function returns a new string, leaving the original string unchanged. Try the following code:
use strict;
my $subject = "THE BIG BOYS ARE IN TOWN.";
my $newSub = lc($subject);
print $newSub;
The output is:
the big boys are in town.
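The four case functions can be combined. The following is a minimal sketch, assuming a hypothetical input string with inconsistent capitalization; it also shows that the original string is left unchanged:

```perl
use strict;
use warnings;

# Hypothetical input with inconsistent capitalization.
my $subject = "tHE bIG bOYS aRE iN tOWN.";

# lc() lowercases every letter; ucfirst() then uppercases the first letter.
my $sentence = ucfirst(lc($subject));

print $sentence, "\n";   # The big boys are in town.
print $subject, "\n";    # tHE bIG bOYS aRE iN tOWN. (the original is unchanged)
```

The inner call runs first, so lc() produces the all-lowercase copy and ucfirst() works on that copy.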
pos
This function returns or sets the current match position.
After a match, the pos() function returns the position in the subject at which the search for the next match will begin. This works with the global modifier, g.
Consider the subject, "A cat is an animal. A rat is an animal. A bat is a creature.", and the regex, /[cbr]at/g. After the first match, which is “cat”, pos() returns 5. Counting of positions in a string begins from zero. A good way to use the pos() function is in a while loop. The following code illustrates this:
use strict;
my $subject = "A cat is an animal. A rat is an animal. A bat is a creature.";
while($subject =~ /[cbr]at/g)
{
print "Next search starts at position: ", pos($subject), "\n";
}
Here is the output of the code:
Next search starts at position: 5
Next search starts at position: 25
Next search starts at position: 45
The pos() function takes as argument the variable of the subject. The pos() function can also be used to set the position where search will continue, in the subject – see later.
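Setting the position can be sketched as follows. This example assumes that we want to skip the first sentence of the same subject, which is 20 characters long; the array name @found is only for illustration:

```perl
use strict;
use warnings;

my $subject = "A cat is an animal. A rat is an animal. A bat is a creature.";

# Assumption for this sketch: the first sentence and its trailing space
# occupy positions 0 to 19, so the search should start at position 20.
pos($subject) = 20;

my @found;
while ($subject =~ /([cbr]at)/g) {
    # $1 is the word just matched; pos() is where the next search starts.
    push @found, "$1 ends at " . pos($subject);
}
print "$_\n" for @found;
# rat ends at 25
# bat ends at 45
```

Because the search starts at position 20, “cat” is never seen. When the match finally fails, the position of the subject is reset.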
quotemeta
This function is used on the pattern string itself. It escapes (precedes with a backslash) all ASCII non-word characters in the pattern and returns a new pattern string. The ASCII word characters are the set, [A-Za-z_0-9] . Consider the following regex:
/http://www/
Assume that this is to match the beginning of a Uniform Resource Locator. As written, it does not work, because the first forward slash inside the pattern ends the regex. The pattern is
http://www
In the pattern, : and / are ASCII non-word characters. If quotemeta() is used on the pattern, the pattern will become, http\:\/\/www . The following code illustrates this:
use strict;
my $pat = "http://www";
my $pattern = quotemeta($pat);
"http://www.somesite.com" =~ /($pattern)/;
print $pattern, "\n";
print $1, "\n";
The output is:
http\:\/\/www
http://www
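Inside a regex, the escape sequence \Q...\E applies the same quoting as quotemeta() to whatever is interpolated between the two markers. The following is a minimal sketch, assuming a hypothetical host name whose dots are meant literally:

```perl
use strict;
use warnings;

my $pat = "www.somesite.com";   # the dots are meant literally

# Unescaped, each "." matches any character, so this subject wrongly matches.
my $loose = ("wwwXsomesiteYcom" =~ /^$pat$/) ? "match" : "no match";

# \Q...\E quotes the interpolated variable, as quotemeta() would.
my $strict = ("wwwXsomesiteYcom" =~ /^\Q$pat\E$/) ? "match" : "no match";
my $exact  = ("www.somesite.com" =~ /^\Q$pat\E$/) ? "match" : "no match";

print "$loose, $strict, $exact\n";   # match, no match, match
```

This form is convenient when the pattern string is used in one regex only; quotemeta() is convenient when the quoted string has to be kept in a variable.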
split
This function uses a regex to separate a string into parts. The syntax to use the function is:
split /pattern/, string
The split function splits a string into a list of substrings and returns the list. The pattern is the separator, e.g. a comma. The separator is not normally part of the returned list. Consider the following subject string:
$subject = "one two three";
If we know the regex pattern to identify the space between words, then we can split this string into a list made up of the words, “one”, “two” and “three”. This list can be received by an array. A space typed in the pattern matches a space in the subject. A + after it matches the space one or more times. The regex to separate the above words is:
/ +/
We assume that space might be created by hitting the spacebar more than once. The following code illustrates the use of the split operator with this pattern.
use strict;
my $subject = "one two three";
my @words = split / +/, $subject;
print "First Element is: ", $words[0], "\n";
print "Second Element is: ", $words[1], "\n";
print "Third Element is: ", $words[2], "\n";
In the subject, the words are separated by spaces. The output of the code is:
First Element is: one
Second Element is: two
Third Element is: three
The split() function has split the words in the subject using the space between the words, and put each word as an element in the array.
It is possible to have words in a string separated by a comma and a space, like
my $subject = "one, two, three";
The regex to separate these words is:
/, +/
The following code illustrates this:
use strict;
my $subject = "one, two, three";
my @words = split /, +/, $subject;
print "First Element is: ", $words[0], "\n";
print "Second Element is: ", $words[1], "\n";
print "Third Element is: ", $words[2], "\n";
The output of the above code is:
First Element is: one
Second Element is: two
Third Element is: three
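Real input does not always have exactly one comma followed by spaces. The following is a minimal sketch, assuming a hypothetical subject with uneven spacing around the commas; it uses \s* so that any whitespace on either side of a comma is treated as part of the separator:

```perl
use strict;
use warnings;

# Hypothetical input with uneven spacing around the commas.
my $subject = "one,two ,  three";

# \s* matches zero or more whitespace characters on either side of the comma.
my @words = split /\s*,\s*/, $subject;

print scalar(@words), " elements: ", join("|", @words), "\n";   # 3 elements: one|two|three
```

Joining the elements with a visible character such as | makes it easy to confirm that no stray spaces remain in the words.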
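If the pattern has capturing groups, then the list returned also contains the substrings matched by the groups. Consider the following subject:
my $subject = "/dir1/dir2";
The subject is a path to a directory.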
We can use the following regex to split the string:
/(\/)/
The forward slash in the pattern is escaped and is in a group. The following code illustrates the use:
use strict;
my $subject = "/dir1/dir2";
my @words = split /(\/)/, $subject;
print "First Element is: ", $words[0], "\n";
print "Second Element is: ", $words[1], "\n";
print "Third Element is: ", $words[2], "\n";
print "Fourth Element is: ", $words[3], "\n";
print "Fifth Element is: ", $words[4], "\n";
The output of the code is:
First Element is:
Second Element is: /
Third Element is: dir1
Fourth Element is: /
Fifth Element is: dir2
Now, this code and its output need explanation because of the value of the first array element. If the regex has capturing groups, then the list produced contains the matched substrings from the groups as well. So the array receives the words and the matched substrings for the group. Note also that the separator begins the subject. So the split operator separates the beginning of the subject, which is nothing, from the rest of the subject. It returns an empty string as its first separated value.
If you do not want the separators in the returned list, use a non-capturing group, (?: ), or no group at all. Only capturing groups add their matches to the list. The following code illustrates this:
use strict;
my $subject = "/dir1/dir2";
my @words = split /(?:\/)/, $subject;
print "Number of elements: ", scalar(@words), "\n";
print "First Element is: ", $words[0], "\n";
print "Second Element is: ", $words[1], "\n";
print "Third Element is: ", $words[2], "\n";
The output now is:
Number of elements: 3
First Element is:
Second Element is: dir1
Third Element is: dir2
The separators are no longer in the array. The first element is still there, because the separator begins the subject. It is an empty string, not undef, and an empty string prints as nothing.
Can the array be shrunk by removing the empty elements? Yes: use the following code:
use strict;
my $subject = "/dir1/dir2";
my @words = split /(?:\/)/, $subject;
# go backwards, so that removing an element does not shift the elements still to be checked
foreach my $i (reverse 0..$#words)
{
splice @words, $i, 1 if $words[$i] eq "";
}
print "First Element is: ", $words[0], "\n";
print "Second Element is: ", $words[1], "\n";
The output is:
First Element is: dir1
Second Element is: dir2
The splice function removes the empty elements.
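As an alternative to removing elements by index, grep can build a new list that leaves out the empty strings. The following is a minimal sketch, assuming the same subject; the array names @raw and @dirs are only for illustration:

```perl
use strict;
use warnings;

my $subject = "/dir1/dir2";

# split keeps the empty field that comes before the leading slash.
my @raw = split /\//, $subject;

# grep keeps only the elements whose length is not zero.
my @dirs = grep { length } @raw;

print scalar(@raw), " raw, ", scalar(@dirs), " kept: @dirs\n";   # 3 raw, 2 kept: dir1 dir2
```

Since grep returns a new list, there is no index to keep track of while elements are being dropped.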
Consider the following subject string:
my $subject = "http://www.somewebsite.com/dir1/dir2/file.htm";
This is a URL. Let us split this URL into its components, that is, “http:”, “www.somewebsite.com”, “dir1”, “dir2” and “file.htm”. The separator here is either a forward slash or a double forward slash. The pattern for this separator is:
/\/{1,2}/
The pattern matches one or two forward slashes. This satisfies both the single and the double slashes. There is no need to use a group (capturing or non-capturing) in the pattern. The separator is only included in the list returned if the pattern has a capturing group. The following code illustrates this:
use strict;
my $subject = "http://www.somewebsite.com/dir1/dir2/file.htm";
my @words = split /\/{1,2}/, $subject;
print "First Element is: ", $words[0], "\n";
print "Second Element is: ", $words[1], "\n";
print "Third Element is: ", $words[2], "\n";
print "Fourth Element is: ", $words[3], "\n";
print "Fifth Element is: ", $words[4], "\n";
So “http:” becomes the first array element, “www.somewebsite.com”, becomes the second array element, “dir1” becomes the third array element, “dir2” becomes the fourth array element and “file.htm” becomes the fifth array element.
The output is:
First Element is: http:
Second Element is: www.somewebsite.com
Third Element is: dir1
Fourth Element is: dir2
Fifth Element is: file.htm
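The list returned by split can also be assigned directly to named variables instead of to one array. The following is a minimal sketch, assuming the same URL; the variable names $scheme, $host and @path are only for illustration:

```perl
use strict;
use warnings;

my $subject = "http://www.somewebsite.com/dir1/dir2/file.htm";

# The first two fields go to scalars; the remaining fields are collected in @path.
my ($scheme, $host, @path) = split /\/{1,2}/, $subject;

print "Scheme: $scheme\n";      # Scheme: http:
print "Host: $host\n";          # Host: www.somewebsite.com
print "Path parts: @path\n";    # Path parts: dir1 dir2 file.htm
```

Named variables make the later code easier to read than numeric indexes such as $words[1].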
That is it for this part of the series.
Chrys