Regular Expressions Metacharacters in MySQL
Regular Expressions in MySQL - Part 2
Foreword: In this part of the series we look at the uses of metacharacters in MySQL Regular Expressions.
By: Chrysanthus Date Published: 28 Aug 2015
Introduction
Database table for Illustration
We shall use the following database table called Pets, for illustration:
| name | owner | species | sex | birth | death |
+--------+--------+---------+------+------------+------------+
| Buffy | Harold | dog | f | 1999-05-13 | NULL |
| Bowser | Diane | dog | m | 1999-08-31 | 2005-07-29 |
| Fluffy | Harold Taylor | cat | f | 2003-02-04 | NULL |
| Claws | Gwen Jones | cat | m | 2004-03-17 | NULL |
| Whistler | Gwen | bird | NULL | 2007-12-09 | NULL |
You should test the SELECT statements given below with this table.
The ^ Character
The ^ character is used to match the beginning of the subject. The following SELECT statement will return the first and second rows, matching “Buffy” and “Bowser” in the first and second cells of the first column:
SELECT * from Pets where name rLike "^B";
The pattern (regex) is "^B". In the subjects, “Buffy” and “Bowser”, ‘B’ is at the beginning. Note the position of ^ (at the beginning) in the pattern.
The $ Character
The $ character is used to match the end of the subject. The following SELECT statement will return the first and third rows, matching “Buffy” and “Fluffy” in the first and third cells of the first column:
SELECT * from Pets where name rLike "fy$";
The pattern is "fy$". In the subjects, “Buffy” and “Fluffy”, ‘fy’ is at the end. Note the position of $ (at the end) in the pattern.
The dot character is used to match any character in the subject. It would also match the carriage return or the newline character. The following SELECT statement will return the fourth and fifth rows, matching “Gwen Jones” and “Gwen” in the fourth and fifth cells of the second column:
SELECT * from Pets where owner rLike "Gw.n";
The pattern is "Gw.n". In the subjects, “Gwen Jones” and “Gwen”, ‘e’ corresponds to the dot in the pattern. So, any subject that has Gwen or Gwan or Gwsn or Gwin, etc is matched. In these words, the dot corresponds to ‘e’ or ‘a’ or ‘s’ or ‘i’ respectively. The dot corresponds to any character, in a particular position, in a pattern.
The subject, “Gwen Jones” has been matched because the statement looks for the word, "Gw.n" within the subject that might be a long string. Once the statement sees the pattern within a subject (string), no matter how long the subject is, it matches it.
The x* Construct
Here, ‘x’ is a variable (place holder) for any character. The * just after ‘x’ means, match any sequence (consecutive) of zero or more ‘x’ characters. The pattern, "se*n" for example, would match “seen”; it would match “sen”; and it would match “sn”. That is, “e*” would match zero or more ‘e’ in sequence. The following SELECT statement will return the first and third rows, matching “Buffy” and “Fluffy” in the first and third cells of the first column:
SELECT * from Pets where name rLike ".*f*y";
The pattern is ".*f*y" . In the subjects, “Buffy” and “Fluffy”, “.*” corresponds to “Bu” and “Flu” in “Buffy” and “Fluffy” respectively. “.*” means match any character, zero or more times. Still in the subjects, “Buffy” and “Fluffy”, “f*y” corresponds to “ffy” in “Buffy” and “Fluffy”. “f*” means match ‘f’ zero or more times.
Here, ‘x’ is a variable for any character. The + just after ‘x’ means match any sequence (consecutive) of one or more ‘x’ characters; as opposed to * which matches zero or more characters. The pattern, "se+n" , for example, would match “seen”; it would match “sen”; but it would not match “sn”, which does not have at least one ‘e’. The following SELECT statement will return the fifth row, matching “bird” in the fifth cell of the third column:
SELECT * from Pets where species rLike "b.+d";
The pattern is "b.+d" . In the subject, “bird”, “ir” are any two characters in sequence matched by “.+”. “.+” matches any character (not necessarily the same character) that occurs in sequence one or more times. If you want a particular character in sequence, then type the character in place of the dot, as in “e+” against “seen”.
The x? Construct
Here, ‘x’ is a variable for any character. The ? just after ‘x’ means match either zero or one ‘x’ character. The following SELECT statement will return the third, fourth and fifth rows, matching “Fluffy”, “Claws”, and “Whistler” in the third, fourth and fifth cells of the first column:
SELECT * from Pets where name rLike "l";
The pattern is "l". Each of the subjects, “Fluffy”, “Claws”, and “Whistler” has ‘l’. So matching occurs. Remember, once a pattern is found in a subject, matching occurs. Now the following SELECT statement will select all the five rows of the table, matching the first, second, third, fourth and fifth cells of the first column:
SELECT * from Pets where species rLike "l?";
The pattern is "l?". The subjects are, “Buffy”, “Bowser”, “Fluffy”, “Claws” and “Whistler”. Each of these subjects has zero or one ‘l’; and so each is matched and so all five rows are selected.
The | Character
The | character means OR. It means match either what is on the left of | or what is on the right. The following SELECT statement will select the rows, whose cell in the second column has either “Harold” or “Diane”.
SELECT * from Pets where owner rLike "Harold|Diane";
The pattern is "Harold|Diane". Note that the statement returns 3 rows. The three subjects are: “Harold”, “Diane” and “Harold Taylor” found in the cells of the first 3 rows, in the second column. Each of the subjects has “Harold” or “Diane”. The third subject has more text than just “Harold”, but it still has “Harold” or “Diane”.
The alternative words (characters) may exist only in one subject or in different subjects. Let us look at another example. The following SELECT statement will select the rows, whose cell of the second column has either “Gwen” or “Jones”.
SELECT * from Pets where owner rLike "Gwen|Jones";
The pattern is "Gwen|Jones". The fourth and fifth rows were selected. The subjects are, “Gwen Jones” and “Gwen”. Note that “Gwen” is found in either of the subjects and it is enough for the two subjects to be selected. “Jones” is found only in “Gwen Jones” and it is enough for “Gwen Jones” to be selected. If the fifth row did not exist, only the fourth row with “Gwen Jones” would have been selected; that is, “Gwen” or “Jones” would have selected “Gwen Jones”. So the | (OR) character means either words can be chosen from the same subject or different subjects.
You have seen MySQL regular expressions metacharacters and simple pattern constructs in this part of the series Let us stop here and continue in the next part with more pattern constructs.
Chrys
Related Links
Implementing Database in MySQLProgramming in MySQL
Backup Basics in MySQL
MySQL Access Privileges
Regular Expressions in MySQL
Date and Time in MySQL
Event in MySQL
MySQL Transaction
PurePerl MySQL API Prepared Statements
More Related Links
PurePerl MySQL Command Line Tool
Major in Website Design
Perl Course - Optimized
Web Development Course
BACK NEXT