The Personal Web Pages of Chris X. Edwards

Regular Expression Tutorial

--------------------------

Comprehensive Quantifier

Perhaps the biggest problem people have when they try to learn regular expressions is that they are used to metacharacters from other systems. The worst offender is the asterisk, "*". In many shells, the "*" is an important feature of filename expansion. In DOS one can type "dir *.*" to see a listing of all files in the directory. The most important thing you must do to understand regular expressions is to forget about this. Although the "*" is a wildcard character in other contexts, it is not one in regular expressions. Get over it. (Technically, in regular expressions the "*" is called the "Kleene star" after the mathematician Stephen Kleene, who did much to formalize an algebra for regular expressions.)

What the "*" does do is more bizarre than its typical use in shell syntax. In regular expressions, the "*" is one of many metacharacters that act as a quantifier. In DOS and shell languages, the "*" is like a pronoun which fills in for something. In regular expressions, it is an adjective which describes some other character. This is an important point - the "*" almost never appears alone. (When it does, it must be the first (or only) character of the regular expression. In tools that use POSIX basic regular expressions, such as grep, it is then a literal "*" character; some other regular expression flavors reject such a pattern as an error.)
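The pronoun-versus-adjective difference is easy to see by giving the same pattern text to a shell-style matcher and to a regular expression engine. This is only a small sketch in Python, which the original page does not use; the helper names glob_matches and regex_matches are made up for this illustration, and Python's fnmatch module stands in for shell filename expansion.

```python
import fnmatch
import re


def glob_matches(pattern: str, text: str) -> bool:
    """Shell-style match: '*' stands in for any run of characters."""
    return fnmatch.fnmatchcase(text, pattern)


def regex_matches(pattern: str, text: str) -> bool:
    """Whole-string regex match: '*' quantifies the character before it."""
    return re.fullmatch(pattern, text) is not None


# The same three characters, "ab*", mean different things to each matcher.
for text in ["a", "abbb", "abcd"]:
    print(text, glob_matches("ab*", text), regex_matches("ab*", text))
# a     False True   (glob needs a literal "b"; regex allows zero "b"s)
# abbb  True  True
# abcd  True  False  (glob lets "*" swallow "cd"; regex only repeats "b")
```

As a shell pattern, "ab*" means "ab followed by anything". As a regular expression, it means "a followed by any number of b characters", and nothing else.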

The "*" is the comprehensive quantifier because it specifies that any quantity of a particular character, including none at all, is acceptable. The regular expression is composed by putting the character first and then the quantifier. The following regular expression will match any number that is a power of 10, that is, a "1" followed by any number of zeros.

10*

This will find 1, 10, 100, 1000, 100000000000, etc. In this expression, the "*" modifies the literal character "0". The "1" must always be present, but the literal "0" is further modified by the quantifier, which says that it can appear zero or more times. This effectively makes the "0" character both optional and repeatable in the search.
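To try this out, here is a minimal sketch in Python, assuming the expression described above is written 10* (a "1" followed by a quantified "0"). The names POWER_OF_TEN and is_power_of_ten are invented for this example.

```python
import re

# "1" must be present; "0*" allows zero or more "0" characters after it.
POWER_OF_TEN = re.compile(r"10*")


def is_power_of_ten(text: str) -> bool:
    """True when the whole string is a '1' followed only by zeros."""
    return POWER_OF_TEN.fullmatch(text) is not None


for sample in ["1", "10", "1000", "100000000000", "11", "0", "20"]:
    print(sample, is_power_of_ten(sample))

# Searching inside longer text finds every place the pattern occurs,
# including the "10" embedded in "2103".
print(POWER_OF_TEN.findall("1 10 100 2103"))  # ['1', '10', '100', '10']
```

The last line shows why the distinction between matching a whole string and searching within one matters: an unanchored search will happily find a power of 10 inside a larger number.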

This regular expression will find clothing sizes such as "Size M", "Size XXL", etc.

Size X*[LMS]

Note that it will also find things like "Size XXXXXXM". What's an extra medium?
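The clothing size search can be tried the same way. The pattern in this Python sketch, Size X*[LMS], is an assumption inferred from the matches listed above rather than something quoted from the page: a quantified "X" followed by exactly one of "L", "M", or "S". The names SIZE and sample_text are invented for the example.

```python
import re

# Assumed pattern: "Size ", then zero or more "X", then one of L, M, or S.
SIZE = re.compile(r"Size X*[LMS]")

sample_text = "Size M, Size XXL, Size XXXXXXM, Size Q"

# "Size Q" is skipped because "Q" is not one of the allowed final letters.
print(SIZE.findall(sample_text))
# ['Size M', 'Size XXL', 'Size XXXXXXM']
```

Nothing in the "*" limits how many "X" characters are accepted, which is why the extra, extra, extra medium gets through.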

--------------------------
Chris X. Edwards ~ December 2003