The Personal Web Pages of Chris X. Edwards

Regular Expression Tutorial

--------------------------

Custom Character Classes

The idea of a character class is to specify which characters are acceptable at a given position in a search. Sometimes any of a handful of characters will do, and sometimes only a narrow set is wanted. Regular expressions let you define very explicitly the exact character class you are interested in.

The dot character class is most useful when you have no idea what you are going to find, but by specifying custom character classes explicitly, you can find very specific patterns. Specific character classes are created by including the desired characters between square brackets.

Some examples of simple character classes:

vowels: [aeiouAEIOU]
characters that extend below the baseline: [gjpqy]
dotted and crossed characters: [fijtxQX]
some characters that often look the same upside down: [NOoIxXSszZH]
top 10 most common letters in English: [etaoinshrd]
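To try classes like these out, here is a minimal sketch using Python's standard re module. The sample text is made up purely for illustration; the two patterns are the vowel and below-the-baseline classes from the list above, written with their square brackets.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Quiet jog"

# Each pattern is a single character class, so findall returns
# every individual character in the text that belongs to the class.
vowels = re.findall(r"[aeiouAEIOU]", text)
descenders = re.findall(r"[gjpqy]", text)

print(vowels)      # ['u', 'i', 'e', 'o']
print(descenders)  # ['j', 'g']
```

Note that the capital Q is not reported by the second pattern: a class contains only the characters actually listed, and that class lists only the lowercase q.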

It is important to understand that the character class specified by brackets represents a multiple choice of acceptable matching characters. Therefore, [ABC] represents the ability to match an "A" or a "B" or a "C". It doesn't mean that A, B, and C must be present. The entire character class is focused only on one character.
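The multiple choice behavior can be checked with a short sketch, again assuming Python's re module; the test strings are arbitrary examples.

```python
import re

pattern = re.compile(r"[ABC]")

# A lone "B" is enough to match; "A" and "C" need not be present.
first = pattern.search("xxBxx").group()

# The class consumes exactly one character per match, so a string
# containing all three letters yields three separate matches.
each = pattern.findall("CAB")

# One class cannot cover a two-character string by itself.
whole = re.fullmatch(r"[ABC]", "AB")

# A string with none of the listed characters does not match at all.
missing = pattern.search("xyz")

print(first)    # B
print(each)     # ['C', 'A', 'B']
print(whole)    # None
print(missing)  # None
```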

Because a character class is focused on matching individual characters, it is important to realize that the general syntax allows for a completely arbitrary order. This means that the character class [ABC] is exactly the same as [ACB], [BAC], [BCA], [CAB], and [CBA].

[ABC] = [ACB] = [BAC] = [BCA] = [CAB] = [CBA]
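One way to convince yourself of this is to build all six orderings and compare what they match. The sketch below assumes Python's re and itertools modules and an arbitrary test string; it covers only classes made of ordinary letters like these.

```python
import re
from itertools import permutations

# Arbitrary test string containing matching and non-matching characters.
candidates = "ABCDabc"

results = set()
for ordering in permutations("ABC"):
    # Builds "[ABC]", "[ACB]", "[BAC]", and so on.
    pattern = "[" + "".join(ordering) + "]"
    results.add(tuple(re.findall(pattern, candidates)))

# All six orderings produce the same list of matches,
# so the set collapses to a single entry.
print(results)  # {('A', 'B', 'C')}
```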

Another fact about character classes that makes sense when you think about it is that there should never be repeated characters in the specification. So in [ABBC] the second B is completely pointless. This is like answering the question "What foods do you like?" with "Apples, broccoli, broccoli, carrots." Most regular expression interpreters can ignore this kind of redundant duplication, but it's best to avoid it if possible, since it adds nothing and may, at worst, make your search take longer.
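A quick sketch, assuming Python's re module, shows that the duplicate changes nothing about what is matched. The dedupe_class helper is hypothetical, not part of any library, and is only meant for classes made of plain literal characters.

```python
import re

# Arbitrary test string for comparing the two classes.
text = "ABBA CAB"

with_duplicate = re.findall(r"[ABBC]", text)
without_duplicate = re.findall(r"[ABC]", text)

# The repeated B makes no difference to the matches.
print(with_duplicate == without_duplicate)  # True


def dedupe_class(chars: str) -> str:
    """Hypothetical helper: drop repeated characters, keeping first-seen order.

    Only intended for plain literal characters like the ones used here.
    """
    return "".join(dict.fromkeys(chars))


cleaned = "[" + dedupe_class("ABBC") + "]"
print(cleaned)  # [ABC]
```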

--------------------------
This page was created with only free, open-source, publicly licensed software.
This page was designed to be viewed with any browser on any system.
Chris X. Edwards ~ December 2003