The Personal Web Pages of Chris X. Edwards

Regular Expression Tutorial


Character Class Ranges

It is often desirable to search for a character that falls within an ordered sequence. For example, the character class [ABCDEFG] specifies a match for anything between (and including) A and G. There is a shortcut for specifying such classes: the range metacharacter, which is "-", the dash. With it, [ABCDEFG] can be written as [A-G].

Here are some examples:

lowercase letters- [a-z]
uppercase letters- [A-Z]
all letters- [a-zA-Z]
hexadecimal (base 16) digits- [0-9a-fA-F]
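
To try the classes named in the list above, here is a minimal sketch using Python's re module. The language choice, the sample string, and the variable names are illustrative assumptions rather than part of the original tutorial, and the hexadecimal class is assumed to accept both upper and lower case digits.

```python
import re

# One compiled pattern per class in the list above.
LOWER = re.compile(r"[a-z]")            # lowercase letters
UPPER = re.compile(r"[A-Z]")            # uppercase letters
LETTER = re.compile(r"[a-zA-Z]")        # all letters: two ranges in one class
HEX_DIGIT = re.compile(r"[0-9a-fA-F]")  # hexadecimal digits, either case

# Hypothetical sample text; any string will do.
sample = "Port 0x1F, Rev b"

# findall returns every single character that the class matches, in order.
lower_hits = LOWER.findall(sample)
upper_hits = UPPER.findall(sample)
letter_hits = LETTER.findall(sample)
hex_hits = HEX_DIGIT.findall(sample)

print(lower_hits)
print(upper_hits)
print(letter_hits)
print(hex_hits)
```

Note that a class may hold several ranges side by side, as in [a-zA-Z]; each range is evaluated on its own.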

Note that ranges work in negated character classes too. A negated range will match anything not in the range.

Here are some negated examples:

non-numbers (letters and symbols)- [^0-9]
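
A negated range can be tried the same way. The following is a small sketch in Python, assuming the non-number class is written [^0-9]; the sample text and variable names are made up for illustration.

```python
import re

# The caret right after "[" negates the class: match anything that is NOT 0-9.
NON_DIGIT = re.compile(r"[^0-9]")

# Hypothetical sample text.
text = "a1-b2"

# Every character outside the range 0-9, in order of appearance.
non_digits = NON_DIGIT.findall(text)

# Deleting everything the negated class matches leaves only the digits.
only_digits = NON_DIGIT.sub("", text)

print(non_digits)
print(only_digits)
```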

It is easy to see how to construct ranges out of characters with a well-known order. We know that "G" comes before "T" because we (correctly) assume that the letters are in alphabetical order. A good question is: how are the rest of the characters ordered? For that matter, what is the order relationship between upper and lower case? This depends heavily on the character encoding you are using. Regular expressions work in Japanese and Hebrew as well as English, but the order of the characters that a range specifies will be affected. Generally, it's not a good idea to specify ranges that include characters which don't have a natural order. So if you aren't sure whether a "#" comes before a "?", then it's probably a bad idea to use them in a range. If this advice seems overly cautious, you can try using this list, which gives the printable ASCII characters in order (the first one is a space):

 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
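
For ASCII text, the ordering can also be checked programmatically rather than memorized. The sketch below is in Python and assumes an engine that orders characters by their numeric code, which is how Python's re module treats text patterns; other tools and locales may differ. The helper in_range is a hypothetical name introduced only for this illustration.

```python
def in_range(ch, low, high):
    """Return True if ch would fall inside the range low-high,
    assuming characters are ordered by their numeric code."""
    return ord(low) <= ord(ch) <= ord(high)


# The printable ASCII characters (codes 32 through 126) in range order.
printable_ascii = "".join(chr(code) for code in range(32, 127))
print(printable_ascii)

# "#" has code 35 and "?" has code 63, so "#" does come before "?".
hash_before_question = ord("#") < ord("?")

# Every uppercase letter comes before every lowercase letter in ASCII.
upper_before_lower = ord("Z") < ord("a")

# "M" lies inside A-Z, but "m" does not.
m_upper_in_range = in_range("M", "A", "Z")
m_lower_in_range = in_range("m", "A", "Z")
```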


For example, [A-z] includes the characters "[", "\", "]", "^", "_", and "`", as well as all of the upper and lower case letters. It is also apparent that the range [a-Z] is an error because "Z" comes before "a".
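
Both observations can be confirmed with a short Python sketch. It assumes ASCII ordering and Python's re module, which rejects a reversed range by raising re.error; other regular expression engines may report the problem differently. The variable names are illustrative.

```python
import re

# The characters whose codes sit strictly between "Z" and "a".
between = [chr(code) for code in range(ord("Z") + 1, ord("a"))]
print(between)

# [A-z] matches each of those punctuation characters, not just letters.
extras = re.findall(r"[A-z]", "[\\]^_`")

# A reversed range is rejected when the pattern is compiled.
try:
    re.compile(r"[a-Z]")
    reversed_range_ok = True
except re.error as exc:
    reversed_range_ok = False
    print("rejected:", exc)
```

If only letters are wanted, [a-zA-Z] avoids the stray punctuation that [A-z] lets through.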

This page was created with only free, open-source, publicly licensed software.
This page was designed to be viewed with any browser on any system.
Chris X. Edwards ~ December 2003