The Personal Web Pages of Chris X. Edwards

Regular Expression Tutorial

--------------------------

DELIMITERS

A delimiter is something that marks significant positions in communications. A period in a normal sentence indicates that the sentence has ended. The period is a delimiter. When trying to get a computer to do what you want, it is very important to be explicit about how you define delimiters. Defining where relevant data begins and ends is always a challenge.


In news crawls on CNN, the delimiter is the red "CNN" and it lets the reader know that there is a new topic being addressed.

Here is a very common computer problem expressed in a nontechnical scenario:

Navy Captain:
Attention unregistered vessel, our gunship can either sink you or send a small boat to take you prisoner OVER
Pirate:
Bring it on over and we can discuss our surrender OVER

What is the problem here? The sailors are using the word "OVER" as a delimiter to indicate when they are giving control of the communications channel to each other. The pirate accidentally uses the word "over" in his actual message. Because "over" is the delimiter, the navy captain ignores that word and everything after it. To the captain, it seems like the pirate said, "Bring it on!"
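The captain's mistake can be sketched in a few lines of Python. This is only an illustration: the function name heard_message is made up for this example, and it assumes the listener stops at the first "over" he hears, regardless of capitalization.

```python
def heard_message(transmission, delimiter="OVER"):
    """Return what a listener hears: everything before the first delimiter.

    Assumption for this sketch: the ear cannot tell "over" from "OVER",
    so the search is case-insensitive.
    """
    position = transmission.upper().find(delimiter.upper())
    if position == -1:
        # No delimiter at all: the listener hears the whole transmission.
        return transmission.strip()
    return transmission[:position].strip()


pirate = "Bring it on over and we can discuss our surrender OVER"
print(heard_message(pirate))  # Bring it on
```

The message is cut at the first place the delimiter shows up, whether or not the speaker meant it as a delimiter.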

When using a regular expression, a delimiter must be used to indicate where the various components begin and end. Without a pre-established system, the regular expression interpreter would never know when it was finished reading.

The most common regular expression delimiter is the forward slash "/". If you use Unix, a similar operating system, or any of the common markup languages such as HTML or XML, you'll quickly run into problems when you want to find text in the form of file path names or tags, because those contain forward slashes themselves. So if you have a listing of all the files on a disk and you want to find every entry that contains the text "/project/files/doc/", the interpreter could not tell the slashes in the search term from the slashes that delimit it.

In some software, such as graphical text editors, regular expressions are entered into a form box and there is no need for special delimiters. But the usual format for text-based applications is:

/searchterm/

This is the most basic way to specify a regular expression. The first slash is where the regular expression starts and the last slash tells where it ends. Actually, there's more to it. The first slash does not so much indicate where the expression begins; most software already knows that from its syntax rules. But what you are searching for can be anything, and the program can't guess where it ends. The true purpose of the first delimiter is to define what the end delimiter will be. This allows the user to choose a different character, like this:

#/project/files/doc/#

In this regular expression, the search term is "/project/files/doc/". The first # indicates that the delimiter shall be that character and the second # indicates that the search term is complete. By choosing a character that is unlikely or, better yet, impossible to appear in the text, things become less confusing.
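Here is a rough sketch of how a program might read such an expression. Python's own re module takes patterns as ordinary strings and has no delimiter syntax, so the helper split_delimited below is hypothetical and only imitates the convention described above. The file listing is invented sample data.

```python
import re


def split_delimited(expression):
    """Split an expression like '#/project/files/doc/#' into its parts.

    The first character is taken to be the delimiter, and the search
    term is everything up to the next occurrence of that character.
    """
    if len(expression) < 2:
        raise ValueError("expression needs an opening and a closing delimiter")
    delimiter = expression[0]
    end = expression.find(delimiter, 1)
    if end == -1:
        raise ValueError(f"no closing {delimiter!r} found")
    return delimiter, expression[1:end]


delimiter, term = split_delimited("#/project/files/doc/#")
print(delimiter, term)  # # /project/files/doc/

# With "/" as the delimiter, the same search term ends immediately.
print(split_delimited("//project/files/doc//"))  # ('/', '')

# Invented sample listing, searched with the extracted term.
listing = [
    "/home/chris/project/files/doc/readme.txt",
    "/home/chris/music/song.ogg",
]
matches = [path for path in listing if re.search(term, path)]
print(matches)  # ['/home/chris/project/files/doc/readme.txt']
```

With "#" as the delimiter the whole path survives as the search term. With "/" the term is empty, because the very next character already looks like the closing delimiter.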

--------------------------
Chris X. Edwards ~ December 2003