
Perl CGI Security Notes by Chris

Projects of
Chris X Edwards
--------------------------

Objective

After hearing how some relatively simplistic attacks against Perl CGI programs have caused trouble at my institution, I started to worry that I may have been putting a lot of naive code out there. I wanted to learn the fundamental mechanisms that can be used when exploiting a Perl CGI program and do my best to limit the liability of my software. There is a lot of information on this topic, but it is not very concise or centralized. This document attempts to be a collection of issues that Perl CGI programmers should be aware of. Unlike many documents with this kind of information, my perspective is that of the programmer trying to be defensive, not of the black hat trying to be naughty. If I have made any mistakes or overlooked other obvious threats, please feel free to contact me.

Play Along At Home

If you want to get a real feel for Perl CGI vulnerabilities, then you should try some out. However, poking around your favorite online merchant's shopping cart is likely to get you a visit from the FBI. Fortunately, I have written a small collection of programs designed to be vulnerable to various exploits. I recommend running these on a closed system. You'll need a working copy of Apache, Perl, and the CGI.pm module. I found that a Mac OS X laptop works great and can be self-contained. The scripts I have prepared are very, very simple and will be easier to deal with than writing your own, but, hey, do that too! Let me know what you come up with.

Trust Nothing

The Internet is a scary place. If you are planning to offer some software service to the general public over the Internet, you must increase your awareness of what the web user has control over and the ramifications of this. The first thing to understand is that the CGI interface using HTML forms is merely a courtesy to well-intentioned users. An attacker using your script as a point of weakness will often bypass the HTML form completely. Perhaps you put a maximum limit on a text field's length. If you say:
<input type=text name=protein size=12 maxlength=12>
in your form, you might think that you wouldn't have to worry about the value of param('protein') being longer than that. Wrong! By specifying the variable in the URL, the naughty user bypasses this restriction.
http://www.xed.ch/cgi-bin/pnb.pl?protein=%23!/bin/bash;%23A%20complete%20nasty%20script%20follows...

The naughty user can supply your script with variables that you never mentioned. Imagine if your script automatically does something with all variables it receives (like write a file named after the variable) under the assumption that all of them are sanctioned by the program. This is a very fragile system that is easy to abuse.

The user is not constrained by the contents of a selection list. Just because those were the only options you wanted to make available to your program doesn't mean those are the only ones the attacker can use.

Users can also change the values of "hidden" form elements. It may even come as a surprise to some beginners to learn that hidden form elements are in no way effectively hidden from the user. These variables are simply undisplayed by default. They can be viewed and even sent back with different values.
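Since the form cannot enforce anything, the script has to re-check every rule itself on the server side. The sketch below is a minimal illustration under stated assumptions, not one of the example scripts: the helper name validate_params, the second parameter format, and its allowed values are hypothetical. It takes a hash of already-decoded parameters (for example, collected with CGI.pm's param()) and rejects unknown names, over-long values, and values that were never offered in a select list.

```perl
use strict;
use warnings;

# Hypothetical rules for a form with one text field and one select list.
my %RULES = (
    protein => { max_len => 12 },                  # same limit as the form's maxlength
    format  => { allowed => [qw(fasta genbank)] }, # the select list's options
);

# Check a hashref of decoded parameters (name => value).
# Returns a list of error messages; an empty list means the input passed.
sub validate_params {
    my ($params) = @_;
    my @errors;
    for my $name (sort keys %$params) {
        my $rule = $RULES{$name};
        if (!$rule) {                              # a variable the form never mentioned
            push @errors, "unexpected parameter '$name'";
            next;
        }
        my $value = $params->{$name} // '';
        if (defined $rule->{max_len} && length($value) > $rule->{max_len}) {
            push @errors, "'$name' is too long";
        }
        if ($rule->{allowed} && !grep { $_ eq $value } @{ $rule->{allowed} }) {
            push @errors, "'$name' is not one of the offered options";
        }
    }
    return @errors;
}

my @problems = validate_params({ protein => 'MKTAYIAKQR', format => 'fasta' });
print(@problems ? "rejected: @problems\n" : "accepted\n");   # prints "accepted"
```

The same check covers hidden fields: give them a rule too, and never trust their value just because your own page wrote it.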


Character Reference

Novices often get the good advice to steer clear of cartoon expletives. In other words, don't use any #@^!$~ special characters in names for any reason. But you still find people who have been warned against this kind of thing naming files like this: Prices\ in\ $.xls

I avoid these characters like the plague. Maybe more programmers and users would too if they understood just what mischief each one can do. So here's a big but not comprehensive list describing many bad characters and the potential problems associated with them. At the very least, it helps to know which characters are at the top of the list to defend against if you must use a special character. This list is oriented towards each character's behavior in bash, although many of these issues extend by convention to other systems and environments. What's worse, in other environments these characters may have yet more obscure and dangerous problems to contend with.
Char       Function                                     Potential problem
`          Command Substitution                         Allows arbitrary commands to be executed.
$          Parameter Expansion, Command Substitution    Variable substitution; also introduces $(command).
( )        Subshell, Command Substitution               $(command) is equivalent to backticks; ( ) alone runs a subshell.
;          Command Separator                            Allows attacker to write entire scripts in an input form box.
|          Pipe                                         Redirects output to a new arbitrary process (which doesn't even have to read stdin).
/          Directory Separator                          Helps attacker escape from a safe directory.
\          Character Escape                             Evades bad char checks: echo -e "bogus\x7crm" produces "bogus|rm".
{ }        Parameter Expansion, Brace Expansion         Lots of obscure functionality. Allows attacker to try several things at once.
&          Redirecting File Descriptors, Job Control    Changing stdout, stderr, etc.
'          Protective Quote                             Used to hide special shell characters for later execution.
"          Mild Quoting                                 Allows for filenames/strings like "rm -r" (space included).
*          Pathname Expansion                           Find out or delete the contents of a directory.
?          Pathname Expansion                           Matches any single character, so the attacker needs to know less about the system.
<          Redirect Input                               Change source of input, create heredocs and herestrings. Used to read sensitive files.
>          Redirect Output                              Allows messing with the filesystem, e.g. overwriting/appending critical files.
[ ]        Conditional Expressions, Pathname Expansion  Allows fishing for the existence of various files.
!          History Substitution                         Allows picking through the command history. !# recalls the current command line.
^          History Substitution                         Relatively inert.
~          Tilde Expansion                              Access home directories of accounts.
-          Option Specification                         Although a common and reasonable character in filenames, can cause option confusion.
#          Comment, History Substitution                Hide things to get past the shell. Also has about 5 other minor functions in bash.
\n         Newline                                      New lines where not expected are bad.
\r         Carriage Return                              Can be confusing too.
space      Token Separator                              Can mess up command lines. Causes action on multiple unintended items.
Ctl Chars  Embedded Control Characters                  Cause obfuscated mischief.
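The table can be turned into a rough diagnostic that reports which of these characters appear in a piece of input. This is a minimal sketch; find_bad_chars is a hypothetical helper name. Because it is a blacklist, it is best used for logging or error messages; for actually accepting input, a whitelist of the few characters you need (for example /^[A-Za-z0-9._]+\z/) is safer, since it cannot miss a character you forgot to list.

```perl
use strict;
use warnings;

# Characters from the table above, used as a blacklist. Whitespace (\s covers
# space, \n, \r, \t) and the other control characters are added separately.
my $bad_chars = q{`$()|;/\\{}&'"*?<>[]!^~#-};
my $BAD = qr/[\Q$bad_chars\E\s\x00-\x1f\x7f]/;   # \Q...\E escapes each listed char

# Return each offending character once, in order of first appearance.
sub find_bad_chars {
    my ($input) = @_;
    my %seen;
    return grep { !$seen{$_}++ } ($input =~ /($BAD)/g);
}

for my $c (find_bad_chars("protein;rm -rf ~")) {
    printf "bad character 0x%02x\n", ord $c;       # hex, so control chars show up too
}
```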

Traversal Vulnerabilities

Where data is stored on the filesystem must be given some thought. Perhaps you have a directory where untrusted users can access public files through your CGI scripts. Here is a very simplified example of a program that allows a web user to retrieve the contents of a file in a safe directory.
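use CGI qw(param);
$PATH="/tmp/ok/";              # the "safe" directory (note the trailing slash)
$n=param('name');
open (FILE, "<$PATH$n");       # DANGER: $n is never checked
print while (<FILE>);
close (FILE);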
The problem here is that the CGI user can also read files from any other directory that the web server's account can read (sometimes that account is even root, which is a whole different bad problem). Even though a path is hard-coded in the script, the CGI user only needs to enter something like this in the form field that asks for the file name:
../../etc/passwd
Also don't forget that the attacker can easily read your HTML source, so JavaScript validation in the form won't help; the attacker can just send the sneaky request in the URL:
http://www.xed.ch?name=../../etc/passwd
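One way to close this hole is to accept only a bare file name and build the full path yourself. The sketch below assumes the safe directory holds plain files whose names use letters, digits, '.', '_' and '-'; safe_path is a hypothetical helper name. Because '/' is never allowed and the name may not start with '.' or '-', neither "../../etc/passwd" nor ".." nor an option-like "-rf" can get through.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return "$dir/$name" if $name is a harmless bare file
# name, or undef otherwise. \z (not $) also rejects a trailing newline.
sub safe_path {
    my ($dir, $name) = @_;
    return undef
        unless defined $name && $name =~ /^([A-Za-z0-9_][A-Za-z0-9._-]*)\z/;
    return File::Spec->catfile($dir, $1);   # $1 is the checked copy of $name
}

my $path = safe_path('/tmp/ok', '../../etc/passwd');
print defined $path ? "would open $path\n" : "refused\n";
```

Using the captured $1 rather than $name itself also fits taint mode, where captured subexpressions are the untainted result.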

"open()" Vulnerabilities

The open command in Perl is very naughty. Since Perl's creators went out of their way to make this command as easy to use as possible, it is easy for attackers to use too. The worst problem comes from the two-argument form: if no mode is given, open defaults to reading, but it still interprets any mode characters (<, >, | and so on) that appear in the filename string itself. Here is a simple usage:

$n=param('name');
open (FH, $n);
#.... It doesn't matter what's here, it's too late.
close (FH);
You might think that this allows the user to specify a name and if it exists, it will be opened. That's true; the program will work that way. But there is much more functionality hidden in there. Basically this construction allows an outside user to do pretty much anything to the filesystem (that the web user can do).

Many people don't realize that the open command supports very powerful pipes. Think of it like this: when a file is opened for reading or writing, data is prepared to be taken from or sent to that file. When a pipe is opened for reading or writing, data is prepared to be taken from or sent to some arbitrary process. So for example, here is a legitimate use of a pipe:

open (PIPE, "/usr/bin/cal|");
print while (<PIPE>);
close (PIPE);
This bit of code would give Perl access to the data produced by the cal (calendar) program. This data isn't in a static file (it is time sensitive), but it can be used as if it were. Data can be sent to pipes too. Pipes are powerful and can get complex, but all one needs to know is that by using a pipe, the open command provides the capability to run any arbitrary command. In the previous example, this doesn't present a problem since the programmer has complete control over what is being run. But be careful when the file handle being opened involves any input from the user.

Looking at the first open example again, imagine the user inputs something like this:

http://www.xed.ch?name=/sbin/shutdown%20-h|
The data the user will be reading at that point is "The system will be shut down immediately! [etc]". That's probably not what the programmer had intended.

One of the best defenses against this problem is to never use open without specifying exactly what kind of operation you intend. The open command above should be specified:

open (FH, "<$n");
Now it is clear that this is for reading. It's still not a bad idea to exclude pipes and other naughty characters from $n.

Before you completely relax and feel comfortable that this problem is easy to avoid, be warned that even with the explicit read character, <, a very, very clever user might still be able to make mischief. Perl packs so much functionality into the open command that it can also duplicate or reuse existing file descriptors, and that syntax starts with <. Therefore, if the user supplies "&=3" as the value of $n, the string becomes "<&=3" and file descriptor 3 gets opened for input. This is an unlikely exploit, but it reminds us that unexpected functionality lurks everywhere, and people who specialize in exploiting it will know what to do with it better than non-specialists.
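The three-argument form of open separates the mode from the file name and treats the name as a plain file name, so a trailing | or a leading &= inside it has no special meaning. A minimal sketch; read_file_literally is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file with the three-argument form of
# open. The mode '<' is fixed by the program; $name is only a file name.
sub read_file_literally {
    my ($name) = @_;
    open(my $fh, '<', $name) or return undef;   # e.g. no such file
    local $/;                                   # slurp mode: read everything
    my $data = <$fh>;
    close($fh);
    return $data;
}

# With two-argument open these would run a command or dup a descriptor;
# here they are just (nonexistent) file names.
for my $evil ('/sbin/shutdown -h|', '&=3') {
    print defined read_file_literally($evil) ? "opened $evil\n" : "no file named '$evil'\n";
}
```

This does not replace checking $n (the traversal problem above still applies), but it removes the whole class of mode-character tricks.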


"system" vs. "exec" vs. "fork" vs. "qx{}" vs. "`cmd`"

Because Perl was designed to do everything that shell scripting could do, it has no shortage of ways you can mess with the host system. Perl is often called a glue language for its ability to bind other programs into a coherent system. This is useful for CGI programmers who want to put a web interface on some non-Perl program. How should Perl call these underlying programs? Carefully.

The most important thing to remember is that if the user has any control over what Perl will be running (a user specified option or argument, etc) then extreme care must be taken to avoid allowing the user to slip in some magic that allows unintended processes to commence. Basically all of the bad characters are suspect when using Perl to start other processes.

As for which form of process spawner to use, there are many options. Here are the descriptions from the official Perl documentation for the ones I know about. I personally use the qx{} style since it seems intended to handle the normal cases where the programmer wants to run an external program.

fork Does a fork(2) system call to create a new process running the same program at the same point.

exec The "exec" function executes a system command and never returns; use "system" instead of "exec" if you want it to return. It fails and returns false only if the command does not exist and it is executed directly instead of via your system's command shell.

system Does exactly the same thing as "exec", except that a fork is done first, and the parent process waits for the child process to complete. Note that argument processing varies depending on the number of arguments.

qx{} A string which is (possibly) interpolated and then executed as a system command with "/bin/sh" or its equivalent. Shell wildcards, pipes, and redirections will be honored. The collected standard output of the command is returned; standard error is unaffected.

`cmd` Older syntax for qx{} based on shell command substitution.
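The remark that system's argument processing depends on the number of arguments is the key to staying away from the shell: given a list of more than one element, system (and exec) start the program directly, so shell metacharacters in the arguments are passed through as plain text. On Unix-like systems the list form of a piped open ('-|') does the same while capturing output the way qx{} does. A minimal sketch, assuming /bin/echo exists; run_no_shell is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: run $program with @args, no shell involved, and
# return its standard output as one string.
sub run_no_shell {
    my ($program, @args) = @_;
    # List-form piped open: Perl forks and execs $program directly, so
    # characters like ; | ` $ in @args reach it untouched.
    open(my $out, '-|', $program, @args)
        or die "Cannot run $program: $!";
    local $/;                       # slurp all output at once
    my $text = <$out>;
    close($out);                    # waits for the child to finish
    return defined $text ? $text : '';
}

my $user_input = 'x; echo pwned';
print run_no_shell('/bin/echo', $user_input);   # prints: x; echo pwned

# Same rule for system(): more than one argument means no shell.
system('/bin/echo', $user_input) == 0 or warn "echo failed: $?";
```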

With any of these commands, it's best to assume that the PATH environment variable has been tampered with. Even if your program didn't cause that vulnerability, once an attacker has altered the default PATH, your program can be made to do bad things. Use an explicit absolute path for each command you execute and do not rely on an automatic search through the PATH to find things. Also use Perl builtins where possible instead of Unix shell commands (grep and unlink are good examples).

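# Not very secure: the shell sees $find and $file unquoted.
print qx{grep $find $file};

# A tiny bit better: no PATH search for grep, but still a shell.
print qx{/bin/grep $find $file};

# Best to avoid the shell completely. Much better:
open(my $fh, '<', $file) or die "Cannot open $file: $!";
# \Q...\E makes the user's search string match literally, not as a regex.
/\Q$find\E/ and print while <$fh>;
close($fh);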
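Absolute paths protect the commands you name yourself, but the programs you start can still search PATH or read other environment variables. The perlsec documentation recommends resetting the environment explicitly near the top of the script, and under taint mode Perl refuses to run external commands until you do. A minimal version of that idiom; the directory list is an assumption and should name only trusted system directories that hold the programs you call:

```perl
use strict;
use warnings;

# Run this before starting any external program.
$ENV{PATH} = '/bin:/usr/bin';

# Shell-related variables that can change how a command line is split
# (IFS), where cd looks (CDPATH), or which startup file sh/bash reads
# (ENV, BASH_ENV).
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
```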


Taint Mode

Since enough people have been burned by Perl's tendency to be as helpful to bad guys as it is to programmers, the developers have created a special mode to help protect systems from mischief. This mode is called the taint mode and is activated by starting perl like this:
#!/usr/bin/perl -T

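When taint mode is in effect, all data that originates from some external source (form parameters, environment variables, file input and so on) is marked as tainted, and Perl refuses to let it affect anything outside your program. For example, if you read some user input into a variable, that variable cannot be used in a command that starts another program, or as the name of a file you open for writing, delete, or rename. It can be used in a print statement, however, since printing itself is assumed to be fairly safe from unintended consequences. Note that opening a file for reading with a tainted name is still allowed, so taint mode alone does not stop the traversal attack described earlier.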

To sanitize externally contributed data, the usual method is to match it against a regular expression and keep only what the capturing parentheses extracted ($1, $2, ...); those captured substrings are not tainted. The concept is that if you spent enough effort to write a regular expression to condition this data, then you probably filtered out anything bad. Of course, you can make a mistake at this stage, but at least Perl isn't going to let some subtle and forgotten piece of data get by without some consideration.

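#!/usr/bin/perl -T
use strict;
use warnings;
use CGI qw(param);
my $n = param('name');                # $n is tainted
$n =~ m/^([a-zA-Z0-9._]+)$/           # $n is still tainted,...
    or die "Bad file name\n";         # (never use $1 after a failed match)
my $safe = $1;                        # ... but the captured group (in parens) is not
open(my $fh, '<', $safe) or die "Cannot open $safe: $!\n";
print while <$fh>;
close($fh);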

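Taint Trivia: Data can be untainted by making it a key to a hash (hash keys are never tainted). Taint mode also restricts search paths: the PERL5LIB and PERLLIB environment variables are ignored, and Perl will not run external commands until $ENV{PATH} has been set to a safe, untainted value.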


Poison Null Byte

Imagine that we have a situation where we have some sensitive data (like a password or key file) which lives in the same directory as user data. We want to be able to let the user specify any file but the restricted one and then use that as a parameter to a C program that does something special with it.
$n=param('name'); 
die if ( $n eq "restricted.data" ); # Can't let the webuser use this file!
qx{/usr/local/bin/specialCprogram --data-file $n};
This program demonstrates a very specific concept, so ignore the otherwise lax security. Imagine now that the user provided a url like this:
http://www.xed.ch/cgi-bin/pnb.pl?name=restricted.data%00Noproblemshere
When Perl compares this value, it is definitely not the sensitive filename, so the die check passes. Perl can deal with the null character (character zero) as if it were any other. On the other hand, C/C++ uses this character as the signal that it has arrived at the end of a string. So when the C program parses the input, it will see "restricted.data", then the null byte, and it will think the string is finished. The file it opens and processes would be "restricted.data". Doh!
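The mismatch is easy to see from Perl itself. This sketch only imitates the C view by cutting the string at the first NUL; it does not run any C code, and the variable names are illustrative:

```perl
use strict;
use warnings;

my $n = "restricted.data\0Noproblemshere";   # what ...%00... decodes to

# Perl compares the whole string, NUL included, so the check passes ...
print "Perl: not the restricted file\n" if $n ne 'restricted.data';
print "Perl length: ", length($n), "\n";

# ... but code that treats it as a C string stops at the first NUL.
my ($c_view) = split /\0/, $n;
print "C program would see: $c_view\n";
```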

You can clean up any input that might suffer from this problem by doing this:

$insecure =~ s/\0//g;    # the /g removes every null byte, not just the first
or, this is faster (and fatal):
die if $insecure =~ tr/\0//;
--------------------------

Chris X. Edwards ~ January 2005