"C is quirky, flawed, and an enormous success."
These notes are not a complete tutorial or reference. They are a useful collection of important topics for someone who has programmed in C but might be rusty. The idea is that the material here can get you started with a programming project quickly.
Useful Resources
-
Interested in my "white style" of C? This simple demonstration parser improves C syntax considerably in my opinion.
-
Why I have an idiosyncratic style, using x= y but x == y: The Linux Backdoor Attempt of 2003
-
If you want more comprehensive details about the C language then GNU is a good place to start.
-
The International Obfuscated C Code Contest should mentally prepare you for what you’re about to get into. Oh, The Underhanded C Contest is quite brilliant too!
-
Need to convert C++ into less baroque C?
-
Need to do web framework stuff but you’re not a sissy? Check out Kore.
-
Serious people use C; wonder just how very serious people use C? Here’s JPL’s C programming guidelines for spacecraft.
Compiling
C is a compiled language, meaning that its source code needs to be translated in its entirety into something the computer can understand before it is actually run. The "compiler" does this.
The typical way to compile a program looks like this:
gcc -o typical typical.c
If you don’t specify a -o option (for output), your executable program will be named a.out, which is not terrifically useful. It’s best not to do too much of that or you’ll have one a.out overwriting another.
In the old days compiling C programs on a Linux system was kind of a giant pain. These days things tend to work much more smoothly, but for reference, I’ll include some notes on things to try to solve typical compile issues.
- Use -D_GNU_SOURCE early and often. Modern Linux systems seem to come with a gcc that is aware that it’s a Linux system and does the right thing, but it wasn’t always so. If you want some more serious detail on programming in Linux specific environments, this seems like a good resource.
- If include (.h) files are "lost", try an option like -I/usr/X11R6/include/X11/magick/, which tells gcc about an extra directory to search for include files.
- Math not working even though you added #include <math.h>? Maybe only some of math (undefined reference to "floor")? Try -lm, which often fixes that. The logic is that on glibc systems the math functions live in a separate library, libm, which is not linked unless you ask for it (and, per the next tips, -lm belongs after your source files).
- Are you nuts and compiling something against Xlib? You might need something like this: -L/usr/X11R6/lib -lX11
- It turns out that the order of your gcc options and arguments is important. This nice web page points out that external libraries should be to the right of the thing that calls them. This is why gcc square.c -o square -lGL -lGLEW -lglut works but gcc -lGL -lGLEW -lglut square.c -o square will produce tons of square.c:(.text+0x49): undefined reference to... errors. This drove me crazy until I figured it out. A little change in my Makefile and suddenly, everything was wildly broken.
- There is a command called pkg-config which can figure out what compiler flags one needs for a particular objective. For example, to figure out what is needed for SDL2, try pkg-config --cflags --libs sdl2. This can be put in Makefiles to increase the chances that they’ll work on alien systems.
If you’re curious about the resulting executable, it can be analyzed with readelf -a myprog. See this nice article on analyzing Linux executables for details and hints.
Also, nm lists symbols from object files. In fact GNU Binutils is full of useful stuff. This can come in handy when trying to sort out gruesome dependency messes (looking at you Nvidia). Another really good tool is objdump; check its --help for the many useful things it can tell you about library files.
Preprocessor
Including Libraries
#include "file_in_this_directory.h"
#include "/an/explicit/path.h"
#include <look_in_the_normal_place.h>
There are many useful functions in the standard libraries. It looks like Wikipedia has a pretty good list of POSIX C libraries, a specification of the libraries a sane system should provide. Here are some of the classic headers with some of the functions they declare.
- stdio.h
  Includes the super important printf. It also defines size_t and NULL (both also found in stddef.h). Also fwrite, fread, fprintf, fputc, putc, putchar, ungetc, fflush, fopen, freopen, fclose, remove, rename, rewind, FILE
- math.h
  Pretty much anything involving the eponymous topic of math. (Don’t forget -lm when you compile!) Here are some useful ones: ceil (nearest whole number above), exp, floor (nearest whole number below), round (nearest whole number, with halfway cases rounded away from zero), pow, sqrt. And most of the others: acos, asin, atan, atan2, cos, cosh, fabs, fmod, frexp, ldexp, log, log10, modf, sin, sinh, tan, tanh.
- stddef.h
  size_t, offsetof, NULL
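- stdlib.h
  exit, abort, atexit, getenv, system, malloc, calloc, realloc, free, atoi, atol, atof, strtod, strtol, strtoul, rand, srand, qsort, bsearch. (Two that are often assumed to be here actually live elsewhere: assert is in assert.h and perror is in stdio.h.)
  Here’s a notable use: char *u; u= getenv("USER"); (getenv returns NULL if the variable is not set.)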
- ctype.h
  isalnum, isalpha, isdigit, isxdigit, isgraph (visible character), isprint (printable character), isupper (case), islower, iscntrl, ispunct, isspace, tolower, toupper
- string.h
  strlen, strcpy, strncpy, memcpy, memmove, strcat, strncat, strcmp, strncmp, memcmp, strchr, strrchr, memchr, strcspn, strpbrk, strspn, strstr, strtok, strerror, memset
- unistd.h
  Includes the getopt function. Also read, pread, pwrite, chown, chdir, getwd, the exec family (execl, execv and friends), nice, _exit (the plain exit is in stdlib.h), getuid, fork, link, symlink, unlink, rmdir, getlogin, gethostname, chroot, sync, encrypt.
- locale.h
  setlocale, localeconv
- time.h
  asctime, ctime, clock, difftime, gmtime, localtime, mktime, time, strftime
- signal.h
  Defines functions and macros for handling signals: signal, raise, and also SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV, SIGTERM
Preprocessor Macros
All kinds of mischief can be had with preprocessor tricks. Generally it seems that this should only be used to manage the software development aspect of the program and not the program’s actual functionality. Here’s an example of a preprocessor macro in use:
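#include <stdio.h>
/* Declare a variable of type T named V, then print the type name and its size.
   #T turns the type argument into a string; sizeof yields a size_t, hence %zu. */
#define TYPE(T,V) T V; printf(#T"= %zu\n", sizeof V);
int main(void){
  TYPE(char,a_char)
  TYPE(short int,a_short_int)
  TYPE(int,an_int)
  TYPE(unsigned int,an_unsigned_int)
  TYPE(long,a_long)
  TYPE(unsigned long,an_unsigned_long)
  TYPE(float,a_float)
  TYPE(double, a_double)
  TYPE(long double,a_long_double)
  return 0;
}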
To see what this preprocessor macro does, see below.
Other preprocessor tricks would be stuff like general constants that should be flexible depending on how someone might want to compile the program:
#define PRECISION .0001
Often there is a big maze of include files, and it’s easy to have one place include a header and then another place try to do it too, resulting in some kind of clash (typically duplicate definitions). The following checks to see if the special header has already been included and, if not, includes it. Subsequent uses of this will be ignored.
#ifndef SPECIAL
#include "special.h"
#define SPECIAL
#endif
Another time this is useful is to disable chunks of code that might be present for testing.
#define FROM_PCAP_FILE
#ifdef FROM_PCAP_FILE
... Use the save file input ...
#else
... Use the device input ...
#endif
This particular example is shown more fully at my libpcap notes.
Debugging With Preprocessor Tricks
Preprocessor macros can be useful for debugging messages too.
#define VERBOSE 4
...
if (VERBOSE >= 2) {printf("A level 2 message.\n");}
Just set the value to 0 to turn off verbose messages. This allows the programmer to set up a bunch of diagnostic print statements that can be turned on or off easily.
Similarly you can "comment out" large blocks of code by wrapping them in an #ifdef block.
#include <stdio.h>
#include <pcap.h>   /* for u_char and struct pcap_pkthdr */
#define PROBE2
#ifdef PROBE1
/* It can be annoying to comment out code that already has comments. */
void processPacket(u_char *userarg, const struct pcap_pkthdr* pkthdr, const u_char *p){
  int cc;
  for (cc=52;cc<1500;cc+=536) { printf("Test: %d\n",p[cc+1]*256+p[cc]); }
  return;
}
#endif
#ifdef PROBE2
void processPacket(u_char *userarg, const struct pcap_pkthdr* pkthdr, const u_char *p){
  unsigned int cc;
  for (cc=0;cc<pkthdr->len;cc++) { printf("%d ",p[cc]); }
  return;
}
#endif
With this you can also #define settings as a GCC option with -DPROBE2, but that just acts as a define at the beginning of your code, so subsequent defines will override it (GCC warns when the new definition differs).
While there is no shortage of tricksy ways to use the preprocessor for debugging, it seems to come down to sticking to a few guidelines.
- Let the compiler see the debugging code, so that warnings in it are caught now and you’re not taken by surprise when you turn debugging on 20 years in the future.
- Keep it simple.
#define DEBUG
#ifdef DEBUG
#define D
#else
#define D for(;0;)
#endif
/* Usage: D printf("x is %d\n", x); runs only when DEBUG is defined. */
Debugging Preprocessor Problems
Sometimes the problem is with the preprocessor swamp and you need to get some idea about what’s going on. For example, maybe you’ve got an #ifdef but you don’t know if it is getting hit.
#if CV_VERSION_MAJOR >= 4
#warning "CV_VERSION_MAJOR >= 4 IS TRUE"
#include "constants_c.h"
#endif
This is a compile time message (#warning is a long-standing GCC extension that C23 finally made standard). Use #error "message" if you want it to actually halt compilation.
Template Functionality In C
Related to the hackish messes one can make with the preprocessor are C++ templates. How can template functionality be done in plain C? This excellent article explores several different methods, mostly using the preprocessor but also with fancier techniques. So if you love templates, that’s no reason you can’t use C!
Idiosyncratic C Operators
- x++
  Increments variable x by 1 after using its value in this spot.
- ++x
  Increments variable x by 1 before using its value in this spot.
- x--
  Decrements variable x by 1 after using its value in this spot.
- --x
  Decrements variable x by 1 before using its value in this spot.
- {test}?{true}:{false}
  An "if" expression for saving punch card chad: it yields {true} if {test} is nonzero, otherwise {false}.
- x,y
  Evaluate and discard x, then evaluate and retain y.
- x=y=z=3
  All of the variables x, y, z are 3. Assignment is an expression whose value is the assigned value, and it groups right to left: x=(y=(z=3)).
Bitwise
- &
  bitwise AND
- |
  bitwise inclusive OR
- ^
  bitwise exclusive OR
- <<
  bit shift left
- >>
  bit shift right
- ~
  bitwise NOT
Logical Operators
There are logical operators too.
- ||
  Logical OR.
- &&
  Logical AND.
Here’s an explicit rundown of how they behave.
#include <stdio.h>
int main(void){
  /* Bitwise: each bit is combined separately. */
  printf("%X\n", 0x00&0x00 ); /* 0 */
  printf("%X\n", 0x01&0x00 ); /* 0 */
  printf("%X\n", 0x01&0x01 ); /* 1 */
  printf("%X\n", 0xFF&0x01 ); /* 1 */
  printf("%X\n", 0xFF&0xFF ); /* FF */
  printf("%X\n", 0x00|0x00 ); /* 0 */
  printf("%X\n", 0x01|0x00 ); /* 1 */
  printf("%X\n", 0x01|0x01 ); /* 1 */
  printf("%X\n", 0xFF|0x01 ); /* FF */
  printf("%X\n", 0xFF|0xFF ); /* FF */
  /* Logical: any nonzero value counts as true, and the result is 0 or 1. */
  printf("%X\n", 0x00&&0x00 ); /* 0 */
  printf("%X\n", 0x01&&0x00 ); /* 0 */
  printf("%X\n", 0x01&&0x01 ); /* 1 */
  printf("%X\n", 0xFF&&0x01 ); /* 1 */
  printf("%X\n", 0xFF&&0xFF ); /* 1 */
  printf("%X\n", 0x00||0x00 ); /* 0 */
  printf("%X\n", 0x01||0x00 ); /* 1 */
  printf("%X\n", 0x01||0x01 ); /* 1 */
  printf("%X\n", 0xFF||0x01 ); /* 1 */
  printf("%X\n", 0xFF||0xFF ); /* 1 */
  return 0;
}
Main Structure Of A C Program
A C program is a collection of functions, routines that possibly take some input and possibly return some output. All C programs that run must have one and only one function called main.
Here is a typical structure showing how the main function can be passed the command line arguments. This program is useful for diagnosing exactly what the C program is receiving from the executing shell. It also shows the polite return code (0 is usually success and 1 is usually failure while other numbers can signify fancy modes of failure or other things).
/* Comments look like this! */
#include <stdio.h>
int main(int argc, char *argv[]) {
  /* Walk backward from the last argument down to argv[0], the program name. */
  for (int i= argc-1; i >= 0; i--) {
    printf("Argument #%d:%s\n",i,argv[i]);
  }
  return 0;
}
Or here’s a less readable version:
#include <stdio.h>
int main(int argc, char *argv[]){
  while (argc--) printf("%s\n", *argv++);
}
Note: To only show the arguments and not the program name (element 0), just make both of the postfix modifiers (-- and ++) into prefix modifiers.
If you don’t care about the command line arguments use something like:
int main(void) {
  /* code goes here */
  return 0;
}
There is also a third parameter that many systems, Unix-like ones included, pass to main: envp[], which contains the environment variables. It is a common extension rather than part of standard C; POSIX code can use the global extern char **environ instead.
The Bash man page says, "When a program is invoked it is given an array of strings called the environment. This is a list of name-value pairs, of the form name=value." These are the ones the shell has, generally, marked with the export command (or inherited from parent processes).
This code will show all the environment variables a program might know about when run from a particular shell; the env utility does this too (when run without arguments).
#include <stdio.h>
int main(int argc, char *argv[], char *envp[]) {
  for (int i=0; envp[i]!=NULL; i++) printf("%s\n", envp[i]);
  return 0;
}
Note that these are strings in the format of USER=xed where the first = is the one that separates the variable name from the value. Obviously this means that having = as part of your variable name is so stupid that it is not allowed.
Functions
An interesting thing about C functions is that you can’t nest them: standard C does not allow a function definition inside another function. I did not know this until I tried it (to limit the function’s scope to the parent function) and got an error. Turns out that there is a GCC extension to make such things work, but if you have to go to such heroics, you know you’re not on the path of what could be called "normal C". It turns out that there is a lot more complexity hiding in this functionality than I (someone whose own programming language trivially handles nested functions until RAM runs out) initially imagined. This example by Knuth is an interesting demonstration.
Types
C is a statically typed language (sometimes loosely called "strongly typed"), meaning that it carves out memory for various purposes based on very explicit definitions of the resources which will be used. Important types:
- int
  Integer, i.e. whole numbers with no fractional part.
- float
  Numbers that can represent a continuous value (to the accuracy a binary representation ultimately provides).
- char
  A character.
- enum
  An enumeration. Used to create a type with a constrained set of possible values.
  enum lightswitch {Off, On};
  Here lightswitch can be either "Off" or "On", which are the same as 0 and 1. If you wanted different values, use something like enum lightswitch {Off=-1, On=1};
- union
  Define with something like:
  union Lights { int Switch; float Dimmer; };
  This allows one thing (Lights) to either have an int value if it’s just a switch or a float value if it’s a dimmer. The members share the same memory, so it’d be good to keep a separate variable around to store which one you’re using at any given time or confusion will result.
- struct
  A structure. Used to create custom types that hold collections of things.
  struct point { int x; int y; int z; };
  Note the semicolons in there; this is not like normal declarations but more like class declarations in C++. To declare a variable of this type you need to do struct point LastKnown;
  There is also a special syntax to make very densely packed structs whose members may share bits of the same word. These bit-fields use the type name:W syntax (for example, unsigned int mySwitchState:1) where W is the width in bits needed to contain what you need. If you try to store a value that needs more bits than you specified, it may run but incorrectly (for unsigned fields the extra high bits are silently lost).
  You cannot give members default values inside the struct definition, because defining the type does not allocate any memory. But when you define a variable with a struct type, that can be given initial values. That looks like this: struct P {int x; int y;}; struct P myPoint= {1,2};
  Sometimes you may find syntax like struct point p = { .y = yvalue, .x = xvalue }; or the older GCC-only form struct point p = { y: yvalue, x: xvalue }; This topic is called designated initializers (standard since C99) and you can read all about their details here. It basically allows you to define the member elements in whatever order you want and even omit some of them, which are then set to zero.
- Custom
  Sometimes you want to make some complex named type have a simple name. To do that use the typedef statement: typedef short int twobyter;
  Now declaring something as twobyter X= 0 is the same as saying short int X=0;
  This can be a handy trick when setting up arbitrary data structures that may find utility handling different payloads. Just typedef the data component of the complex structure and tailor that to your needs at the beginning of the program.
The various types require different amounts of storage in memory. It is best to choose the most economical type that satisfies your requirements. Being able to optimize this way is a nice feature; however, since it is not optional, it is also one of those pains that makes C programming a bit tedious. Here is the output of the preprocessor example above, which shows the size in bytes of various storage types on my machine. These numbers vary by platform; a typical 64-bit Linux system reports 8 for long and unsigned long and 16 for long double.
char= 1
short int= 2
int= 4
unsigned int= 4
long= 4
unsigned long= 4
float= 4
double= 8
long double= 12
const
Unlike the much worse situation with C++, the situation with const in C is only extremely confusing and prone to error. Basically, if you’re really, really serious about something being really truly immutable, you should just not create a "variable". Just literally say the thing every time. So if you want to use the value for pi, just say "3.14159265358979323846" everywhere. That is a lot to type and prone to error, so C conceptually provides a way to do this that’s about as clever as using sed: just use preprocessor macros.
#define PI 3.14159265358979323846
The const specifier is for things where you don’t know what the value will be, but once it is initialized, it will never change. For example, it can be applied to things like this.
int main(const int argc, char** argv){ ...
Here the number of arguments will be different depending on how the program is executed, but once the value is established, argc will be locked for the duration of the execution. Without const you could, say, decrement argc as you dealt with each argument; as shown, you cannot.
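Things get icky when slapping a const on pointers.
This means that the pointer is constant but the stuff that it points to is not.
char * const a;
So here we’re saying that a always points to whatever address it was initialized with and never anywhere else, no matter what. (Since it can never be assigned later, in practice you initialize it right there, e.g. char * const a = &c;)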
This order has a different meaning.
const char * b;
Here the pointer is not const and could change to point to a new address. However, wherever it points, the character there is read-only through b. If you try to change that character, i.e. the contents, through b, the compiler will reject it. So basically you can’t do anything like this.
*b= 'x'; /* Fails to compile. */
To parse these tricky declaration puzzles, start at the variable and spiral outward starting to the right (e.g. "b is a pointer to a char which is constant" or "a is a constant pointer to a char"). Here is a very good web resource for untangling this mess called "C gibberish → English".
printf
Format Codes
The return value for printf is the number of characters written, which can be handy.
The format specifier has this form:
% [flags] [field_width] [.precision] [length_modifier] conversion_character
Here are the modifier flags:
  -      left justified
  +      always mark a sign (+ or -)
  space  use a space in place of + for non-negative numbers
  0      pad leading zeros to specified width
  #      modifies the style of each type; for example, for o a leading 0 is added and for x or X a leading 0x or 0X
Field width is the minimum space that will be used if the output is shorter than the value provided.
Precision is the number of digits after the decimal point for e, E and f; for g and G it is the number of significant digits; for integers it is the minimum number of digits (padded with leading zeros); for s it is the maximum number of characters printed.
The length modifier is h (short or unsigned short), l (long), ll (long long), z (size_t), or L (long double).
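The conversion character is one of the following:
  d, i   signed decimal integer
  o      unsigned octal integer
  x, X   unsigned hexadecimal integer (lowercase or uppercase digits)
  u      unsigned decimal integer
  c      single character
  f      decimal floating point, [-]m.dddddd
  e, E   scientific notation, [-]m.dddddde+xx
  g, G   either like e or f: e is used if the exponent is less than -4 or at least the precision, otherwise f; trailing zeros are not printed
  n      the argument is a pointer to which the number of characters converted thus far is assigned. Nothing is output.
  s      output a string, i.e. a pointer to a char array terminated by '\0'
  p      output an implementation representation of a pointer (handy for debugging)
  %      output an actual "%"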
Bit Masks
Basically you can store a set of many boolean variables in one C variable. Since you’re going to need a minimum of 8 bits to do about anything, the theory goes that if you’re just needing to store 1 bit, you might as well have that state variable serve multiple purposes. Despite sounding horrible, this actually produces code that is surprisingly readable.
The basic technique is to set the various flags with bit shifts to store them in the right places. Then when you want to create a state collection, just "or" them together with |. When you want to check to see if a flag is set in a collection, just use & to get that back out. Here’s an illustrative example:
/* An example of how to use bitmasks. */
#include <stdio.h>
#define LIGHTS_ON       ( 1 << 0 )
#define BRAKE_LIGHTS_ON ( 1 << 1 )
#define WIPERS_ON       ( 1 << 2 )
#define HORN_ON         ( 1 << 3 )
int main(int argc, char *argv[]) {
  unsigned int car_status;
  car_status= LIGHTS_ON | WIPERS_ON | BRAKE_LIGHTS_ON;
  if (car_status & LIGHTS_ON) { printf("Lights on.\n");}
  if (car_status & BRAKE_LIGHTS_ON) { printf("Brake lights on.\n");}
  if (car_status & WIPERS_ON) { printf("Wipers on.\n");}
  if (car_status & HORN_ON) { printf("Horn on.\n");}
  return 0;
}
This program will produce this result:
Lights on.
Brake lights on.
Wipers on.
And to actually control the bits.
car_status |= WIPERS_ON;   // Sets wipers bit on.
car_status &= ~WIPERS_ON;  // Sets wipers bit off.
Note that this technique is not especially type safe, since a function expecting a well crafted collection of bits can be sent any old value that works and the compiler won’t notice. This is one reason that in C++ bool types are more robust. But bitmasks can be useful and they definitely pop up a lot in various libraries; understanding how they work is important.
Pointers
Objects in C can be handled by their names (which imply their contents), but a far more powerful and flexible technique is to work with them by only specifying the address where the data of interest is. The reason for this is that it’s computationally expensive to shuffle things around in memory if you don’t really need to. It’s better to leave the bulk of the thing alone and just refer to it where it is needed. It’s a bit like money. You could trade gold specie for the things you want, but for most transactions, it’s easier to leave the gold in a vault somewhere and just trade promissory notes referring to it. (Assuming a gold standard) writing a check is like referring to a reference (bank notes) to actual money (the gold). This is like a C pointer’s ability to point to a pointer.
So if you have a variable called big_thing with a lot of data in it, you can do things with that variable by name, but sometimes it is more effective to just refer to the location where that thing lives. It’s quite like addresses in real life: you don’t have to specify the exact nature of a house at a particular address, or whether it’s a strip mall or whatever; just the address is sufficient to deal with it for many purposes.
Important ideas with pointers:
- Pointers are a data type that holds exactly one memory address. What that address actually is should seldom ever be of concern.
- Pointers can point to other pointers.
- int x;
  Defines an integer variable called x.
- int *ptr2x;
  Reads "Define the thing ptr2x points to as an integer." In a declaration the * marks ptr2x as a pointer; used in an expression, as in *ptr2x= 5;, the same * is technically called the "indirection operator".
- ptr2x= &x;
  Reads "Set ptr2x to the address of the object defined by x."
- p->n= 0;
  Sets to zero the subcomponent n in the structure that pointer p points to. This is technically called the "indirect member access operator", and p->n is shorthand for (*p).n.
Void Pointers
Untyped pointers can be created with the void type.
void *anyptr;
To dereference such a pointer, it must be type cast with something. In this example the contents of two different kinds of variables, ib and fb, are set from the dereferencing of the same pointer.
#include <stdio.h>
int main(void) {
  int ib, ia= 666;
  float fb, fa= 3.14f;
  void *anyptr;
  anyptr= &ia;
  ib= *((int*)anyptr);    /* view the bytes as an int */
  printf("ib now is: %d\n",ib);
  anyptr= &fa;
  fb= *((float*)anyptr);  /* same pointer, now viewed as a float */
  printf("fb now is: %.2f\n",fb);
  return 0;
}
Arrays
A[i] is the same as (*((A)+(i)))
So these are equivalent.
A[4]= 'x';
*(A+4)= 'x';
You can load the array at definition.
float origin[3]= {0,0,0};
char mystring[]= {'x','e','d','\0'}; /* The '\0' makes it a "string". */
char mystring[]= {"xed"}; /* Equivalent. */
You can’t define the array and then later set it with {0,0,0} or something like that. Just keep in mind why strcpy and memcpy exist. Here’s a way to use memset to initialize an array.
int colpos[MAXNUMFIELDS];
memset(colpos,0,sizeof(colpos));
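Here are two shortcuts for zeroing an array at definition. The first uses a range designator, which is a GCC extension; the second is standard C, since any elements without an initializer are set to zero.
int colpos[MAXNUMFIELDS]={[0 ... MAXNUMFIELDS-1]=0}; // GCC extension.
int colpos[MAXNUMFIELDS]={0}; // Standard C; works everywhere.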
The brackets are actually a postfix operator (the subscript operator) applied to the expression in front of them. (Confusing? Yes.)
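Elements of arrays are stored in successive memory locations, and pointer subtraction counts elements rather than bytes.
&origin[1]-&origin[0] == 1
Find the length of an array:
int length= sizeof origin / sizeof origin[0];
This only works on a real array in scope; once an array has been passed to a function it is just a pointer, and sizeof gives the size of the pointer.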
Note: In case of confusion, note that in a function parameter list *argv[] is the same as **argv.
Multi-Dimensional Arrays
Both of these syntax styles work.
int a[3][4] = {
  {0, 1, 2, 3} ,   /* a[0][0],a[0][1],a[0][2],a[0][3] */
  {4, 5, 6, 7} ,   /* a[1][0],a[1][1],a[1][2],a[1][3] */
  {8, 9, 10, 11}   /* a[2][0],a[2][1],a[2][2],a[2][3] */
};
int b[3][4] = {0,1,2,3,4,5,6,7,8,9,10,11}; /* Same as a. */
When every element is given, the inner braces are for decoration only; with a partial initializer they control which elements get the values.
Variable Length Arrays
Hard to know whether to include this one here or at dynamic memory but it seems that the old limitation of needing to preallocate array memory has been relaxed a bit with C99. Official GCC documentation about arrays whose size can be specified at runtime.
Here’s a sample that demonstrates this crazy thing actually works.
#include <stdio.h>
#include <stdlib.h>
int use_a_variable_length_array(int n) {
  float vals[n]; /* No idea how big it might be! */
  for (int i=0;i<n;i++) { vals[i]= 0.0; } /* Yes, it can be used! */
  return n;
}
int main(int argc, char** argv) {
  if (argc < 2) { fprintf(stderr, "usage: %s SIZE (a positive number)\n", argv[0]); return 1; }
  /* Use input to set array size. */
  printf("%d\n",use_a_variable_length_array(atoi(argv[1])));
  return 0;
}
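Running the command with an argument of 1000000 does work fine. However, running the command with an argument of 10000000 produces an unhelpful "Command terminated" message. That is a stack overflow: ten million floats is about 40 MB, while the default stack limit on Linux is typically 8 MB (check ulimit -s).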
If you use the GCC option -Wvla then this code will produce a warning (but otherwise compile fine): "warning: ISO C90 forbids variable length array ‘vals’ [-Wvla]"
Some people believe that VLAs are a performance drag.
Some security people believe that it allows you to exhaust the stack and then write naughty security problems into memory and then jump to them and take over the operations. Apparently they have been aggressively weeded out of the Linux kernel.
Linus himself says: "…using VLA’s is actively bad not just for security worries, but simply because VLA’s are a really horribly bad idea in general in the kernel."
There’s even a GCC thing where you can put variable length arrays in structs. Here is what Linus has to say about that. "The feature is an abomination. I thought gcc only allowed them at the end of structs, in the middle of a struct it’s just f*cking insane beyond belief."
Use sparingly if you must.
Chars and Strings
An array of objects of the char type has some special syntactical properties in C. This is to facilitate the handling of "strings". The following are all valid syntax but they may have subtle differences.
char alphabet[26];
char theFword[4] = {'f', 'u', 'n', '\0'};
char string[6] = "twine";
char gray[] = {'g', 'r', 'a', 'y', '\0'};
char salmon[] = "salmon";
The first thing to note is that this isn’t Python: single and double quotes are not the same! Use single quotes around single characters ('x' is a character constant) and double quotes around strings ("x" is an array of two chars, 'x' and '\0').
The next important confusion to clear up is that there are two kinds of strings. Here’s a look at the two types of string-like memory structures.
If you have an ASCII value and want to output the character it stands for, don’t forget that you can cast it from an int to a char for output purposes (or simply print it with %c).
Character Arrays
A character array is just what it sounds like with no serious mystery.
char myarray[50];
Inside a function, this produces a memory reservation, on the stack, for 50 char type objects addressable by index just as an array should be.
myarray[5]= 'c'; /* No problem with that. */
You can even do something like this to initialize it in the declaration.
char myarray[50]= {'x','e','d','.','c','h'};
That fills the first 6 with characters and sets the remaining 44 to '\0' (when an array has an initializer, any elements not explicitly initialized are zeroed). This does the exact same thing.
char myarray[50]= "xed.ch";
There a string literal is used, temporarily, to provide the character constants to define the array.
The confusing part about this is that you cannot later assign a new string to the array using syntax that looks very similar to the initialization but is not the same thing.
char myarray[50]= "xed.ch";
myarray= "This will be an error!";
You can pick at each member of the array with myarray[0]= 'T' and so on.
If you omit the size, it is computed from the length of your string literal initializer.
char myarray[]= "xed.ch";
printf("%zu\n",sizeof(myarray)); /* Will print 7, not 6! */
Note that it adds one more to store the NUL terminator it thinks you will appreciate. If you provide a length (like [50] shown above), then your full string and the NUL terminator need to fit in the 50, and sizeof will return 50. So if you need the alphabet to be used in string functions, you need room for 27 characters.
char abc[27]= "abcdefghijklmnopqrstuvwxyz"; /* 26 letters plus the '\0'. */
(C also accepts [26] here, but then the terminator is dropped and abc is no longer a string.) Or simply do this, which is the same thing.
char abc[]= "abcdefghijklmnopqrstuvwxyz";
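Like arrays in general, myarray in most expressions turns into a pointer to its first element, which is the same address as &myarray (though the two have different types). In other words, it is just an address where some same-sized chunks of stuff are allocated.
Compilers are weird, but in theory this kind of character array, when declared inside a function, is stored on the stack where it is open to changes.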
String Copying (strcpy)
Other programming languages do a really good job of hiding the gory details of using strings. C does a really good job of not hiding such details. This can be a challenge until you get the hang of it. A common scenario is you want to preserve a copy of a string in case the primary copy gets mutated. You can’t just say something like char *stmp=s; and think you’re done! That only copies the address, so both names refer to the same characters. The new string must be declared and its size must be explicitly provided so that memory can be carved out to store it. Here is a way that has worked for me.
char *stmp= malloc(strlen(s) + 1); /* needs <stdlib.h> and <string.h> */
strcpy(stmp,s);
This copies the contents of s into the stmp variable, which has just been created and explicitly set up with enough memory for the characters of s plus the terminating '\0'.
String Literal
This syntax defines a pointer to a string literal.
char *myptr= "xed.ch";
In C an ordinary string literal has type "array of n char" (in C++ it is "array of n const char"), but modifying one is undefined behavior, so treat it as if it had that const. It might compile in weird cases, but don’t do it!
The underlying storage, unlike a character array, is typically in some kind of read-only memory. All you can do is change what the pointer points to. This is ok for many applications because you often want to replace a string with another. Though this smells like the string is mutable, you are really changing the variable pointer to point to a different constant array. Using the myptr definition above, this now works.
myptr= "Some new text is ok now!";
This implies that myptr and &myptr are not the same: myptr holds the address of the read-only array where the text really lives, while &myptr is the address of the pointer variable itself. Indexing through the pointer compiles fine, but it reaches into that read-only array, so writing through it is undefined behavior and typically crashes. So this does not work.
myptr[23]= '.';
Misc String Functionality
If you want fancy string capabilities, you might need a custom library to do what you want. Here is an interesting one.
Need to find out how long a string is, or whether it is empty? Use strlen(), which is in string.h.
if (strlen(err)){printf("%s\n",err);exit(1);}
Branching
Basically computers compute by making logical decisions. In C, the main decision making feature is the if statement:
if (test_expression) {statement_block} else {statement_block}
The else if construction allows a single choice to be made from a series of possible conditions.
if (te1) {sb1} else if (te2) {sb2} else if (te3) {sb3} ... else {sb}
For if statements, the test expression can be anything that reduces to a scalar value (an integer, a floating point number, or a pointer). A value of 0 (or NULL) is false, and anything else is true.
A fancier form of branching can be done with the switch and case statements. Here’s how it works:
#include <stdio.h>
#include <unistd.h>

int main (int argc, char **argv ){
    static char optstring[]="a:b:c";
    int o;
    while ( (o = getopt(argc, argv, optstring)) != -1)
        switch(o) {
            case 'a': {
                printf("Option argument for `a` is: %s\n",optarg);
                break;
            }
            case 'b': {
                printf("Option argument for `b` is: %s\n",optarg);
                break;
            }
            case 'c': {
                printf("Option `c` has no argument.\n");
                break;
            }
            default: {
                printf("Option `%c` is unknown.\n", o);
            }
        }
    return 0;
}
Note: For a more comprehensive example of option parsing, see the Option Parsing section.
switch (offset) {
    case 0:   {f|= 0x1; break;}
    case 185: {f|= 0x2; break;}
    case 370: {f|= 0x4; break;}
    case 555: {f|= 0x8; break;}
    default: {
        printf("Packets with wrong offsets (%d) are being captured!!",offset);
    }
}
Here I’m trying to set 4 flags as some fragmented packets come in. These packets should have well defined offset values (0,185,370,555) and when each shows up, the flag is set. If some other offset shows up, that’s weird and the default condition handles it. If the value of f is 0b1111/0xF/15d when I’m done, then I know all the pieces have arrived.
Another trick to keep in mind with switch and case is that if you don’t include a break, execution falls through. All the subsequent statements are executed until a break is found or the switch block ends. This leads to frustrating errors if you simply forget, but used intentionally it can make two options do the same thing.
switch (animal) {
    case snake: {snake_stuff(); break;}
    case tiger: { /* Same as next one. */ }
    case lion:  {cat_stuff(); break;}
}
The braces on tiger are not required, and neither is a semicolon. Case labels can simply be stacked, as in case tiger: case lion: cat_stuff(); break;. Either way, this works.
Looping
Interesting software is a result of many logical decisions being repeatedly performed in interesting ways. The main way to achieve multiple iterations of an action in the C idiom is with the for loop:
for ({initial};{test_before_each_iteration};{eval_after_each}){thing_to_do}
Here is a very common idiomatic usage.
for (int i=0;i<SIZE;i++) { printf("%d\n",a[i]); }
Here’s a more interesting example:
for (hi=100,lo=60;hi>=lo;hi--,lo++){converge(hi,lo);}
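Note: Declaring the loop variable inside the for statement, as in for (int i=0;i<SIZE;i++), has been standard C since C99. Modern GCC defaults to a C11-or-later dialect, so it normally just works. Only in a pre-C99 mode such as -std=c89 do you need to declare the variables before the for statement. Variables like hi and lo in the comma example above must be declared beforehand in any case.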
The other two important loop structures are similar, with a subtle difference. These are the while loops. The most basic works like so:
while ({test_expression}) {do_this_stuff}
The test expression is evaluated before each attempt to execute the body of the loop. If it is 0 (or NULL) the very first time, the body is skipped entirely and control passes on.
Note that this can also be expressed with for:
for (;{test_expression};) {do_this_stuff}
If you want the test evaluated after the loop body code is run (which implies the loop body will always run at least once) use this form:
do {do_this_stuff} while ({test_expression});
Exiting loops
- continue: This statement jumps control to the end of the current loop body as if it had completed an iteration and was now ready for more. It allows for short circuiting some code that might otherwise be performed on every iteration.
- break: This statement jumps control just past the end of the current loop (or switch) as if the last iteration had just occurred and finished. It says that this looping structure is completely finished, not just this iteration.
- return [expression]: This is the way to leave a function. The optional expression is passed back to the calling function by value (so use pointers where copying would be unpleasant). If the function was defined as void, then don’t include an expression. A function can have several return points depending on the situation. If you have stdlib.h included, you can return EXIT_SUCCESS or EXIT_FAILURE (most usefully from main).
Dynamic Memory
First thing to note because I find it weirdly hard to memorize:
- stack: Automatic (local) variables whose sizes are fixed at compile time. Their space is reserved when a function is entered and released when it returns. Mnemonic is that these regions of memory are well organized because they can be given premeditated attention.
- heap: Slower, potentially fragmented memory handed out at run time by dynamic memory management, e.g. malloc(). Mnemonic is that these regions of memory can be quite chaotic because the requirements are ad hoc at runtime.
Anytime you are working with an amount of data whose upper bound you cannot explicitly define ahead of time, you probably need to use dynamic memory. The main mechanism of dynamic memory is the malloc() function (include stdlib.h). It runs around looking for enough contiguous memory to reserve for some run time defined purpose. Once malloc() finds the memory you’ve requested, it returns a pointer to that location so you can start doing stuff with it. If it cannot find enough, it returns NULL. The format for using malloc() is a bit fussy:
p= (struct Thing *) malloc (sizeof (struct Thing));
Here sizeof (an operator, not actually a function) yields exactly how much memory, in bytes, an instance of struct Thing needs. That memory is reserved, and the pointer that is returned is cast (forced) by the first parentheses to point to memory that is configured as a struct Thing. In C that cast is optional, because the void * returned by malloc() converts implicitly to any object pointer type. Many C programmers write p= malloc(sizeof *p); instead.
When your program is finished with some memory that has been allocated, it’s polite (or maybe even critical) that it be returned to the system for use. The way to do that is with the free() function, which takes a pointer to the memory you want recycled.
Note: If you’re really interested in C’s low level memory management, here is an interesting guide to writing your own malloc and friends.
Simple Stack Implementation
Before C can be made into anything useful, you really need to create some tools to make certain tasks easier to implement. One theme that comes up over and over in more substantial programming tasks is the need to hold an arbitrary bunch of data somewhere. Since C requires very explicit declarations of all memory used, attending to this every single time can be tedious. It is therefore useful to create some templates that can get you into more interesting parts of the problem quickly.
Here is an implementation of a simple stack system. The stack is fed data with a Push() command, which places data on top of the stack (a LIFO, or equivalently FILO, structure). Data is retrieved (and removed) from the top of the stack with a Pop() function. Note that the type definition cargo_type allows the stack to carry whatever kinds of data types you want simply by redefining it.
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

/* Custom type definitions. */
typedef int cargo_type;
struct linkbox { cargo_type cargo; struct linkbox* next;};
typedef struct linkbox lbox;

/* Function prototypes show inputs and outputs so subsequent */
/* mentions of them aren't confusing (seemingly undefined) */
/* to the compiler. */
void Push(cargo_type v, lbox** p2mylist);
cargo_type Pop(lbox** p2mylist);
cargo_type Iter(lbox** current);
int dice(int sides);

int main(void){
    int i,m;
    srand(time(NULL));
    m= dice(20);
    lbox *mylist=NULL;
    for (i=0;i<m;i++){ Push( dice(6), &mylist); }
    lbox *index= mylist;
    int sum=0, n=0;
    while (index) {
        sum += Iter(&index); n++;
        /*printf("Iter:%d\n",Iter(&index));*/
    }
    printf("Average:%f\n",(float)sum/n);
    while (mylist) { printf("Popping:%d\n",Pop(&mylist)); }
    return 0;
}

int dice(int sides){ return rand() % sides + 1;}

cargo_type Iter(lbox** c){
    cargo_type t= (*c)->cargo;
    *c= (*c)->next;
    return t;
}

void Push(cargo_type v, lbox** p2mylist){
    lbox* latestbox;
    latestbox= (lbox *) malloc(sizeof(lbox));
    if (latestbox == NULL) { perror("malloc"); exit(EXIT_FAILURE); } /* Out of memory. */
    latestbox->cargo= v;
    printf("Pushing:%d\n",v);
    latestbox->next= *p2mylist;
    *p2mylist= latestbox;
    return;
}

cargo_type Pop(lbox** p2mylist){
    cargo_type t= (*p2mylist)->cargo;
    lbox *dead= *p2mylist;
    *p2mylist= (*p2mylist)->next;
    free(dead);
    return t;
}
This program is also an example of passing function arguments by reference. C copies every function argument, and if you copy a pointer, it’s a different pointer (even if it points to the same place). Push() and Pop() need to change the caller’s mylist pointer itself, so they are handed a pointer to that pointer. When this transporter pointer is dereferenced (*p2mylist), the original pointer that was supposed to show up at the function is ready to go. Suppose Push() instead received a copy of the list pointer. It could link a new node in front of the list, but after the function returned, the caller’s pointer would still point at the old first node. You would lose track of the new node (and leak its memory).
File Operations
- The best general description I’ve found of what is going on with C’s I/O library and functions is here.
- An excellent reference for file handling functions. Similar to the man pages, but better organized.
Once you can allocate the memory you need, you often need to use the file system to read actual data to fill that memory. Using files is a fundamental operation that has its quirks in C. The following example reads a file called ./fileio.c character by character. It prints each character to the output and also writes it to a file called /tmp/fileio-copy.c.
The hard to memorize bits are including stdio.h and creating a FILE pointer. Also, opening and closing the file require fopen and fclose.
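#include <stdio.h>

int main(void) {
    FILE *fpi,*fpo;
    fpi= fopen("./fileio.c","r");
    if (fpi == NULL) { perror("./fileio.c"); return 1; }
    fpo= fopen("/tmp/fileio-copy.c","w");
    if (fpo == NULL) { perror("/tmp/fileio-copy.c"); fclose(fpi); return 1; }
    int curchar; /* int, not char: fgetc must be able to return EOF. */
    curchar= fgetc(fpi);
    while (curchar != EOF) {
        printf("%c",curchar);
        fputc(curchar,fpo);
        curchar= fgetc(fpi);
    }
    fclose(fpi);
    fclose(fpo);
    return 0;
}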
While I’ve shown fgetc and fputc, other possibilities include fprintf and fscanf, as well as fread and fwrite (for binary data).
It’s also worth pointing out that a more C styled way to do the main read loop would probably be something like:
while ( (curchar= fgetc(fpi)) != EOF ) {...}
Here is a very simple test of file writing. It can help you determine whether file writing is slow. It just writes a different number to a certain file a million times. Each number goes to a temporary file that is then renamed over the real one. On the same file system that replacement is atomic, so anything reading the file should get the entire number without worrying that it will be cut off.
#include <stdio.h>

int main(void) {
    FILE *fp;
    unsigned int c= 0; /* Counter. */
    while (c++<1000000) {
        fp= fopen("/dev/shm/iotesting.tmp","w");
        if (fp == NULL) { perror("fopen"); return 1; }
        fprintf(fp,"%u\n",c);
        fclose(fp);
        rename("/dev/shm/iotesting.tmp","/dev/shm/iotesting");
    }
    return 0;
}
Running this on a decent desktop, I got 83k atomic write operations per second. On my 2009 laptop, I got 31k. On a Raspberry Pi 4 I got 17k. Of course, on the desktop I got 7.5 million (non-atomic) write operations per second when I kept the file open the whole time.
I think that one of the best ways to read in data is fgets. Here’s a pretty solid way to do that using dynamic buffers that grow as needed with realloc.
/* Here's an example of using realloc to grow the buffer to as much as
 * needed when bringing in data. This particular example will take the
 * specified file, or standard input, and render it backwards. Imagine
 * rev and tac combined. Running this twice should cancel.
 * $ md5sum revtac.c <(./revtac ./revtac.c | ./revtac)
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
    FILE *fp;
    if (argc-1) fp= fopen(argv[1],"r");
    else fp= stdin;
    if (!fp) {perror("Could not open file"); exit(EXIT_FAILURE);}
    char *str= malloc(4096), *s= str;
    if (!str) {perror("malloc"); exit(EXIT_FAILURE);}
    size_t len= 0;
    while (fgets(s,4096,fp)) {
        len += strlen(s);
        char *bigger= realloc(str, len+4096); /* Always keep 4096 spare bytes. */
        if (!bigger) {perror("realloc"); free(str); exit(EXIT_FAILURE);}
        str= bigger;
        s= str+len;
    }
    fclose(fp);
    for (size_t n=len;n;n--){ putchar(str[n-1]); }
    free(str);
    return(EXIT_SUCCESS);
}
The previous example had two limitations. First, because it needed to know the end of the input before beginning its output, it loaded the entire contents of the input into memory. This is not ideal for very big jobs where sequential processing can be applied.
Second, it only handled one file. Proper Unix utilities should be able to accept data on standard input and/or as one or more files to open. The quintessential utility that reliably does this is cat. To show how to create a program that can use an arbitrary number of input sources like cat and address each line as it comes, I have rewritten cat from scratch. Note that I am not Richard Stallman and I’m not claiming this is the most robust cat implementation ever. But if you need a program that does about the same thing as cat with a bit of C code thrown in, this can be a better place to start than the source code for the real cat (which is also reasonable).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXLINELEN 666

void process_line(char *line) {
    // PUT THE ESSENTIAL LOGIC FOR THIS PROGRAM HERE!!
    printf("%d %s",(int)strlen(line),line); // For example: line length and line.
}

void process_file(FILE *fp){
    char *str=malloc(MAXLINELEN), *rbuf=str;
    size_t len=0, bl=0;
    if (str == NULL) {perror("Out of memory");exit(1);}
    while (fgets(rbuf,MAXLINELEN,fp)) {
        bl=strlen(rbuf); // Read buffer length.
        if (bl && rbuf[bl-1] == '\n') { // End of buffer really is the EOL.
            process_line(str);
            free(str); // Clear and...
            str=malloc(MAXLINELEN); // ...reset this buffer.
            if (str == NULL) {perror("Out of memory");exit(1);}
            rbuf=str; // Reset read buffer to beginning.
            len=0;
        } // End if EOL found.
        // Read buffer filled before line was completely input.
        // Allocate more memory for this line.
        else { // Add more mem and read some more of this line.
            len+=bl;
            char *bigger=realloc(str, len+MAXLINELEN); // Tack on some more memory.
            if (bigger == NULL) {perror("Out of memory");exit(1);}
            str=bigger;
            rbuf=str+len; // Slide the read buffer down to append position.
        } // End else add mem to this line.
    } // End while still bytes to be read from the file.
    if (len) { str[len]='\0'; process_line(str); } // Last line had no newline.
    if (fp != stdin) fclose(fp); // Keep stdin open in case "-" appears again.
    free(str);
} // End function process_file

int main(const int argc, char *argv[]) {
    FILE *fp;
    if (argc == 1) { // Use standard input if no files are specified.
        process_file(stdin);
    }
    else {
        for (int argi=1; argi<argc; argi++) { // Go through each file specified as an argument.
            if (strcmp(argv[argi],"-") == 0) fp=stdin; // Dash as filename means use stdin here.
            else fp=fopen(argv[argi],"r");
            if (fp) process_file(fp); // File pointer, fp, now correctly ascertained.
            else fprintf(stderr,"Could not open file:%s\n",argv[argi]);
        }
    }
    return(EXIT_SUCCESS);
}
Change the process_line function to do whatever it is you need to do to the data.
Option Parsing
When running programs from the command line, the main function can be supplied with a list of optional parameters passed from the calling program or shell. To parse these in a sensible way, the C library on Unix-like systems offers getopt() (POSIX) and the GNU extension getopt_long(), which help keep things consistent and error free. Here is an example of a complete option parsing routine that handles long options. Long options are like --help, --verbose, etc., and tend to be popular with GNU utilities.
#include <stdio.h>
#include <getopt.h>
#include <stdlib.h>

int main (const int argc, char **argv) {
    int help= 0;
    int i=0;
    int j=10;
    float k= 0;
    int o;
    while (1) {
        static struct option long_options[] = {
            {"help"  , no_argument,       NULL, 'h'}, /* Flag, no argument. */
            {"ivalue", required_argument, NULL, 'i'}, /* Integer arg required. */
            {"jvalue", optional_argument, NULL, 'j'}, /* Integer arg optional. */
            {"kvalue", required_argument, NULL, 'k'}, /* Float arg required. */
            {0, 0, 0, 0} /* must be filled with zeros */
        };
        /* The last argument could receive the long option index; NULL ignores it. */
        o = getopt_long(argc, argv, "hi:j::k:", long_options, NULL);
        if (o == -1) break; /* Detect the end of the options. */
        switch (o) {
            case 'h': help= 1; printf("Help=%d\n",help); break;
            case 'i': i= atoi(optarg); break;
            case 'j': if (optarg){ j= atoi(optarg); } else { j=99; } break;
            case 'k': k= atof(optarg); break;
            default: printf("Unknown Option.\n"); return 0;
        } /* End switch construct */
    } /* End while loop */
    /* State of variables initialized by options. */
    printf("i=%d, j=%d, k=%f, help=%d\n",i,j,k,help);
    printf("Option Index: %d\n", optind); /* optind is defined by getopt.h */
    /* Print any remaining command line arguments (not options). */
    if (optind < argc) { printf ("Non-option ARGV-elements: \n"); }
    while (optind < argc) { printf("%s \n", argv[optind++]); }
    return 0;
} /* End main */
Note: In the example program above, the option -j (aka --jvalue) is defined as having an optional argument. Optional arguments (identified with ::) cause some ambiguity. To use them, you must run your program specifying these arguments like -j99 or --jvalue=99. If you try -j 99 or --jvalue 99, then the 99 will be considered unattached to the option.
User Input
The gold standard for user input is the readline library. Originally a part of Bash, it is now a separate library found in all kinds of software.
To use the readline/readline.h library, you need to install the packages libncurses-dev and libreadline-dev.
#include <stdio.h>
#include <stdlib.h>
#include <readline/readline.h>

int main(int argc, char** argv){
    char* line= readline("cxesh> ");
    if (line == NULL) return 0; /* readline returns NULL on EOF (Ctrl-D). */
    printf("Confirmation: \"%s\"\n",line);
    free(line);
    return 0;
}
Compile with the -lreadline flag.
gcc -o readlinetest readlinetest.c -lreadline
Time Functions
If you need to just throw a simple delay into a C program, including the unistd.h library will supply you with the sleep() function, which takes an argument in seconds. It works pretty much like /usr/bin/sleep. From the same source comes usleep(), which takes its argument in microseconds. So usleep(80) is not 80 seconds but 80 microseconds, and 80 milliseconds would be usleep(80000). POSIX has since dropped usleep() in favor of nanosleep() (declared in time.h), though glibc still provides it.
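Including time.h will get you a special type clock_t, which can be filled by the clock() function. It measures the processor time your program has used, not wall clock time; divide by CLOCKS_PER_SEC to get seconds. Need the date or a full human readable timestamp? This is basically how /usr/bin/date works.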
#include <time.h>
...
time_t now= time(NULL); /* Current time; must be set before use. */
char buffer[80];
strftime(buffer,80,"%Y-%m-%d %H:%M:%S", localtime(&now));
printf("ISO8601: %s\n", buffer );
Here is a decent place to start for timing functions.
Signals
Sometimes you need to hear from some other process. There are tons of ways to do this (pipes, semaphores, sockets, etc) but signals are very ancient and direct. They are often used in userspace to convey information like, "Stop now because you’re causing problems!" but they can be used for many other purposes.
Here’s a nice GNU reference of various signals. It starts with this nice list of why you’d want to use signals in the first place.
- Program Error Signals - report serious program errors.
- Termination Signals - interrupt and/or terminate the program.
- Alarm Signals - indicate expiration of timers.
- Asynchronous I/O Signals - indicate input is available.
- Job Control Signals - support job control.
- Operation Error Signals - report operational system errors.
- Miscellaneous Signals - includes user signals for whatever you need.
Note that there are fine points to this topic, and the signal(2) man page explicitly says "Avoid its use: use sigaction(2) instead." More nuanced thinking would suggest that signal is ok for maximum ISO C portability. In POSIX environments, sigaction gives you reliable, well-defined behavior.
Useful Tricks
Associative Arrays
Also known as hashes, dictionaries (dict), Maps, etc. Although C does not have a first class data type for named arrays, the reason seems pretty clear to me — the creators of C just couldn’t imagine anyone dumb enough to need that. Of course they understood how terrifically useful such a feature is, but they also understood how trivial it would be to implement it from scratch using essential language features.
Why do I believe this? Go have a look at page 134 of the first edition of "The C Programming Language" by Kernighan and Ritchie, often simply called "The K&R Book" in C lore. There you will find section 6.6 called Table Lookup. The two pages that follow provide the motivation and rationale for implementing associative arrays in true idiomatic C. These two pages also include code for a complete implementation.
If that doesn’t humble you into focusing more on your deficiencies than the ones you perceive in C, I don’t know what will.
Here is an implementation and discussion, including alternatives. The specific hash function K&R use is really just for illustration, and some people object to it. Other reasonable looking hash functions can be found at the bottom of this page.
Random Numbers
To get a random number between 1 and 100 do something like this:
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

int main (void) {
    srand(time(NULL));
    int mystery= rand() % 100 + 1;
    printf ("Random number from 1 to 100: %d\n", mystery);
    return (0);
}
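You need the srand() call to seed the random number generator. The rand() function returns random numbers between 0 and RAND_MAX inclusive. If you need a random number between 0 and 1, another way to do that would be rand()/((double)RAND_MAX+1). The cast matters. With plain integers, rand()/(RAND_MAX+1) is integer division that yields 0, and RAND_MAX+1 can even overflow an int (on glibc, RAND_MAX equals INT_MAX).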
Warning: Seeding srand() with time(NULL) is ok in many situations, but remember that this can be reverse engineered. This means you don’t want to write a real-money gambling game that is randomized in this way. Also, if you run the program more than once within the same second, it gets the same seed, and the "random" output will repeat itself.
If you are using a proper operating system (like Linux or a fruit-based computer), there is a managed resource that collects entropy for use by various processes in establishing randomness. The kernel presents this source of randomness as a file and automagically fills it with pretty high quality random numbers (see man 4 random for gory details). Here is a way to get random numbers using a seed pulled from this source:
#include <stdio.h>
#include <stdlib.h>

int main (void) {
    FILE *urandom;
    unsigned int seed;
    urandom = fopen ("/dev/urandom", "r");
    if (urandom == NULL) {
        fprintf (stderr, "Cannot open /dev/urandom!\n");
        exit (EXIT_FAILURE);
    }
    if (fread (&seed, sizeof (seed), 1, urandom) != 1) {
        fprintf (stderr, "Cannot read from /dev/urandom!\n");
        exit (EXIT_FAILURE);
    }
    fclose (urandom);
    srand (seed);
    /* Scale rand() into 0..99 (truncation acts as floor here), then shift to 1..100. */
    printf ("Random number from 1 to 100: %d\n",
            (int) (rand() * 100.0 / ((double) RAND_MAX + 1)) + 1);
    exit (EXIT_SUCCESS);
}
A good illustration of the difference can be seen by running these numerous times very quickly. If run 10,000 times, a random number between (and including) 1 and 100 should pop up roughly 100 times. You can see that producing random numbers from the OS’s seed does roughly that. The time based one, however, does a terrible job. Most of the time it will produce zero results with a particular preselected number ("88" in the following example).
$ for x in `seq 10000`;do ./rand_from_os | grep ' 88$' ; done | wc -l
97
$ for x in `seq 10000`;do ./rand_from_os | grep ' 88$' ; done | wc -l
94
$ for x in `seq 10000`;do ./rand_from_time | grep ' 88$' ; done | wc -l
0
$ for x in `seq 10000`;do ./rand_from_time | grep ' 88$' ; done | wc -l
512
This is because over the course of a few seconds to run, the time only changes a few times and most of the values will be from only a handful of seeds. Ironically, this problem is worse on higher performance machines.
Permissions
Sometimes you’re doing something like writing images with libpng, and it wants to create them with absurdly restrictive permissions. To change this behavior, you need to add these lines.
#include <sys/stat.h> /* umask */
umask(022); /* Set file creation permissions to RW for me and R for all. */
Then make sure you delete any files you may have previously written because this will not affect existing files, just creation.
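Note that the open() function takes a mode argument (its third argument, used when O_CREAT creates a file). This might simplify things in easy cases where you’re doing the writing explicitly. The process umask is still applied to that mode, so the resulting permissions are mode & ~umask.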
Static Compile
Normally the compiler doesn’t reinvent the wheel for every tiny detail your program could possibly need. Pretty much all normal programs use certain common things, and for those the compiler just dynamically links your code to a shared library (such as the C standard library) that shares duty with other programs. This means that you’ll need to have those files available to run your executable. Normally this isn’t a problem because they’re always there, or your system probably wouldn’t work.
But sometimes you need to send an executable into a situation where the libraries you use every day may not be available. Maybe it’s an old system. Maybe it’s a weird distribution. Maybe just the version numbers are messing with you. Or even just the paths are different.
To have the compiler create an executable that contains all the library code your software needs to work in isolation, you’ll need the -static option. Don’t confuse this with the -s option, which strips the symbol table and relocation information from the executable. The -s option makes your executable smaller and harder to debug. The -static option makes it much bigger, and you should probably not be doing active development with it.
Debugging
Print Error Messages
Something like this:
fprintf(stderr,"Prints to standard error.\n");
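Also, with #include <stdio.h> assumed, you can use perror(). It prints your message, a colon, and a description of the current errno value to standard error. It supplies the newline itself, and it is most useful right after a library call has failed.
perror("File not found");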
Core Dump Analysis
What if you get the dreaded Segmentation fault? This means something bad happened at run time. Most errors are caught at compile time, but sometimes your program looks fine to the compiler and does a silly thing once you actually fire it up. Besides mystical intuition, the best methodical way to analyze the problem is to have the system create a memory dump at the time of the error. You then use a special tool to look through this memory file and figure out what went wrong.
The system, not the compiler, decides whether a core file gets written. To make the dump useful, though, compile with debugging information like this:
gcc -g -o sketchy sketchy.c
Or if you’re definitely going to use gdb:
gcc -ggdb -o sketchy sketchy.c
If it still has a seg fault and you’re not getting a (core dumped) message appended to it, try changing your environment with:
ulimit -c unlimited
This removes any restriction on the size of core files allowed by the shell.
Note: When you’re done playing with core files, you might want to do ulimit -c 0 so that segmentation faults don’t generally produce core files. Normally, it’s a pain to have these files mysteriously lying around every time something crashes.
gdb
Assuming you have a core dump called core, run gdb like this:
gdb sketchy core
The core should load and allow you to investigate it. It might just tell you about the error and where it occurred.
Or if you don’t need a core dump, you can just run gdb sketchy and type run. This runs the program so you can see whether your error happens in a more interesting and verbose way. Here are some of the important commands to be aware of when using gdb.
<enter> | repeat the previous command
help | very sensible help
run | continuous run; can be followed by args
start | start execution in single step mode (stop at main); args ok
step | proceed execution to the next source code line
next | like step, but treat a whole function call as one line
finish | execute until the current stack frame returns (generally, stop at the end of the current function)
print <var> | print the current value of the specified variable
set args <arg1..argN> | set what is passed to programs started with the run command
show args | query what arg list was set
bt | backtrace (a nested list of function calls); good for finding where your program seg faulted
break <n> | set a breakpoint at source code line number n
cont | (also c) continue from a stop at a breakpoint
shell | run a shell subprocess
layout next | set the TUI split screen display to track through registers, assembly, or source
refresh | refresh the TUI screen (think Ctrl-L)
skip function <name> | skip the named function when stepping (the current one if none is given)
until <line> | run until the specified line number
quit | leave gdb
Here’s a nice guide to practical gdb use. More on TUI mode.
Valgrind
According to the man page, Valgrind "is a flexible program for debugging and profiling Linux executables. It consists of a core, which provides a synthetic CPU in software, and a series of debugging and profiling tools." Practically, it can be very handy in troubleshooting insidious memory errors.
On Debian you can simply apt install valgrind. To use it, just run your executable through it. So my code and argument ./cnow cnow.cno becomes this.
valgrind ./cnow cnow.cno
This should show a bunch of helpful hints prefixed with the PID.
==3732== Memcheck, a memory error detector
==3732== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==3732== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==3732== Command: ./cnow classic_c_cnow/cnow.cno
==3732==
...
==3732==
==3732== HEAP SUMMARY:
==3732== in use at exit: 0 bytes in 0 blocks
==3732== total heap usage: 124 allocs, 124 frees, 87,616 bytes allocated
==3732==
==3732== All heap blocks were freed -- no leaks are possible
==3732==
==3732== For counts of detected and suppressed errors, rerun with: -v
==3732== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
As you can see here, this code is not doing daft things with memory. Yay! Compare with this code.
==3903== HEAP SUMMARY:
==3903== in use at exit: 6,091 bytes in 1 blocks
==3903== total heap usage: 83 allocs, 82 frees, 325,847 bytes allocated
==3903==
==3903== LEAK SUMMARY:
==3903== definitely lost: 6,091 bytes in 1 blocks
==3903== indirectly lost: 0 bytes in 0 blocks
==3903== possibly lost: 0 bytes in 0 blocks
==3903== still reachable: 0 bytes in 0 blocks
==3903== suppressed: 0 bytes in 0 blocks
==3903== Rerun with --leak-check=full to see details of leaked memory
==3903==
==3903== For counts of detected and suppressed errors, rerun with: -v
==3903== ERROR SUMMARY: 51 errors from 5 contexts (suppressed: 0 from 0)
It is good form to make sure all your code runs without Valgrind leak errors.
If problems are detected, try it with valgrind -v or valgrind --leak-check=full for a deeper analysis.
For more details, see the full documentation for Valgrind.
Keywords
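These words are all reserved for C. Don’t name things with the same name:
auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, while
Those are the 32 keywords of C89. Later standards added a few more. C99 added inline, restrict, _Bool, _Complex and _Imaginary. C11 added _Alignas, _Alignof, _Atomic, _Generic, _Noreturn, _Static_assert and _Thread_local.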
The fact that this list is so amazingly short is the good news in C! Enjoy!