"C is quirky, flawed, and an enormous success."

— Dennis Ritchie

These notes are not a complete tutorial or reference. They are a useful collection of important topics for someone who has programmed in C but might be rusty. The idea is that the material here can get you started with a programming project quickly.


Compiling

C is a compiled language, meaning that its source code needs to be translated (in its entirety) into something the computer can understand before it is actually run. The "compiler" does this.

The typical way to compile a program looks like so:

gcc -o typical typical.c

If you don’t specify a -o option (for output), your executable program will be named a.out, which is not terrifically useful. It’s best not to do too much of that or you’ll have one a.out overwriting another.

In the old days compiling C programs on a Linux system was kind of a giant pain. These days things tend to work much more smoothly, but for reference, I’ll include some notes on things to try to solve typical compile issues.

  • Use -D_GNU_SOURCE early and often. Modern Linux systems seem to come with a gcc that is aware that it’s a Linux system and does the right thing, but it wasn’t always so. If you want some more serious detail on programming in Linux specific environments, this seems like a good resource.

  • If include (.h) files are "lost" try an option like -I/usr/X11R6/include/X11/magick/ which can provide hints where to find include files.

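  • Math not working even though you added a #include <math.h>? Maybe only some of math (undefined reference to "floor")? Try -lm, which often fixes that. The header only declares the math functions; on glibc-based Linux systems their code lives in a separate library, libm, which gcc does not link unless asked. (It can seem to work without -lm when the compiler evaluates a call with constant arguments itself, which is why the problem only shows up sometimes.)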

  • Are you nuts and compiling something against Xlib? You might need something like this: -L/usr/X11R6/lib -lX11

  • It turns out that the order of your gcc options and arguments is important. This nice web page points out that external libraries should be to the right of the thing that calls them: the linker works through its arguments left to right and only pulls code out of a library for symbols it already knows are missing. This is why gcc square.c -o square -lGL -lGLEW -lglut works but gcc -lGL -lGLEW -lglut square.c -o square will produce tons of square.c:(.text+0x49): undefined reference to... errors. This drove me crazy until I figured it out. A little change in my Makefile and suddenly, everything was wildly broken.

  • There is a command called pkg-config which can figure out what compiler flags one needs for a particular objective. For example, to figure out what is needed for SDL2, try pkg-config --cflags --libs sdl2. This can be put in Makefiles to increase the chances that they’ll work on alien systems.

If you’re curious about the resulting executable, it can be analyzed with readelf -a myprog. See this nice article on analyzing Linux executables for details and hints. Also, nm lists symbols from object files. In fact GNU Binutils is full of useful stuff.

Preprocessor

Including Libraries

#include "file_in_this_directory.h"
#include "/an/explicit/path.h"
#include <look_in_the_normal_place.h>

There are many useful functions in standard libraries. It looks like Wikipedia has a pretty good list of POSIX C libraries. This is a specification of libraries a sane system should provide. Here are some of the classic ones with some of the defined functions listed.

stdio.h

Includes the super important printf. It also defines size_t and NULL (which stddef.h provides too). Also fwrite, fread, fprintf, fputc, putc, putchar, ungetc, fflush, fopen, freopen, fclose, remove, rename, rewind, perror, FILE

math.h

Pretty much anything involving the eponymous topic of math. (Don’t forget -lm when you compile!). Here are some useful ones: ceil (nearest whole above), exp, floor (nearest whole below), round (proper rounding at .5), pow, sqrt. And most of the others: acos, asin, atan, atan2, cos, cosh, fabs, fmod, frexp, ldexp, log, log10, modf, sin, sinh, tan, tanh.

stddef.h

size_t, offsetof, NULL

stdlib.h

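exit, abort, atexit, getenv, system, malloc, calloc, realloc, free, atoi, atol, atof, strtod, strtol, strtoul, rand, srand, qsort, bsearch. (Despite often being lumped in here, assert lives in assert.h and perror in stdio.h.) Here’s a notable use: char *u; u= getenv("USER"); Note that getenv returns NULL if the variable isn't set, so check u before using it.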

ctype.h

isalnum, isalpha, isdigit, isxdigit, isgraph (visible character), isprint (printable character), isupper (case), islower, iscntrl, ispunct, isspace, tolower, toupper

string.h

strlen, strcpy, strncpy, memcpy, memmove, strcat, strncat, strcmp, strncmp, memcmp, strchr, strrchr, memchr, strcspn, strpbrk, strspn, strstr, strtok, strerror, memset

unistd.h

Includes the getopt function. Also read, pread, pwrite, chown, chdir, getwd (obsolete; prefer getcwd), the exec family (execl, execv, execvp, ...), nice, _exit (plain exit is in stdlib.h), getuid, fork, link, symlink, unlink, rmdir, getlogin, gethostname, chroot, sync, encrypt.

locale.h

setlocale, localeconv

time.h

asctime, ctime, clock, difftime, gmtime, localtime, mktime, time, strftime

signal.h

Defines functions and macros for handling signals: the functions signal and raise, and the signal macros SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV, SIGTERM.

Preprocessor Macros

All kinds of mischief can be had with preprocessor tricks. Generally it seems that this should only be used to manage the software development aspect of the program and not the program’s actual functionality. Here’s an example of a preprocessor macro in use:

Preprocessor Macro Example
#include <stdio.h>
/* Declare a variable V of type T, then print the type's name (#T turns it
   into a string) and the variable's size. sizeof yields a size_t, so use %zu. */
#define TYPE(T,V) T V; printf(#T "= %zu\n", sizeof V);
int main(void){
    TYPE(char,a_char)
    TYPE(short int,a_short_int)
    TYPE(int,an_int)
    TYPE(unsigned int,an_unsigned_int)
    TYPE(long,a_long)
    TYPE(unsigned long,an_unsigned_long)
    TYPE(float,a_float)
    TYPE(double,a_double)
    TYPE(long double,a_long_double)
    return 0;
}

To see what this preprocessor macro produces (the size of each type), see the Types section below.

Other preprocessor tricks would be stuff like general constants that should be flexible depending on how someone might want to compile the program:

#define PRECISION .0001

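Often there is a big maze of include files, and it’s easy to have one place include a header and then another place include it too, resulting in duplicate definitions. The following checks whether the special header has already been included and, if not, includes it. Subsequent uses of this will be ignored. (More commonly the same kind of guard is placed inside the header itself, wrapping its entire contents, so that every file that includes it is protected automatically.)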

#ifndef SPECIAL
#include "special.h"
#define SPECIAL
#endif

Another time this is useful is to disable chunks of code that might be present for testing.

#define FROM_PCAP_FILE
#ifdef FROM_PCAP_FILE
 /* ... use the save file input ... */
#else
 /* ... use the device input ... */
#endif

This particular example is shown more fully at my libpcap notes.

Debugging With Preprocessor Tricks

Preprocessor macros can be useful for debugging messages too.

#define VERBOSE 4
...
if (VERBOSE >= 2) {printf("A level 2 message.\n");}

Just set the value to 0 to turn off verbose messages. This allows the programmer to set up a bunch of diagnostic print statements that can be turned on or off easily.

Similarly you can "comment out" large blocks of code by wrapping them in an #ifdef block (or simply in #if 0 ... #endif).

#include <stdio.h>
#include <pcap.h>    /* libpcap: u_char, struct pcap_pkthdr */

#define PROBE2
#ifdef PROBE1
/* It can be annoying to comment out code that already has comments. */
void processPacket(u_char *userarg, const struct pcap_pkthdr* pkthdr, const u_char *p){
    for (unsigned int cc=52;cc<1500;cc+=536) { printf("Test: %d\n",p[cc+1]*256+p[cc]); }
    return; }
#endif
#ifdef PROBE2
void processPacket(u_char *userarg, const struct pcap_pkthdr* pkthdr, const u_char *p){
    /* caplen is the number of bytes actually captured; len can be larger. */
    for (unsigned int cc=0;cc<pkthdr->caplen;cc++) { printf("%d ",p[cc]); }
    return; }
#endif

You can also set these from the gcc command line with -DPROBE2, which acts just like a #define at the very beginning of your code. A later #define of the same name in the source replaces it (gcc warns about the redefinition), and #undef PROBE2 removes it.

While there is no shortage of tricksy ways to use the preprocessor for debugging, it seems to come down to sticking to a few guidelines.

  • Let the compiler see the debugging code so that any warnings it provokes are caught now, rather than taking you by surprise when you turn debugging on 20 years in the future.

  • Keep it simple.

#define DEBUG
#ifdef DEBUG
 #define D
#else
 #define D for(;0;)
#endif

Debugging Preprocessor Problems

Sometimes the problem is with the preprocessor swamp and you need to get some idea about what’s going on. For example, maybe you’ve got an #ifdef but you don’t know if it is getting hit.

#if CV_VERSION_MAJOR >= 4
#warning "CV_VERSION_MAJOR >= 4 IS TRUE"
#include "constants_c.h"
#endif

#warning prints a compile-time message without stopping the build (long a GCC/Clang extension, standardized in C23). Use #error "message" if you want it to actually halt compilation.

Template Functionality In C

Related to the hackish messes one can make with the preprocessor are C++ templates. How can template functionality be done in plain C? This excellent article explores several different methods, mostly using the preprocessor but also with fancier techniques. So if you love templates, that’s no reason you can’t use C!

Idiosyncratic C Operators

x++

Increments variable x by 1 after using it in this spot.

++x

Increments variable x by 1 before using it in this spot.

x--

Decrements variable x by 1 after using it in this spot.

--x

Decrements variable x by 1 before using it in this spot.

{test}?{true}:{false}

An "if" statement for saving punch card chad. Unlike if, though, it is an expression that yields a value, e.g. biggest= (a>b) ? a : b;

x,y

Evaluate and discard x, evaluate and retain y.

x=y=z=3

All of the variables x,y,z are 3. Assignment is an expression whose value is the value assigned. It groups right to left: z=3 happens first, then y=z, then x=y.

Bitwise

&

bitwise AND

|

bitwise inclusive OR

^

bitwise exclusive OR

<<

bit shift left

>>

bit shift right

~

bitwise NOT

Logical Operators

There are logical operators too.

  • || Logical OR.

  • && Logical AND.

Here’s an explicit rundown of how they behave next to their bitwise cousins. Note that the logical operators always produce exactly 0 or 1.

printf("%X\n", 0x00&0x00 );  /* 0  */
printf("%X\n", 0x01&0x00 );  /* 0  */
printf("%X\n", 0x01&0x01 );  /* 1  */
printf("%X\n", 0xFF&0x01 );  /* 1  */
printf("%X\n", 0xFF&0xFF );  /* FF */
printf("%X\n", 0x00|0x00 );  /* 0  */
printf("%X\n", 0x01|0x00 );  /* 1  */
printf("%X\n", 0x01|0x01 );  /* 1  */
printf("%X\n", 0xFF|0x01 );  /* FF */
printf("%X\n", 0xFF|0xFF );  /* FF */
printf("%X\n", 0x00&&0x00 ); /* 0  */
printf("%X\n", 0x01&&0x00 ); /* 0  */
printf("%X\n", 0x01&&0x01 ); /* 1  */
printf("%X\n", 0xFF&&0x01 ); /* 1  */
printf("%X\n", 0xFF&&0xFF ); /* 1  */
printf("%X\n", 0x00||0x00 ); /* 0  */
printf("%X\n", 0x01||0x00 ); /* 1  */
printf("%X\n", 0x01||0x01 ); /* 1  */
printf("%X\n", 0xFF||0x01 ); /* 1  */
printf("%X\n", 0xFF||0xFF ); /* 1  */

Main Structure Of A C Program

A C program is a collection of functions, routines that possibly take some input and possibly return some output. All C programs that run must have one and only one function called main.

Here is a typical structure showing how the main function can be passed the command line arguments. This program is useful for diagnosing exactly what the C program is receiving from the executing shell. It also shows the polite return code (0 is usually success and 1 is usually failure while other numbers can signify fancy modes of failure or other things).

Accessing Command Line Arguments
/* Comments look like this! */

#include <stdio.h>

int main(int argc, char *argv[]) {
    /* argv[0] is the program name; argv[argc] is always NULL. */
    for (int i= 0; i<argc; i++) { printf("Argument #%d:%s\n",i,argv[i]); }
    return 0;
}

Or here’s a less readable version:

#include <stdio.h>
int main(int argc, char *argv[]){ while (argc--) printf("%s\n", *argv++); }
Note
To only show the arguments and not the program name (element 0), just make both of the postfix modifiers (-- and ++) into prefix modifiers.

If you don’t care about the command line arguments use something like:

int main(void) { /*code goes here*/; return 0; }

Many Unix-like systems also pass main a third parameter, envp[], which contains the environment variables. (This is a common extension rather than standard C; POSIX also offers the global extern char **environ for the same purpose.) The Bash man page says, "When a program is invoked it is given an array of strings called the environment. This is a list of name-value pairs, of the form name=value." These are the ones the shell has, generally, marked with the export command (or inherited from parent processes).

This code will show all the environment variables a program might know about when run from a particular shell; the env utility does this too (when run without arguments).

#include <stdio.h>
int main(int argc,char *argv[],char *envp[]) {
    for (int i=0;envp[i]!=NULL;i++) printf("%s\n", envp[i]);
    return 0; }

Note that these are strings in the format USER=xed, where the first = is the one that separates the variable name from the value. Obviously this means that = can't be part of a variable name (POSIX says names shall not contain it).

Types

C is a "statically typed" language, meaning that every variable has a type fixed at compile time, and that very explicit type determines how much memory is carved out for it. Important types:

int

Integer, i.e. whole numbers with no fractional part.

float

Numbers that can represent a continuous value (to the accuracy a binary representation ultimately provides).

char

A character.

enum

An enumeration. Used to create a type with a constrained set of possible values. enum lightswitch {Off, On}; Here lightswitch can be either "Off" or "On" which is the same as 0 and 1. If you wanted different values, use something like enum lightswitch {Off=-1, On=1};

union

Define with something like: union Lights { int Switch; float Dimmer;}; All members of a union share the same storage, so this allows one thing (Lights) to hold either an int value if it’s just a switch or a float value if it’s a dimmer, but only one of them at a time. It’d be good to keep a separate variable around to store which you’re using at any given time or confusion will result.

struct

A structure. Used to create custom types that hold collections of things. struct point { int x; int y; int z; }; To declare a variable of this type you need to do struct point LastKnown;

Custom

Sometimes you want to make some complex named type have a simple name. To do that use the typedef statement: typedef short int twobyter; Now declaring something as twobyter X= 0 is the same as saying short int X=0; This can be a handy trick when setting up arbitrary data structures that may find utility handling different payloads. Just typedef the data component of the complex structure and tailor that to your needs at the beginning of the program.

The various types require different amounts of storage in memory. It is best to choose the most economical type that satisfies requirements. Being able to optimize this way is a nice feature; however, since it is not optional, it is also one of those pains that makes C programming a bit tedious. Here is the output of the preprocessor example above, which shows the size in bytes of various storage types on my machine. (These sizes vary by platform: on a typical 64-bit Linux system, long is 8 bytes and long double is 16.)

char= 1
short int= 2
int= 4
unsigned int= 4
long= 4
unsigned long= 4
float= 4
double= 8
long double= 12

const

Unlike the much worse situation with C++, the situation with const in C is only extremely confusing and prone to error. Basically, if you’re really, really serious about something being really truly immutable, you should just not create a "variable". Just literally say the thing every time. So if you want to use the value for pi, just say "3.14159265358979323846" everywhere. That is a lot to type and prone to error, so C conceptually provides a way to do this that’s about as clever as using sed: just use preprocessor macros.

#define PI 3.14159265358979323846

The const specifier is for things where you don’t know what it will be, but once it is initialized, it will never change. For example, it can be applied to things like this.

int main(const int argc, char** argv){  ...

Here the number of arguments will be different depending on how the program is executed, but once the value is established, argc will be locked for the duration of the execution. Without const you could, say, decrement argc as you dealt with each argument; as shown, you cannot.

Things get icky when slapping a const on pointers. When the const comes after the *, as below, the pointer itself is constant but the stuff that it points to is not.

char * const a;

So here we’re saying that a always points to the same address (whatever it was initialized with) and never anywhere else, no matter what. The char stored at that address can still be changed.

This order has a different meaning.

const char * b;

Here the pointer is not const and could be changed to point to a new address. However, wherever it points, you’ll want to be finding a constant character. If you try to change that character, i.e. the contents, through b, the compiler will reject it. So basically you can’t do anything like this.

*b= 'x'; /* Fails. */

To parse these tricky declaration puzzles, start at the variable and spiral outward starting to the right (e.g. "b is a pointer to a char which is constant" or "a is a constant pointer to a char"). Here is a very good web resource for untangling this mess called "C gibberish → English".

printf Format Codes

The return value of printf is the number of characters written, which can be handy.

The format specifier has this form:

% [flags] [field_width] [.precision] [length_modifier] conversion_character

Here are the modifier flags:

-

left justified

+

always mark a + or - for signed numbers

<space>

use - for negative numbers and a space for positive

0

pad leading zeros to specified width

#

Modifies style of each type. For example, for x types, there will be a prefix 0x, for X prefix 0X. For [gG] trailing zeros will be included. For [eEfFgG] all output will have a decimal point.

Field width is the minimum number of characters to output; shorter output is padded (with spaces, or with zeros when the 0 flag is given).

Precision is the number of digits after the decimal point for [eEfF], the maximum number of significant digits for [gG], the minimum number of digits for integer conversions (padded with leading zeros), and the maximum number of characters printed for s.

The length modifier is hh (char), h (short or unsigned short), l (long), ll (long long), z (size_t), or L (long double).

The conversion character is one of the following:

d or i

int signed base10

o

int unsigned octal

u

int unsigned base10

x and X

int unsigned hex specifying the x’s case

c

single character (the int argument is converted to unsigned char)

e and E

double or float in scientific notation specifying the e’s case

f

double or float base10

g and G

either like e or f, depending on the value's exponent (e for very large or very small values); trailing zeros are dropped

n

the argument is a pointer to which the number of characters converted thus far is assigned. Nothing is output.

s

output a string, i.e. a pointer to a char. Characters are output until a \0 is encountered or the number of the precision specifier has been reached.

p

output an implementation-defined representation of a pointer (pass it as a void *); mostly useful for debugging

%

output an actual "%"

Bit Masks

Basically you can store a set of many boolean variables in one C variable. Since you’re going to need a minimum of 8 bits to do about anything, the theory goes that if you’re just needing to store 1 bit, you might as well have that state variable serve multiple purposes. Despite sounding horrible, this actually produces code that is surprisingly readable.

The basic technique is to set the various flags with bit shifts to store them in the right places. Then when you want to create a state collection, just "or" them together with |. When you want to check to see if a flag is set in a collection, just use & to get that back out. Here’s an illustrative example:

/* An example of how to use bitmasks. */
#include <stdio.h>
#define LIGHTS_ON        ( 1 << 0 )
#define BRAKE_LIGHTS_ON  ( 1 << 1 )
#define WIPERS_ON        ( 1 << 2 )
#define HORN_ON          ( 1 << 3 )

int main(int argc, char *argv[]) {
    unsigned int car_status;
    car_status= LIGHTS_ON | WIPERS_ON | BRAKE_LIGHTS_ON;
    if (car_status & LIGHTS_ON) {
        printf("Lights on.\n");}
    if (car_status & BRAKE_LIGHTS_ON) {
        printf("Brake lights on.\n");}
    if (car_status & WIPERS_ON) {
        printf("Wipers on.\n");}
    if (car_status & HORN_ON) {
        printf("Horn on.\n");}
    return 0; }

This program will produce this result:

Lights on.
Brake lights on.
Wipers on.

Note that this technique is not especially type safe since a function expecting a well crafted collection of bits can be sent any old value that works and the compiler won’t notice. This is one reason that in C++ bool types are more robust. But bitmasks can be useful and they definitely pop up a lot in various libraries; understanding how they work is important.

Pointers

Objects in C can be handled by their names (which imply their contents), but a far more powerful and flexible technique is to work with them by only specifying the address where the data of interest is. The reason for this is that it’s computationally expensive to shuffle things around in memory if you don’t really need to. It’s better to leave the bulk of the thing alone and just refer to it where it is needed. It’s a bit like money. You could trade gold specie for the things you want, but for most transactions, it’s easier to leave the gold in a vault somewhere and just trade promissory notes referring to it. (Assuming a gold standard) writing a check is like referring to a reference (bank notes) to actual money (the gold). This is like a C pointer’s ability to point to a pointer.

So if you have a variable called big_thing with a lot of data in it, you can do things with that variable by name, but sometimes it is more effective to just refer to the location where that thing lives. It’s quite like addresses in real life: you don’t have to specify the exact nature of a house at a particular address or if it’s a strip mall or whatever, just the address is sufficient to deal with it for many purposes.

Important ideas with pointers:

  • Pointers are a data type that holds exactly one memory address. What that address actually is should seldom ever be of concern.

  • Pointers can point to other pointers.

    int x;

    Defines an integer variable called x.

    int *ptr2x;

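    Reads "Define the thing ptr2x points to as an integer," i.e. ptr2x is a pointer to int. When * is used in an expression rather than a declaration, as in *ptr2x= 5;, it is technically called the "indirection" (or dereference) operator.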

    ptr2x= &x;

    Reads "Set ptr2x to the address of the object defined by x."

    p->n= 0;

    Sets to zero the subcomponent n in the structure that pointer p points to; it is shorthand for (*p).n= 0. This is sometimes called the "indirect member access operator".

Void Pointers

Untyped pointers can be created with the void type.

void *anyptr;

To dereference such a pointer, it must be type cast with something. In this example the contents of two different kinds of variables, ib and fb, are set from the dereferencing of the same pointer.

#include <stdio.h>
int main(void) {
    int ib,ia= 666;
    float fb,fa= 3.14;
    void *anyptr;
    anyptr= &ia;
    ib= *((int*)anyptr);
    printf("ib now is: %d\n",ib);
    anyptr= &fa;
    fb= *((float*)anyptr);
    printf("fb now is: %.2f\n",fb);
    return 0;
}

Arrays

A[i] is the same as (*((A)+(i)))

So these are equivalent.

A[4]= 'x';
*(A+4)= 'x';

You can load the array at definition.

float origin[3]= {0,0,0};
char mystring[]= {'x','e','d','\0'}; /* The '\0' makes it a "string". */
char mystring[]= {"xed"}; /* Equivalent. */

You can’t define the array and then later set it with {0,0,0} or something like that: arrays cannot be assigned to as a whole. Just keep in mind why strcpy and memcpy exist. Here’s a way to use memset (from string.h) to zero an array after it has been defined.

int colpos[MAXNUMFIELDS];
memset(colpos,0,sizeof(colpos));

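Here are two ways to zero an array right in its definition. The first uses a GCC extension (range designators); the second is standard C, because any elements without an explicit initializer are set to zero.

int colpos[MAXNUMFIELDS]={[0 ... MAXNUMFIELDS-1]=0}; // GCC extension
int colpos[MAXNUMFIELDS]={0};                        // Standard C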

Brackets are actually a postfix operator (the subscript operator) applied to the expression in front of them. Since A[i] just means *(A+i), even i[A] is legal. (Confusing? Yes.)

Elements of arrays are stored contiguously, and pointer arithmetic counts in elements, not bytes.

&origin[1]-&origin[0] == 1

Find the length of an array:

int length= sizeof origin / sizeof origin[0];
Note
In case of confusion, note that in a function's parameter list, char *argv[] is the same as char **argv (an array parameter is really a pointer).

Multi-Dimensional Arrays

Both of these syntax styles work.

int a[3][4] = {
   {0, 1, 2, 3} ,   /* a[0][0],a[0][1],a[0][2],a[0][3] */
   {4, 5, 6, 7} ,   /* a[1][0],a[1][1],a[1][2],a[1][3] */
   {8, 9, 10, 11}   /* a[2][0],a[2][1],a[2][2],a[2][3] */
};
int b[3][4] = {0,1,2,3,4,5,6,7,8,9,10,11}; /* Same as a. */

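For a fully initialized array like this, the inner braces are optional decoration. They do matter for partial initialization, though: {{1},{2}} sets a[0][0] and a[1][0], whereas {1,2} sets a[0][0] and a[0][1]. Anything not mentioned is set to zero.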

Variable Length Arrays

Hard to know whether to include this one here or at dynamic memory, but the old limitation of needing to know array sizes at compile time was relaxed with C99, which added variable length arrays (C11 later made them an optional feature). Official GCC documentation about arrays whose size can be specified at runtime.

Here’s a sample that demonstrates this crazy thing actually works.

#include <stdio.h>
#include <stdlib.h>
int use_a_variable_length_array(int n) {
    float vals[n];                  /* No idea how big it might be! */
    for (int i=0;i<n;i++) { vals[i]= 0.0; } /* Yes, it can be used! */
    return n;
}
int main(int argc, char** argv) {   /* Use input to set array size. */
    if (argc < 2 || atoi(argv[1]) < 1) {   /* a VLA size must be positive */
        fprintf(stderr, "usage: %s <positive size>\n", argv[0]);
        return 1;
    }
    printf("%d\n",use_a_variable_length_array(atoi(argv[1])));
    return 0;
}

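Running the command with an argument of 1000000 works fine. However, running it with an argument of 10000000 produces an unhelpful "Command terminated" message. The VLA lives on the stack: 10 million 4-byte floats is about 40 MB, well past a typical 8 MB default stack limit, while 1 million (about 4 MB) still fits.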

If you use the GCC option -Wvla then this code will produce a warning (but otherwise compile fine): "warning: ISO C90 forbids variable length array ‘vals’ [-Wvla]"

Some people believe that VLAs are a performance drag.

Some security people believe that it allows you to exhaust the stack and then write naughty security problems into memory and then jump to them and take over the operations. Apparently they have been aggressively weeded out of the Linux kernel.

Linus himself says: "…using VLA’s is actively bad not just for security worries, but simply because VLA’s are a really horribly bad idea in general in the kernel."

There’s even a GCC thing where you can put variable length arrays in structs. Here is what Linus has to say about that. "The feature is an abomination. I thought gcc only allowed them at the end of structs, in the middle of a struct it’s just f*cking insane beyond belief."

Use sparingly if you must.

Chars and Strings

An array of objects of the char type has some special syntactical properties in C. This is to facilitate the handling of "strings". The following are all valid syntax but they may have subtle differences.

char alphabet[26];
char theFword[4] = {'f', 'u', 'n', '\0'};
char string[6] = "twine";
char gray[] = {'g', 'r', 'a', 'y', '\0'};
char salmon[] = "salmon";

The first thing to note is that this isn’t Python — single and double quotes are not the same! Use single quotes around single characters and double around strings.

The next important confusion to clear up is that there are two kinds of strings. Here’s a look at the two types of string-like memory structures.

Character Arrays

A character array is just what it sounds like with no serious mystery.

char myarray[50];

This produces a memory reservation (on the stack, if declared inside a function) for 50 char type objects addressable by index just as an array should be.

myarray[5]= 'c'; /* No problem with that. */

You can even do something like this to initialize it in the declaration.

char myarray[50]= {'x','e','d','.','c','h'};

That fills the first 6 with characters and sets the remaining 44 to zero ('\0'), because any elements left out of an initializer are zeroed. This does the exact same thing.

char myarray[50]= "xed.ch";

Here a string literal is used, temporarily, to provide the character constants to define the array.

The confusing part is that you cannot later assign a new string with syntax that looks very similar to the initialization:

char myarray[50]= "xed.ch";
myarray= "This will be an error!";

You can pick at each member of the array with myarray[0]= 'T' and so on, or copy in a whole new string with strcpy(myarray,"New text"), as long as it fits.

If you omit the size to initialize to, it is computed from the length of your string literal initializer.

char myarray[]= "xed.ch";
printf("%zu\n",sizeof(myarray)); /* Will print 7, not 6! */

Note that it adds one more to store the NUL terminator it thinks you will appreciate. If you provide a length (like [50] shown above), your string and its NUL terminator have to fit in those 50, and sizeof will return 50. Watch out for the exact-fit case: char abc[26]= "abcdefghijklmnopqrstuvwxyz"; is legal C, but there is no room for the terminator, so it is silently dropped and abc is not a usable string. So if you need the alphabet to be used in string functions, you need this.

char abc[27]= "abcdefghijklmnopqrstuvwxyz"; /* 26 letters + '\0' */

Or simply do this which is the same thing.

char abc[]= "abcdefghijklmnopqrstuvwxyz";

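Like arrays in general, myarray and &myarray evaluate to the same address, although their types differ: myarray decays to a char * pointing at the first element, while &myarray is a pointer to the whole array. In other words, it is just an address where some same-sized chunks of stuff are allocated.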

Compilers are weird, but in practice this kind of character array lives in ordinary writable memory (on the stack, for a local variable), so its contents can be changed.

String Copying (strcpy)

Other programming languages do a really good job of hiding the gory details of using strings. C does a really good job of not hiding such details. This can be a challenge until you get the hang of it. A common scenario is you want to preserve a copy of a string in case the primary copy gets mutated. You can’t just say something like char *stmp=s; and think you’re done! The new string must be declared and its size must be explicitly provided so that memory can be carved out to store it. Here is a way that has worked for me.

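char *stmp= malloc(strlen(s) + 1); if (stmp) strcpy(stmp,s); /* +1 for the '\0' */

This copies the contents of s into stmp, which has just been allocated with room for every character of s plus the terminating \0 (malloc needs stdlib.h; strlen and strcpy need string.h). Check for a NULL return from malloc, and free(stmp) when you are done with it. POSIX (and now C23) also provides strdup, which does both steps in one call.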

String Literal

This syntax defines a pointer to a string literal.

char *myptr= "xed.ch";

In C, an ordinary string literal has the type "array of n char" (in C++ it is const char), but attempting to modify one is undefined behavior, so treat it as if it were const. It might compile in weird cases, but don’t do it! Declaring the pointer as const char *myptr lets the compiler catch such mistakes for you.

The underlying storage, unlike a character array, is in some kind of read only memory. All you can do is change what the pointer points to. This is ok for many applications because you often want to replace a string with another. Though this smells like the string is mutable, you are really changing the variable pointer to point to a different constant array. Using the myptr definition above this now works.

myptr= "Some new text is ok now!";

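This also means that, unlike with a character array, myptr and &myptr are not the same: myptr holds the address of the literal's characters, while &myptr is the address of the pointer variable itself. Indexing through myptr does reach the literal's characters, and reading them is fine, but writing to them is undefined behavior (typically a crash, since the literal usually sits in read-only memory). So this compiles but does not work.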

myptr[23]= '.';

Misc String Functionality

If you want fancy string capabilities, you might need a custom library to do what you want. Here is an interesting one.

To find out how long a string is, or whether it is empty, use strlen() from string.h. (Checking s[0] == '\0' tests for emptiness without scanning the whole string.)

if (strlen(err)){printf("%s\n",err);exit(1);}

Branching

Basically computers compute by making logical decisions. In C, the main decision making feature is the if statement:

if (test_expression) {statement_block} else {statement_block}

The else if construction allows a single choice to be made from a series of possible conditions.

if (te1) {sb1} else if (te2) {sb2} else if (te3) {sb3} ... else {sb}

For if statements, the test expression can be anything that reduces to a scalar value (an integer, floating point number, or pointer): 0 or a null pointer is false, and anything else is true.

A fancier form of branching can be done with the switch and case statements. Here’s how it works:

An Example Of switch/case And getopt
#include <stdio.h>
#include <unistd.h>

int main (int argc, char **argv ){
static char optstring[]="a:b:c"; int o;
while ( (o = getopt(argc, argv, optstring)) != -1)
 switch(o) {
     case 'a': { printf("Option argument for `a` is: %s\n",optarg); break; }
     case 'b': { printf("Option argument for `b` is: %s\n",optarg); break; }
     case 'c': {  printf("Option `c` has no argument.\n"); break; }
     default: { printf("Option `%c` is unknown or missing its argument.\n", optopt); } /* getopt returned '?'; optopt holds the option */
 }
return 0;}
Note
For a more comprehensive example of option parsing, see the Option Parsing section.
A Simple Example Of switch/case
switch (offset) {
    case 0:    {f|= 0x1; break;}
    case 185:  {f|= 0x2; break;}
    case 370:  {f|= 0x4; break;}
    case 555:  {f|= 0x8; break;}
    default: { printf("Packets with wrong offsets (%d) are being captured!!",offset);}
}

Here I’m trying to set 4 flags as some fragmented packets come in. These packets should have well defined offset values (0,185,370,555) and when each shows up, the flag is set. If some other offset shows up, that’s weird and the default condition handles it. If the value of f is 0b1111/0xF/15d when I’m done, then I know all the pieces have arrived.

Another trick to keep in mind with switch and case is that if you don’t include a break, execution falls through into the statements of the following cases until a break is found or the switch block ends. This leads to frustrating errors if you simply forget, but used intentionally, it lets two cases do the same thing.

switch (animal) {
    case snake: {snake_stuff(); break;}
    case tiger: { /* Same as next one. */ }
    case lion:  {cat_stuff(); break;}
}

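The braces on tiger aren’t actually required, and no semicolon is needed either: a case label can be followed directly by another label, as in case tiger: case lion: cat_stuff(); break;. The empty braces are harmless, though, and make a handy spot for the comment.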

Looping

Interesting software is a result of many logical decisions being repeatedly performed in interesting ways. The main way to achieve multiple iterations of an action in the C idiom is with the for loop:

for ({initial};{test_before_each_iteration};{eval_after_each}){thing_to_do}

Here is a very common idiomatic usage.

for (int i=0;i<SIZE;i++) { printf("%d\n",a[i]); }

Here’s a more interesting example:

for (hi=100,lo=60;hi>=lo;hi--,lo++){converge(hi,lo);}
Note
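Before C99 you had to declare the variables that appear in the for statement prior to using it, so the int i=0 in the first example won’t compile with -std=c89. C99 and later allow it, and modern gcc defaults to a newer standard, so it usually just works; on an older compiler, try -std=c99. The less compiler magic, the better IMO, and declaring loop variables before the loop works everywhere.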

The other two important loop structures are similar with a subtle difference. These are the while loops. The most basic works like so:

while ({test_expression}) {do_this_stuff}

The test expression is checked before every pass through the body, so if it is 0 (or NULL) right from the start, the body is skipped entirely and control passes on.

Note that this can also be expressed with for.

for (;{test_expression};) {do_this_stuff}

If you want the test evaluated after the loop body code is run (which implies the loop body will always run at least once) use this form:

do {do_this_stuff} while ({test_expression});

Exiting loops

continue

This statement jumps control to the end of the current loop body as if it had completed an iteration and was now ready for more (in a for loop, the increment expression still runs before the next test). It allows for short circuiting some code that might otherwise be performed on every iteration.

break

This statement jumps control just past the end of the current loop body statement as if the last iteration had just occurred and finished. This statement basically says that this looping structure is completely finished, not just this iteration.

return [expression]

This is the way to leave a function. The optional expression is passed back to the calling function by value (so use pointers where copying would be unpleasant). If the function was defined as void, don’t include an expression. A function can have several return points depending on the situation. If you include stdlib.h, main can return EXIT_SUCCESS or EXIT_FAILURE.

Dynamic Memory

Anytime you are working with an amount of data that you can not explicitly define an upper bound on ahead of time, you probably need to use dynamic memory. The main mechanism of dynamic memory is the malloc() function (include stdlib.h) which runs around looking for enough contiguous memory to reserve for some run time defined purpose. Once malloc() finds the memory you’ve requested, it returns a pointer to that location so you can start doing stuff with it. The format for using malloc() is a bit fussy:

p= (struct Thing *) malloc (sizeof (struct Thing));

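Here the sizeof operator (it looks like a function, but isn’t one) gives exactly how many bytes an instance of struct Thing needs. That much memory is reserved, and the returned pointer is cast (forced) by the first parentheses to point to memory that is treated as a struct Thing. In C that cast is optional, since malloc() returns a void * that converts to any object pointer type automatically; C++ requires it. If malloc() can’t find the memory, it returns NULL, so check for that before using p.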

When your program is finished with some memory that has been allocated, it’s polite (or maybe even critical) that it be returned to the system for use. The way to do that is with the free() function which takes a pointer to the memory you want recycled.

Note
If you’re really interested in C’s low level memory management, here is an interesting guide to writing your own malloc and friends.

Simple Stack Implementation

Before C can be made into anything useful, you really need to create some tools to make certain tasks easier to implement. One theme that comes up over and over in more substantial programming tasks is the need to hold an arbitrary bunch of data somewhere. Since C requires very explicit management of all memory used, keeping track of it all can be challenging. It is therefore useful to create some templates that can get you into more interesting parts of the problem quickly.

Here is an implementation of a simple stack system. The stack is fed data with a Push() function, which places the data on top of the stack (a LIFO, or FILO, structure: last in, first out). Data is retrieved (and removed) from the top of the stack with a Pop() function. Note the type definition cargo_type allows the stack to carry whatever kind of data you want simply by redefining it.

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

/* Custom type definitions. */
typedef int cargo_type;
struct linkbox { cargo_type cargo; struct linkbox* next;};
typedef struct linkbox lbox;

/* Function prototypes show inputs and outputs so subsequent */
/* mentions of them aren't confusing (seemingly undefined)   */
/* to the compiler.                                          */
void Push(cargo_type v, lbox** p2mylist);
cargo_type Pop(lbox** p2mylist);
cargo_type Iter(lbox** current);
int dice(int sides);

int main(void){
    int i,m;
    srand(time(NULL)); m= dice(20);
    lbox *mylist=NULL;
    for (i=0;i<m;i++){
        Push( dice(6), &mylist); }
    lbox *index= mylist;
    int sum=0, n=0;
    while (index) {
        sum += Iter(&index);
        n++;
        /*printf("Iter:%d\n",Iter(&index));*/ }
    printf("Average:%f\n",(float)sum/n);
    while (mylist) {
        printf("Popping:%d\n",Pop(&mylist)); }
    return 0;}

int dice(int sides){
    return rand() % sides + 1;}

cargo_type Iter(lbox** c){
    cargo_type t= (*c)->cargo;
    *c= (*c)->next;
    return t;}

void Push(cargo_type v, lbox** p2mylist){
    lbox* latestbox;
    latestbox= (lbox *) malloc(sizeof(lbox));
    if (latestbox == NULL) {perror("malloc"); exit(EXIT_FAILURE);}
    latestbox->cargo= v;
    printf("Pushing:%d\n",v);
    latestbox->next= *p2mylist;
    *p2mylist= latestbox;
    return;}

cargo_type Pop(lbox** p2mylist){
    cargo_type t= (*p2mylist)->cargo;
    lbox *dead= *p2mylist;
    *p2mylist= (*p2mylist)->next;
    free(dead);
    return t;}
Figure: Linked List Example Memory Layout (Linked List Memory Map)

This program is also an example of simulating pass-by-reference, which C doesn’t have directly. Push() and Pop() need to change the caller’s list pointer, so they are handed a pointer to that pointer (&mylist). Dereferencing this transporter pointer (*p2mylist) gets back to the original pointer, which the function can then read or reassign. This is necessary because C copies function arguments, and if you copy a pointer, it’s a different pointer (even if it points to the same place). If Push() only received a copy of mylist, it could link a new node in front of the list and aim its copy at it, but the caller’s mylist would still point at the old first node, and the new node would be lost (and leaked) when the function returned.

File Operations

Once you can allocate the memory you need, you often need to use the file system to read actual data to fill that memory. Using files is a fundamental operation that has its quirks in C. The following example reads a file called ./fileio.c character by character, prints it to standard output, and writes it to a file called /tmp/fileio-copy.c. The hard to memorize bits are including stdio.h and creating a FILE pointer. Also, opening and closing the file require fopen and fclose.

Simple Example Reading And Writing Files
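#include <stdio.h>
int main(void) {
    FILE *fpi,*fpo;
    fpi= fopen("./fileio.c","r");
    if (fpi == NULL) {perror("./fileio.c"); return 1;}
    fpo= fopen("/tmp/fileio-copy.c","w");
    if (fpo == NULL) {perror("/tmp/fileio-copy.c"); fclose(fpi); return 1;}
    int curchar; /* Must be int, not char, so EOF can be told apart from real bytes. */
    curchar= fgetc(fpi);
    while (curchar != EOF) {
        printf("%c",curchar);
        fputc(curchar,fpo);
        curchar= fgetc(fpi);
    }
    fclose(fpi);
    fclose(fpo);
    return 0;
}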

While I’ve shown fgetc and fputc, other possibilities include fprintf and fscanf. Also fread and fwrite (for binary).

It’s also worth pointing out that a more idiomatic C way to do the main read loop (still with curchar declared as int) would be something like:

while  ( (curchar= fgetc(fpi)) != EOF ) {...}

Here is a very simple test of file writing. This can help you determine if file writing is slow. It just writes a different number to a certain file a million times. Each number goes into a temporary file that is then renamed over the real one, so the update is atomic: anything reading the file should get the entire number without worrying that it will be cut off.

iotest.c
#include <stdio.h>
int main(void) {
    FILE *fp;
    unsigned int c= 0; /* Counter. */
    while (c++ < 1000000) {
        fp= fopen("/dev/shm/iotesting.tmp","w");
        if (fp == NULL) {perror("/dev/shm/iotesting.tmp"); return 1;}
        fprintf(fp,"%u\n",c);
        fclose(fp);
        rename("/dev/shm/iotesting.tmp","/dev/shm/iotesting"); /* Atomic replace on POSIX. */
    }
    return 0;
}

Running this on a decent desktop, I got 83k atomic write operations per second. On my 2009 laptop, I got 31k. On a Raspberry Pi 4 I got 17k. By comparison (on the desktop), I got 7.5 million (non-atomic) write operations per second when I kept the file open the whole time.

I think that one of the best ways to read in data is fgets. Here’s a pretty solid way to do that using dynamic buffers that grow if needed using realloc.

./revtac ./revtac.c | ./revtac
/* Here's an example of using realloc to grow the buffer to as much as
 * needed when bringing in data. This particular example will take the
 * specified file, or standard input, and render it backwards. Imagine
 * rev and tac combined. Running this twice should cancel.
 * $ md5sum revtac.c <(./revtac ./revtac.c | ./revtac) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[]) {
    FILE *fp;
    if (argc > 1) fp= fopen(argv[1],"r");
    else fp= stdin;
    if (!fp) {perror("Could not open file"); exit(EXIT_FAILURE);}
    char *str= malloc(4096), *s= str;
    if (!str) {perror("malloc"); exit(EXIT_FAILURE);}
    size_t len= 0;
    while (fgets(s,4096,fp)) { /* s always has 4096 free bytes behind it. */
        len += strlen(s);
        char *bigger= realloc(str, len+4096);
        if (!bigger) {perror("realloc"); free(str); exit(EXIT_FAILURE);}
        str= bigger;
        s= str+len; /* realloc may move the buffer, so recompute s. */
    }
    fclose(fp);
    for (size_t n=len; n; n--) { /* Last byte first. */
        putchar(str[n-1]);
    }
    free(str);
    return(EXIT_SUCCESS);
}

The previous example had two limitations. First, because it needed to know the end of the input before beginning its output, it loaded the entire contents of the input into memory. This is not ideal for very big jobs where sequential processing can be applied. Second, it only handled one file. Proper Unix utilities should be able to accept data on standard input and/or as one or more files to open. The quintessential utility that reliably does this is cat. To show how to create a program which can use an arbitrary number of input sources like cat and handle each line as it comes, I have rewritten cat from scratch. Note that I am not Richard Stallman and I’m not claiming this is the most robust cat implementation ever, but if you need a program that does about the same thing as cat but with a bit of C code thrown in, this can be a better place to start than the source code for the real cat (which is also reasonable).

alleycat.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLINELEN 666

void process_line(char *line) {
    // PUT THE ESSENTIAL LOGIC FOR THIS PROGRAM HERE!!
    printf("%d %s",(int)strlen(line),line); // For example: line length and line.
}

void process_file(FILE *fp){
    char *str=malloc(MAXLINELEN), *rbuf=str;
    size_t len=0, bl=0;
    if (str == NULL) {perror("Out of memory");exit(1);}
    while (fgets(rbuf,MAXLINELEN,fp)) {
        bl=strlen(rbuf); // Read buffer length.
        if (bl > 0 && rbuf[bl-1] == '\n') { // End of buffer really is the EOL.
            process_line(str);
            rbuf=str; // Reuse the (possibly grown) buffer from the beginning.
            len=0;
        } // End if EOL found.
        // Read buffer filled before line was completely input.
        // Allocate more memory for this line.
        else { // Add more mem and read some more of this line.
            len+=bl;
            char *bigger=realloc(str, len+MAXLINELEN); // Tack on some more memory.
            if (bigger == NULL) {perror("Out of memory");free(str);exit(1);}
            str=bigger;
            rbuf=str+len; // Slide the read buffer down to append position.
        } // End else add mem to this line.
    } // End while still bytes to be read from the file.
    if (len > 0) process_line(str); // Last line had no trailing newline.
    if (fp != stdin) fclose(fp); // Leave stdin open in case "-" shows up again.
    free(str);
} // End function process_file

int main(const int argc, char *argv[]) {
    FILE *fp;
    int i;
    if (argc == 1) { // Use standard input if no files are specified.
        process_file(stdin);
    }
    else {
        for (i=1; i<argc; i++) { // Go through each file specified as an argument.
            if (strcmp(argv[i],"-") == 0) fp=stdin; // Dash as filename means use stdin here.
            else fp=fopen(argv[i],"r");
            if (fp) process_file(fp); // File pointer, fp, now correctly ascertained.
            else fprintf(stderr,"Could not open file:%s\n",argv[i]);
        }
    }
    return(EXIT_SUCCESS);
}

Change the process_line function to do whatever it is you need to do to the data.

Option Parsing

When running programs from the command line, the main function can be supplied with a list of optional parameters passed from the calling program or shell. To parse these sensibly, the POSIX getopt() function and its GNU extension getopt_long() help keep things consistent and error free. Here is an example of a complete option parsing routine which handles long options. Long options are like --help, --verbose etc., and tend to be popular with GNU utilities.

Example Of getopt_long
#include <stdio.h>
#include <getopt.h>
#include <stdlib.h>

int main (const int argc, char **argv) {
    int help= 0; int i=0; int j=10; float k= 0;
    int o;
    while (1) {
        static struct option long_options[] = {
            {"help"  , no_argument,       NULL, 'h'}, /* Flag, no argument. */
            {"ivalue", required_argument, NULL, 'i'}, /* Integer arg required. */
            {"jvalue", optional_argument, NULL, 'j'}, /* Integer arg optional. */
            {"kvalue", required_argument, NULL, 'k'}, /* Float arg required. */
            {0, 0, 0, 0} /* must be filled with zeros */
        };

        /* The last argument could receive the index of the matched long option; NULL means we don't need it. */
        o = getopt_long(argc, argv, "hi:j::k:", long_options, NULL);
        if (o == -1) break; /* Detect the end of the options. */
        switch (o) {
            case 'h': help= 1; printf("Help=%d\n",help); break;
            case 'i': i= atoi(optarg); break;
            case 'j': if (optarg){ j= atoi(optarg); } else { j=99; } break;
            case 'k': k= atof(optarg); break;
            default: printf("Unknown Option.\n"); return 0;
        } /* End switch construct */
    } /* End while loop */

    /* State of variables initialized by options.  */
    printf("i=%d, j=%d, k=%f, help=%d\n",i,j,k,help);
    printf("Option Index: %d\n", optind); /* optind is defined by getopt.h */
    /* Print any remaining command line arguments (not options). */
    if (optind < argc) { printf ("Non-option ARGV-elements: \n"); }
    while (optind < argc) { printf("%s \n", argv[optind++]); }
    return 0;
} /* End main */
Note
In the example program above the option -j (aka --jvalue) is defined as having an optional argument. Optional arguments (identified with ::) cause some ambiguity and to use them, you must run your program specifying these arguments like: -j99 or --jvalue=99. If you try -j 99 or --jvalue 99 then the 99 will be considered unattached to the option.

User Input

The gold standard for user input is the readline library. Originally a part of Bash, it is now a separate library found in all kinds of software.

To use readline/readline.h you need to install the packages libncurses-dev and libreadline-dev (those are the Debian/Ubuntu package names).

#include <stdio.h>
#include <stdlib.h>
#include <readline/readline.h>
int main(int argc, char** argv){
    char* line= readline("cxesh> ");
    if (line == NULL) return 0; /* readline returns NULL at end of input (Ctrl-D). */
    printf("Confirmation: \"%s\"\n",line);
    free(line); return 0;
}

Compile with the -lreadline flag.

gcc -o readlinetest readlinetest.c -lreadline

Time Functions

If you need to just throw a simple delay into a C program, including the unistd.h header will supply you with the sleep() function, which takes an argument in seconds. It works pretty much like /usr/bin/sleep. From the same source comes usleep(), which takes its argument in microseconds, so usleep(80) is not 80 seconds but 80 microseconds; for 80 milliseconds, use usleep(80000). POSIX.1-2008 dropped usleep() in favor of nanosleep(), though glibc still provides it.

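Including time.h will get you a special type clock_t, which is what the clock() function returns: the processor time your program has used so far, in units where CLOCKS_PER_SEC ticks make one second. Need the date or a full human readable timestamp? This is basically how /usr/bin/date works.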

#include <stdio.h>
#include <time.h>
...
   time_t now= time(NULL); char buffer[80]; /* time() supplies the current time. */
   strftime(buffer,sizeof buffer,"%Y-%m-%d %H:%M:%S", localtime(&now));
   printf("ISO8601: %s\n", buffer );

Here is a decent place to start for timing functions.

Signals

Sometimes you need to hear from some other process. There are tons of ways to do this (pipes, semaphores, sockets, etc) but signals are very ancient and direct. They are often used in userspace to convey information like, "Stop now because you’re causing problems!" but they can be used for many other purposes.

Here’s a nice GNU reference of various signals. It starts with this nice list of why you’d want to use signals in the first place.

  • Program Error Signals - report serious program errors.

  • Termination Signals - interrupt and/or terminate the program.

  • Alarm Signals - indicate expiration of timers.

  • Asynchronous I/O Signals - indicate input is available.

  • Job Control Signals - support job control.

  • Operation Error Signals - report operational system errors.

  • Miscellaneous Signals - includes user signals for whatever you need.

Note that there are fine points to this topic and the signal(2) man page explicitly says "Avoid its use: use sigaction(2) instead." More nuanced thinking would suggest that for maximum ISO C portability signal() is OK, while in POSIX environments sigaction() is the safer choice, because its behavior (such as whether the handler stays installed after it runs) is precisely specified rather than varying between systems.

Useful Tricks

Associative Arrays

Also known as hashes, dictionaries (dict), Maps, etc. C does not have a first class data type for named arrays, and the reason seems pretty clear to me: the creators of C just couldn’t imagine anyone dumb enough to need that. Of course they understood how terrifically useful such a feature is, but they also understood how trivial it would be to implement it from scratch using essential language features.

Why do I believe this? Go have a look at page 134 of the first edition of "The C Programming Language" by Kernighan and Ritchie, often simply called "The K&R Book" in C lore. There you will find section 6.6 called Table Lookup. In the two pages that follow, the motivation and rationale for implementing associative arrays in true idiomatic C are provided. These two pages also include code for a complete implementation.

If that doesn’t humble you into focusing more on your deficiencies than the ones you perceive in C, I don’t know what will.

Here is an implementation and discussion, including alternatives. The specific hash function K&R use is really just for illustration and some people object to it; other reasonable looking hash functions can be found at the bottom of this page.

Random Numbers

To get a random number between 1 and 100 do something like this:

rand_from_time.c
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

int main (void) {
    srand(time(NULL));
    int mystery= rand() % 100 + 1;
    printf ("Random number from 1 to 100: %d\n", mystery);
    return (0); }

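You need srand() to seed the random number generator. The rand() function returns pseudo-random integers from 0 to RAND_MAX inclusive. If you need a random number between 0 and 1, another way to do that would be rand()/(RAND_MAX+1.0). The .0 matters: with plain integers, rand()/(RAND_MAX+1) is integer division and always gives 0, and RAND_MAX+1 can even overflow on systems (like glibc) where RAND_MAX equals INT_MAX.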

Warning
The method of seeding srand() with time(NULL) is OK in many situations, but remember that the seed can be reverse engineered: anyone who knows roughly when the program ran can simply try the handful of likely seeds. This means you don’t want to write a real-money gambling game that is randomized in this way. Also, time(NULL) only changes once per second, so if you run the program several times within the same second, it gets the same seed and the "random" output repeats itself.

If you are using a proper operating system (like Linux or a fruit-based computer) there is a managed resource that collects entropy for use by various processes in establishing randomness. This source of randomness is presented as a file by the kernel and automagically filled with pretty high quality random numbers (see man random for gory details). Here is a way to get random numbers using a seed pulled from this source:

rand_from_os.c
#include <stdio.h>
#include <stdlib.h>

int main (void) {
    FILE *urandom;
    unsigned int seed;
    urandom = fopen ("/dev/urandom", "r");
    if (urandom == NULL) {
        fprintf (stderr, "Cannot open /dev/urandom!\n");
        exit (EXIT_FAILURE); }
    if (fread (&seed, sizeof (seed), 1, urandom) != 1) {
        fprintf (stderr, "Cannot read /dev/urandom!\n");
        exit (EXIT_FAILURE); }
    fclose (urandom);
    srand (seed);
    /* Scale to [0,100), truncate (same as floor for non-negative values), shift to 1..100. */
    printf ("Random number from 1 to 100: %d\n",
            (int) (rand() * 100.0 / ((double) RAND_MAX + 1)) + 1);
    exit (EXIT_SUCCESS); }

A good illustration of the difference can be seen by running these numerous times very quickly. If run 10,000 times, any particular number between (and including) 1 and 100 should pop up roughly 100 times. You can see that producing random numbers from the OS’s seed does roughly that. The time based one, however, does a terrible job. Most of the time it will produce zero results with a particular preselected number ("88" in the following example).

$ for x in `seq 10000`;do ./rand_from_os | grep ' 88$' ; done | wc -l
97
$ for x in `seq 10000`;do ./rand_from_os | grep ' 88$' ; done | wc -l
94
$ for x in `seq 10000`;do ./rand_from_time | grep ' 88$' ; done | wc -l
0
$ for x in `seq 10000`;do ./rand_from_time | grep ' 88$' ; done | wc -l
512

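This is because the whole batch of 10,000 runs takes only a few seconds, so the time changes only a few times and all the values come from a handful of seeds. Every run within the same second prints the same number, so a given number shows up either not at all or hundreds of times. Ironically, this problem is worse on higher performance machines, since the faster the runs finish, the fewer distinct seconds (and therefore seeds) there are.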

Permissions

Sometimes you’re doing something like writing images with libpng and it wants to create them with absurdly restrictive permissions. To change this behavior, you need to add these lines.

#include <sys/stat.h>   /* umask */
umask(022); /* Set file creation permissions to RW for me and R for all. */

Then make sure you delete any files you may have previously written because this will not affect existing files, just creation.

Note that the open() function takes a mode argument (used when creating a file with O_CREAT), which might simplify things in easy cases where you’re doing the writing explicitly. The process umask is still applied to that mode.

Static Compile

Normally the compiler doesn’t reinvent the wheel for every tiny detail your program could possibly need. For the things pretty much all programs use, your code is dynamically linked against shared libraries (like the C library, libc) whose code is shared with other programs. This means that those library files need to be present to run your executable. Normally this isn’t a problem because they’re always there, or your system probably wouldn’t work.

But sometimes you need to send an executable into a situation where the libraries you use every day may not be available. Maybe it’s an old system. Maybe it’s a weird distribution. Maybe just the version numbers are messing with you. Or even just the paths are different.

To have the compiler create an executable with all the machine code necessary for your software to work in isolation, you’ll need the -static option. Don’t confuse this with the -s option, which strips the symbol table and relocation information from the executable. The -s option makes your executable smaller and harder to debug; the -static option makes it much bigger, and you should probably not be doing active development with it.

Debugging

Print Error Messages

Something like this:

fprintf(stderr,"Prints to standard error.\n");

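With #include <stdio.h>, you can also use perror(), which prints your message followed by ": " and a description of the most recent error (taken from errno), so leave off the trailing \n.

perror("Could not open file");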

Core Dump Analysis

What if you get the dreaded Segmentation fault? This means something bad happened at run time. Most errors are caught at compile time, but sometimes your program looks fine to the compiler and does a silly thing once you actually fire it up. Besides mystical intuition, the best methodical way to analyze the problem is to have the system create a memory dump at the time of the error and then use a special tool to look through this memory file to figure out what went wrong. So that the core dump is actually useful to examine, compile the misbehaving program with debugging information like this:

gcc -g -o sketchy sketchy.c

Or if you’re definitely going to use gdb:

gcc -ggdb -o sketchy sketchy.c

If it still has a seg fault and you’re not getting a (core dumped) message appended to it, try changing your environment with:

ulimit -c unlimited

This removes any restriction on the size of core files allowed by the shell.

Note
When you’re done playing with core files, you might want to do ulimit -c 0 so that segmentation faults don’t generally produce core files. Normally, it’s a pain to have these files mysteriously lying around every time something crashes.

gdb

Assuming you have a core dump called core, run gdb like this:

gdb sketchy core

The core should load and allow you to investigate it. It might just tell you about the error and where it occurred.

Or if you don’t need a core dump, you can just run gdb sketchy and type run to run the program and see if your error happens in a more interesting and verbose way. Here are some of the important commands to be aware of when using gdb.

Table 1. Useful gdb Commands

Command                  Description
-----------------------  ------------------------------------------------------------
<enter>                  repeat the previous command
help                     very sensible help
run                      run the program continuously; can be followed by args (see set args)
start                    start execution and stop at the beginning of main; args ok
step                     proceed to the next source code line, stepping into functions
next                     like step, but treat a function call as a single line
finish                   run until the current stack frame returns (generally the end of the current function)
print <var>              print the current value of the specified variable
set args <arg1..argN>    set the arguments passed to programs started with run
show args                show which argument list was set
bt                       backtrace (a nested list of function calls); good for finding where your program seg faulted
break <n>                set a breakpoint at source code line number n
cont                     (also c) continue from a stop at a breakpoint
shell                    run a shell subprocess using sh
layout next              cycle the TUI split screen display through registers, assembly, or source
refresh                  refresh the TUI screen (think Ctrl-L)
skip function <name>     skip the named function when stepping (the current one if none is given)
until <line>             run until the specified line number
quit                     leave gdb

Valgrind

According to the man page, Valgrind "is a flexible program for debugging and profiling Linux executables. It consists of a core, which provides a synthetic CPU in software, and a series of debugging and profiling tools." Practically, it can be very handy in troubleshooting insidious memory errors.

On Debian you can simply apt install valgrind. To use it, just run your executable through it. So my code and argument ./cnow cnow.cno becomes this.

valgrind ./cnow cnow.cno

This should show a bunch of helpful hints prefixed with the PID.

==3732== Memcheck, a memory error detector
==3732== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==3732== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==3732== Command: ./cnow classic_c_cnow/cnow.cno
==3732==
...
==3732==
==3732== HEAP SUMMARY:
==3732==     in use at exit: 0 bytes in 0 blocks
==3732==   total heap usage: 124 allocs, 124 frees, 87,616 bytes allocated
==3732==
==3732== All heap blocks were freed -- no leaks are possible
==3732==
==3732== For counts of detected and suppressed errors, rerun with: -v
==3732== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

As you can see here, this code is not doing daft things with memory. Yay! Compare that with the output from some leakier code.

==3903== HEAP SUMMARY:
==3903==     in use at exit: 6,091 bytes in 1 blocks
==3903==   total heap usage: 83 allocs, 82 frees, 325,847 bytes allocated
==3903==
==3903== LEAK SUMMARY:
==3903==    definitely lost: 6,091 bytes in 1 blocks
==3903==    indirectly lost: 0 bytes in 0 blocks
==3903==      possibly lost: 0 bytes in 0 blocks
==3903==    still reachable: 0 bytes in 0 blocks
==3903==         suppressed: 0 bytes in 0 blocks
==3903== Rerun with --leak-check=full to see details of leaked memory
==3903==
==3903== For counts of detected and suppressed errors, rerun with: -v
==3903== ERROR SUMMARY: 51 errors from 5 contexts (suppressed: 0 from 0)

It is good form to make sure all your code runs without Valgrind leak errors. If problems are detected, try it with valgrind -v or valgrind --leak-check=full for a deeper analysis.

For more details, see the full documentation for Valgrind.

Keywords

These words are all reserved in C (this is the classic C89 list; C99 and C11 added a few more, such as inline, restrict, and _Bool). Don’t name things with the same name:

auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, while

The fact that this list is so amazingly short is the good news in C! Enjoy!