The documentation for OpenCV is described as some of the best I’ve ever seen, but what really exists are hundreds of (good) unanswered questions on their forums, a reference that’s about like reading source code, seriously uncommented source code examples, and tutorials that don’t actually exist yet.

However, the sample programs are functional and quite illustrative. It is probably best to start with them and find the demo that is closest to what you need and then copy its major components.


Acquiring OpenCV Source

Normal Debian people doing the normal thing.

sudo apt install libopencv-dev python-opencv opencv-doc

Note that opencv-doc includes the Python examples (/usr/share/doc/opencv-doc).

OpenCV now uses GitHub.

git clone

Formerly they used CVS.

cvs login
cvs -z3 co -P opencv

Structural Overview


cxcore - Basic structures and algorithms, XML, drawing functions


cv - Image processing and vision algorithms


ml - Machine learning, statistical classifiers, clustering


highgui - GUI, image and video IO


cvaux - Misc extensions and fancy functionality that is not well documented.



Note also that many structures and types are named things like IplImage. This cryptic name refers to the Intel Image Processing Library.


ROI - Region of Interest, not return on investment.


COI - Channel of Interest


  • core

    • #include "opencv2/core/core_c.h" - Old C version.

    • #include "opencv2/core/core.hpp" - New C++ version.

  • imgproc

    • #include "opencv2/imgproc/imgproc_c.h" - Old C version.

    • #include "opencv2/imgproc/imgproc.hpp" - Newer C++ version.

  • highgui

    • #include "opencv2/highgui/highgui_c.h" - Old C version.

    • #include "opencv2/highgui/highgui.hpp" - Newer C++ version.

  • calib3d - Calibration.

    • #include "opencv2/calib3d/calib3d.hpp"

  • features2d - Feature tracking.

    • #include "opencv2/features2d/features2d.hpp"

  • objdetect - HOG,SVM

    • #include "opencv2/objdetect/objdetect.hpp"

  • ml

    • #include "opencv2/ml/ml.hpp"

  • flann - fast library for approximate nearest neighbors

    • #include "opencv2/flann/miniflann.hpp"

  • video - tracking, segmentation

    • #include "opencv2/video/video.hpp"

  • photo - new module for computational photography.

    • #include "opencv2/photo/photo.hpp"

  • contrib - Possibly non-free.

    • #include "opencv2/contrib/contrib.hpp"

  • imgcodecs

  • videoio

  • gpu - now cuda* modules.

  • stitching - new

  • nonfree - aka xfeatures2d

  • legacy - not part of v3

  • ocl - OpenCL, maybe deprecated.

Getting Something To Compile

Getting the minimum functionality to compile and run using the OpenCV libraries shouldn’t be a huge pain, but like most C libs, it is. Once you know the secret code, it’s easy. Here’s the secret incantation for Linux.

First, I "installed" opencv. I did this by obtaining and unpacking opencv-2.4.9-SOURCE. Change into that directory and cmake <targetpath>. Ya, you might need to install cmake. Hopefully this will then produce a proper Makefile. Then you can just make to build everything. Then I did sudo make install as you’d expect and it put it in /usr/local/include/.

Ubuntu Woes

On Ubuntu 14.04 I got some run time error messages helpfully telling me to install libgtk2.0-dev and pkg-config (Ubuntu world) and that actually worked!

I’ve also seen advice for Ubuntu to have libv4l-dev installed so do that too.

Unfortunately it looks like Ubuntu 14.04 no longer "has" ffmpeg. I don’t know what kind of feud is going on there but it’s pretty annoying. If you look at the cmake output you can see what resources OpenCV thinks it has access to. For me V4L was ok after the above steps, but all the FFMPEG stuff was missing. I suspect that this may be why stuff like cap.get(CV_CAP_PROP_FPS) is not working.

You can try the Ubuntu OpenCV binary packages with apt-get install libopencv-dev. I’m not sure how that’s working out.

Simple Starting Line

I was able to get this to compile and run on Debian GNU/Linux 9 (stretch) with the system managed (binary package) install.

// Compile:
// g++ opencvtest.cc -lopencv_highgui -lopencv_core -o opencvtest
// (opencvtest.cc is a placeholder name for this source file.)
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/core/core.hpp>
#include <iostream>
int main( int argc, char** argv ) {
    cv::Mat image;
    image= cv::imread("sample.png", CV_LOAD_IMAGE_COLOR);
    if(!image.data){std::cout<<"Error: Could not open image."<<std::endl; return 1;}
    cv::namedWindow("Window Name",cv::WINDOW_AUTOSIZE);
    cv::imshow("Window Name",image);
    cv::waitKey(0); // Wait for a key press so the window stays visible.
    return 0;
}


Then to compile your own programs using this, create a Makefile in your working directory that looks like this.

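# Compiles OpenCV C++ programs.
CPP = g++
CFLAGS = $(shell pkg-config --cflags opencv)
LIBS = $(shell pkg-config --libs opencv)
# Include this for debugging symbols.
FLAGS = -g

ALLCSRCS := $(wildcard *.c)
ALLCPPSRCS := $(wildcard *.cc)
# One executable per source file, named after the file.
ALLEXEC := $(ALLCSRCS:.c=) $(ALLCPPSRCS:.cc=)

all: $(ALLEXEC)

%: %.c
	$(CPP) $(FLAGS) $(CFLAGS) -o $@ $< $(LIBS)

%: %.cc
	$(CPP) $(FLAGS) $(CFLAGS) -o $@ $< $(LIBS)

clean:
	rm -f *.o $(ALLEXEC)

.PHONY: all clean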


OpenCV can have a pretty confusing layout for the uninitiated. To find the one true description of the basic OpenCV data structures (usually literally a C struct), search for (find) a file with types in its name. Mine lived at ./opencv-2.4.9-SOURCE/modules/core/include/opencv2/core/types_c.h (I think this module used to be called cxcore). This contains explicit definitions for things like:

CvArr, Cv32suf, Cv64suf, CvRNG, CvMat, CvMatND, CvSparseMat, CvSparseNode, CvSparseMatIterator, CvHistType, CvHistogram, CvRect, CvTermCriteria, CvPoint, CvPoint2D32f, CvPoint3D32f, CvPoint2D64f, CvPoint3D64f, CvSize, CvSize2D32f, CvBox2D, CvLineIterator, CvSlice, CvScalar, CvMemBlock, CvMemStorage, CvMemStoragePos, CvSeqBlock, CvSeq, CvSetElem, CvSet, CvGraphEdge, CvGraphVtx, CvGraphVtx2D, CvGraph, CvChain, CvContour, CvPoint2DSeq, CvSeqWriter, CvSeqReader, CvFileStorage, CvAttrList, CvString, CvStringHashNode, CvGenericHash, CvFileNodeHash, CvFileNode, CvIsInstanceFunc, CvReleaseFunc, CvReadFunc, CvWriteFunc, CvCloneFunc, CvTypeInfo, CvPluginFuncInfo, CvModuleInfo

Important structures are CvArr, CvMat, and IplImage. Though it’s implemented in C, the relationship of these is like C++ inheritance. In this case CvArr is used in CvMat which in turn is used in IplImage. This means that when CvArr* appears in function prototypes, it is ok to use CvMat* or IplImage*.


Create and Free Matrices


cvCreateMat() - Normal matrix creation.


cvCreateMatHeader() - Just the header (size and type definitions primarily).


cvCreateData() - Allocate the data for the matrix.


cvCloneMat() - Makes a copy of an existing one.


cvReleaseMat() - Cleans up a matrix.


cvInitMatHeader() - Initialize headers on existing CvMat structures.


cvMat() - Like cvInitMatHeader() but initializes CvMat structure.

Query Matrix Properties


cvGetElemType() - Returns an integer equal to something like CV_8UC1.


cvGetDims() - How many dimensions (2 for normal images)?


cvGetDimSize() - How big is the data in a particular dimension?


cvPtr2D() - Returns a pointer to the item at the specified position. Can also be 1D and 3D and ND.


cvGetReal2D() - Returns a double value from a position in a matrix. There is also a cvSetReal2D() function.


cvGet2D() - Returns a CvScalar value from a position in a matrix. There is also a cvSet2D() function.

The best performing way to deal with matrices is to just use pointer arithmetic. The matrices are stacked so that X is filled first. Y is incremented when the first row of X is done.
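The row-by-row layout can be sketched in plain Python without OpenCV at all. This is only an illustration under assumptions: the buffer, the padding, and the helper names `element_offset` and `matrix_sum` are made up here, and `step` plays the role of `mat->step` (bytes per row, which may include padding).

```python
import struct

def element_offset(row, col, step, elem_size):
    """Byte offset of (row, col) when rows are stored one after another."""
    return row * step + col * elem_size

# A hypothetical 2x3 float32 matrix with 4 bytes of padding per row.
rows, cols, elem_size = 2, 3, 4
step = cols * elem_size + 4          # 16 bytes per row, like mat->step
buf = bytearray(rows * step)
values = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
for r in range(rows):
    for c in range(cols):
        offset = element_offset(r, c, step, elem_size)
        struct.pack_into("<f", buf, offset, values[r][c])

def matrix_sum(buf, rows, cols, step):
    total = 0.0
    for r in range(rows):
        # Unpack one whole row starting at the row's base offset.
        total += sum(struct.unpack_from("<%df" % cols, buf, r * step))
    return total

total = matrix_sum(buf, rows, cols, step)
```

The point is that the row base is always `row * step`, never `row * cols * elem_size`, because the two differ whenever rows are padded.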

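Summing all elements in N=3 matrix
float sum( const CvMat* mat ){
    float s= 0.0f;
    // Floats per row: columns times channels (3 for an N=3 matrix).
    int n= mat->cols*CV_MAT_CN(mat->type);
    for (int row=0; row<mat->rows; row++) {
        const float* ptr= (const float*)(mat->data.ptr+row*mat->step);
        for (int i=0; i<n; i++) {
            s+= *ptr++;
        }
    }
    return s;
}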


Components of an IplImage struct
typedef struct _IplImage {
  int                  nSize;
  int                  ID;
  int nChannels; /* 1, 2, 3, or 4 */
  int                  alphaChannel;
  int  depth; /* IPL_DEPTH_${X}, X= 8U, 8S, 16S, 32S, 32F, 64F */
  char                 colorModel[4];
  char                 channelSeq[4];
  int dataOrder; /* IPL_DATA_ORDER_${X}, X= PIXEL or PLANE */
  int origin; /* IPL_ORIGIN_TL or IPL_ORIGIN_BL, ie. top/bot left */
  int                  align;
  int                  width;
  int                  height;
  struct _IplROI*      roi; /* Used to limit functions to sub area. */
  struct _IplImage*    maskROI;
  void*                imageId;
  struct _IplTileInfo* tileInfo;
  int                  imageSize;
  char*                imageData;
  int widthStep; /* bytes until same column next row */
  int                  BorderMode[4];
  int                  BorderConst[4];
  char*                imageDataOrigin;
} IplImage;

It is often very effective to use ROI to isolate things to eliminate any extraneous operations on regions of a lack of interest. Here’s an example of using ImageROI to increment all of the pixels of a region. This will make a specified rectangle brighter and more white.

Increment Pixels in ROI
// roi_add <image> <x> <y> <width> <height> <add>
#include <stdlib.h>
#include <cv.h>
#include <highgui.h>

int main(int argc, char** argv) {
    IplImage* src;
    if (argc == 7 && ((src=cvLoadImage(argv[1],1)) != 0)) {
        int x= atoi(argv[2]);
        int y= atoi(argv[3]);
        int width= atoi(argv[4]);
        int height= atoi(argv[5]);
        int add= atoi(argv[6]);
        cvSetImageROI(src, cvRect(x,y,width,height));
        cvAddS(src, cvScalar(add),src);
        cvResetImageROI(src); // Do this or only ROI is *shown* also.
        cvNamedWindow("Roi_Add", 1); // Window name is arbitrary.
        cvShowImage("Roi_Add", src);
        cvWaitKey(0);
        cvReleaseImage(&src);
    }
    return 0;
}

Another way to do this kind of thing is to use the widthStep property to map out a subregion (a hand crafted ROI of sorts). Sometimes doing this can be more efficient than using the ROI functions.
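Here is a rough sketch of that hand-crafted ROI idea in plain Python, with a bytearray standing in for imageData. The image size, the padding, and the function name `add_to_region` are assumptions made up for illustration; the only thing taken from the struct is that row N starts at N*widthStep bytes.

```python
def add_to_region(data, width_step, n_channels, x, y, w, h, add):
    """Add `add` to every channel byte in the rectangle, saturating at 255."""
    for row in range(y, y + h):
        start = row * width_step + x * n_channels
        for i in range(start, start + w * n_channels):
            data[i] = min(255, data[i] + add)

# Hypothetical 4x3 single-channel 8-bit image with 2 padding bytes per row.
width, height, n_channels = 4, 3, 1
width_step = 6
image = bytearray(width_step * height)
add_to_region(image, width_step, n_channels, x=1, y=1, w=2, h=2, add=100)
```

Only the bytes inside the rectangle are visited, so nothing is spent on the rest of the image or on the row padding.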

General Matrix/Array/Image Functions

Need to do some operation on an array? Here are some of the possible functions available:

cvAbs, cvAbsDiff, cvAbsDiffS, cvAdd, cvAddS, cvAddWeighted, cvAvg, cvAvgSdv, cvCalcCovarMatrix, cvCmp, cvCmpS, cvConvertScale, cvConvertScaleAbs, cvCopy, cvCountNonZero, cvCrossProduct, cvCvtColor, cvDet, cvDiv, cvDotProduct, cvEigenVV, cvFlip, cvGEMM, cvGetCol, cvGetCols, cvGetDiag, cvGetDims, cvGetDimSize, cvGetRow, cvGetRows, cvGetSize, cvGetSubRect, cvInRange, cvInRangeS, cvInvert, cvMahalanobis, cvMax, cvMaxS, cvMerge, cvMin, cvMinS, cvMinMaxLoc, cvMul, cvNot, cvNorm, cvNormalize, cvOr, cvOrS, cvReduce, cvRepeat, cvSet, cvSetZero, cvSetIdentity, cvSolve, cvSplit, cvSub, cvSubS, cvSubRS, cvSum, cvSVD, cvSVBkSb, cvTrace, cvTranspose, cvXor, cvXorS, cvZero

For details on these, check the official reference.

Interestingly the ORA book has a recap of this table but they title it "Matrix and Image Operators". This may be a hint that if you see a function designed for an "array", it may really be more broadly applicable. Basically wherever you see CvArr*, you can use an IplImage*. Another good example is cvGEMM which is Generalized Matrix Multiplication.

Memory Storage Entities

When OpenCV needs to dynamically allocate memory it has an internal mechanism blandly called "memory storage" to facilitate this. Memory storages are linked lists of continuous memory blocks suited to efficient allocation and release. The functions used to create/destroy these entities are cvCreateMemStorage, cvReleaseMemStorage, cvClearMemStorage, and cvMemStorageAlloc. The default block size is 64kB if not otherwise specified. The last function listed allocates a buffer of a requested size directly out of an existing memory storage. Apparently explicit releasing of these things (cvReleaseMemStorage) is essential if you really want to be comprehensive about clean up; the other clean up functions don’t actually give the memory back to the system but merely make it ready for more of the same use.

One of the things that can be stored in a "memory storage" is a "sequence". This can be thought of as a deque in STL (but since OpenCV stubbornly does not like C++ they needed to do this internally - not that there’s anything wrong with that). Beyond this structure, sequences have pointers that can be used to assemble them into trees, lists, and other wacky structures.

typedef struct CvSeq {
  int       flags;              // miscellaneous flags
  int       header_size;        // size of sequence header
  CvSeq*    h_prev;             // previous sequence
  CvSeq*    h_next;             // next sequence
  CvSeq*    v_prev;             // 2nd previous sequence
  CvSeq*    v_next;             // 2nd next sequence
  int       total;              // total number of elements
  int       elem_size;          // size of sequence element in bytes
  char*     block_max;          // maximal bound of the last block
  char*     ptr;                // current write pointer
  int       delta_elems;        // how many elements allocated
                                // when the sequence grows
  CvMemStorage* storage;        // where the sequence is stored
  CvSeqBlock* free_blocks;      // free blocks list
  CvSeqBlock* first;            // pointer to the first sequence block
} CvSeq;

To create a sequence entity use cvCreateSeq. To clear a sequence use cvClearSeq but remember, to really free up the memory involved, you need to revisit cvClearMemStorage. To access an arbitrarily located item in a sequence use cvGetSeqElem. Or if you just need to know where in a sequence an item is, use cvSeqElemIdx (which is a somewhat inefficient thing to do). Sequences can also be copied whole or in slices with cvCloneSeq and cvSeqSlice (the former is a thin wrapper around the latter). Slices can be used to remove or insert elements with cvSeqRemoveSlice and cvSeqInsertSlice. Another way to do this an element at a time is with cvSeqInsert and cvSeqRemove. Performance on these random accesses to the middle may not be sufficient. There is also cvSeqSort to sort the sequence; you get to provide the comparison function (type CvCmpFunc). Reverse the sequence with cvSeqInvert.

Because they are really linked lists, it is easy to treat them as stack structures. The following functions are available to use sequences as stacks conveniently.

  • cvSeqPush

  • cvSeqPushFront

  • cvSeqPop

  • cvSeqPopFront

  • cvSeqPushMulti

  • cvSeqPopMulti
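Since a sequence behaves much like a deque, the push/pop functions listed above can be pictured with Python's own collections.deque. This is only an analogy to show which end each call works on; it says nothing about how CvSeq manages its memory blocks.

```python
from collections import deque

seq = deque()
seq.append(1)            # like cvSeqPush: add at the end
seq.appendleft(0)        # like cvSeqPushFront: add at the beginning
seq.extend([2, 3])       # like cvSeqPushMulti (at the end)
last = seq.pop()         # like cvSeqPop: remove from the end
first = seq.popleft()    # like cvSeqPopFront: remove from the beginning
remaining = list(seq)
```

Using only the end-side pair gives a stack; mixing in the front-side pair gives a queue.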

The function cvSetSeqBlockSize is a bit like inode tuning in that it is useful to set the memory block size that gets allocated when new sequence items are needed. This allows you to accommodate huge items in short lists or small items in long lists. The default is 1kB.

There are ways to convert a sequence to an array, namely the cvCvtSeqToArray function. To go the other way, check out cvMakeSeqHeaderForArray.

There are some fancy reading and writing functions to load and read sequences in bulk efficiently. The downside is that these must be initialized and then closed to do proper housekeeping. The functions involved are cvStartWriteSeq, cvStartAppendToSeq, cvEndWriteSeq, cvFlushSeqWriter, CV_WRITE_SEQ_ELEM for writing and cvStartReadSeq, cvGetSeqReaderPos, cvSetSeqReaderPos, CV_NEXT_SEQ_ELEM, CV_PREV_SEQ_ELEM, CV_READ_SEQ_ELEM, and CV_REV_READ_SEQ_ELEM for reading.

Things to keep in mind about scales
  • matplotlib read .png 0 to 1

  • cv2.imread() .png 0 to 255

  • matplotlib read .jpg 0 to 255

  • cv2.imread() .jpg 0 to 255

  • cv2.cvtColor(image_0_to1) ⇒ image_0_to_255
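When mixing the two readers it is safest to convert explicitly rather than guess which scale an image is on. Here is a minimal sketch in plain Python using nested lists in place of image arrays; the helper names `to_uint8` and `to_unit_float` are hypothetical, not part of either library.

```python
def to_uint8(pixels_0_to_1):
    """Scale floats in [0, 1] to integers in [0, 255], clamping strays."""
    return [[max(0, min(255, int(round(v * 255)))) for v in row]
            for row in pixels_0_to_1]

def to_unit_float(pixels_0_to_255):
    """Scale integers in [0, 255] down to floats in [0, 1]."""
    return [[v / 255.0 for v in row] for row in pixels_0_to_255]

png_like = [[0.0, 0.5, 1.0]]          # what a 0-to-1 reader hands back
as_bytes = to_uint8(png_like)         # now on the 0-to-255 scale
round_trip = to_unit_float(as_bytes)
```

The round trip is not exact for most values because the 0 to 255 scale only has 256 steps.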


void  cvLine(
  CvArr*   array,
  CvPoint  pt1,
  CvPoint  pt2,
  CvScalar color,
  int      thickness    = 1,
  int      connectivity = 8
);

The array is usually an image pointer (IplImage). The function cvRectangle is very similar to cvLine and does the obvious.

void  cvCircle (
  CvArr*   array,
  CvPoint  center,
  int      radius,
  CvScalar color,
  int      thickness    = 1,
  int      connectivity = 8
);

cvEllipse is pretty similar too. It can use bounding boxes or fancier input.

void cvFillPoly(
  CvArr*    img,
  CvPoint** pts,
  int*      npts,
  int       contours,
  CvScalar  color,
  int       line_type = 8
);

This draws a filled polygon. A similar function is cvFillConvexPoly which does only one polygon at a time and is much faster; as the name implies, the polygon must be convex. If the "polygon" doesn’t need filling (or isn’t even closed), cvPolyLine is faster yet.

There is also cvPutText which can be used to write text on the image. Apparently this might need to be used in conjunction with the eponymous cvInitFont.


The objects and data structures can be serialized for transfer to other systems and saving to disk. This involves the cvSaveImage and cvLoadImage functions for images and cvSave and cvLoad for matrices. There’s also the CvFileStorage structure which can be used with the cvOpenFileStorage and cvReleaseFileStorage functions. There are many other functions involved in serialization.

cvStartWriteStruct, cvEndWriteStruct, cvWriteInt, cvWriteReal, cvWriteString, cvWriteComment, cvWrite, cvWriteRawData, cvWriteFileNode, cvGetRootFileNode, cvGetFileNodeByName, cvGetHashedKey, cvGetFileNode, cvGetFileNodeName, cvReadInt, cvReadIntByName, cvReadReal, cvReadRealByName, cvReadString, cvReadStringByName, cvRead, cvReadByName, cvReadRawData, cvStartReadRawData, cvReadRawDataSlice


HighGUI is a collection of tools to facilitate high-level interaction with the OS and windowing system (e.g. X11). Importantly, this also coordinates image and video streams from camera devices. It is also heavily used for loading from and saving images to a file system.

Open A Window

The primary GUI function is cvNamedWindow which opens a window and puts its name in the title bar. The title/name is used as a handle in subsequent references to the window.

int cvNamedWindow( const char* name, int flags = CV_WINDOW_AUTOSIZE);

Its inverse is cvDestroyWindow. This also takes the human readable name you gave it. Windows can also be referenced by a void* window_handle, so if that shows up, don’t freak out.

If you’re in a bigger hurry to destroy a lot of junk, use cvDestroyAllWindows().

File Features


Here’s the all OpenCV way.

img= cv2.imread('myimage.png')

Or use matplotlib, which returns the channels in RGB order, so red and blue are swapped relative to OpenCV’s BGR.

import sys
from matplotlib import pyplot as plt
img= plt.imread(sys.argv[1])
plt.imshow(img, cmap='gray', interpolation='bicubic')
plt.show()


Here are two very important C functions for getting images to and from disk.

IplImage* cvLoadImage( const char* filename, int iscolor);
int cvSaveImage( const char* filename, const CvArr* image);

While there are all kinds of ways to control color depth, the default of iscolor is CV_LOAD_IMAGE_COLOR.

To load video into a program use one of the following.

CvCapture* cvCreateFileCapture( const char* filename );
CvCapture* cvCreateCameraCapture( int index );

Obviously the latter is for cameras. If the CvCapture pointer is NULL then something happened to prevent loading. This should probably be checked. When dealing with files, you’ll need to make sure the correct codecs are supported by system libraries. Cameras don’t have this problem. Normally the index is set to 0 which will find the first camera, but you can use tricks here to force it to use V4L or FIREWIRE, etc. Another trick is to feed it -1 which, I’m told, will open a selection dialog and allow the user to choose the camera.

It is also possible to set properties of the capture device like frames per second, codec, starting frame number, width, height, etc. This is (optionally) done with this function.

int cvSetCaptureProperty( CvCapture* capture, int property_id, double value);

Query properties with this one.

double cvGetCaptureProperty( CvCapture* capture, int property_id);

To read video frames you could use the following functions.

int        cvGrabFrame( CvCapture* capture );
IplImage*  cvRetrieveFrame( CvCapture* capture );

These go together. cvGrabFrame just pulls the next frame into an internal buffer (and advances to the one after it) in a very efficient way. However, you can’t access it or work with it yet. That’s what cvRetrieveFrame accomplishes. It will copy the frame out of the grab buffer and into a proper IplImage structure where it can be worked with. There is also a way to do both of those operations in one shot using this function.

IplImage*  cvQueryFrame( CvCapture* capture );

Frames of video can also be written to disk (or some kind of output). To do this use cvCreateVideoWriter and then cvWriteFrame.

Once you’re done with a capture device, clean it up with this.

void cvReleaseCapture( CvCapture** capture );
void cvReleaseVideoWriter( CvVideoWriter** writer);


One important function that doesn’t seem to fit in the HighGUI library (but that’s where it’s found!) is cvConvertImage. This can do some color depth conversions and grayscale stuff. It can also flip images to reverse images (like loading an old slide backwards).

Here’s how to convert to gray scale which is a very common precursor to many operations.

import cv2
img= cv2.imread('myimage.png')                 # cv2.imread() gives BGR order.
grayimg= cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
fixrgb= cv2.cvtColor(img,cv2.COLOR_BGR2RGB)    # Now in RGB order.
hls= cv2.cvtColor(fixrgb,cv2.COLOR_RGB2HLS)
R = fixrgb[:,:,0]
G = fixrgb[:,:,1]
B = fixrgb[:,:,2]
H = hls[:,:,0]
L = hls[:,:,1]
S = hls[:,:,2]

If you read in an image using cv2.imread() you will get a BGR image, but if you read it in using matplotlib.image.imread() this will give you an RGB image.





Show Loaded Image In Opened Window

void cvShowImage(const char* name, const CvArr* image);

See the examples for the complete process in action.

Waiting For Key Press

while( 1 ) { if( cvWaitKey(100)==27 ) break; }

If cvWaitKey is sent a 0, then it will wait indefinitely instead of the specified number of ms.

Mouse features are supported but are done in the ordinary way with a callback.


In OpenCV, sliders are called "trackbars".


Since HighGUI does not provide any kind of buttons, it is common to use trackbars that only have 2 positions as a substitute.

Writing Camera Feed To A File

Here is a complete example that worked with my PS Eye camera to record video capture to a file.
/* A simple demonstration of activating the camera and recording
 * its input to an mpeg file. Works with my PS3Eye! */

#include "highgui.h"
int main(int argc, char** argv){
    CvCapture* capture;
    const char* title;
    if (argc==1) {
        capture= cvCreateCameraCapture(0);
        title= "Camera 0";
    } else {
        capture= cvCreateFileCapture(argv[1]);
        title= argv[1];
    }
    if (!capture) return 1; // Nothing could be opened.
    cvNamedWindow(title, CV_WINDOW_AUTOSIZE);
    int isColor= 1; //0 is grey
    int fps= 30,camW=640,camH=480;
    // The output name and codec here are placeholders; PIM1 is MPEG-1.
    CvVideoWriter *writer= cvCreateVideoWriter(
        "out.mpg", CV_FOURCC('P','I','M','1'),
        fps, cvSize(camW,camH), isColor);
    IplImage* frame;
    while (1){
        frame= cvQueryFrame( capture );
        if (!frame) break;
        cvShowImage( title, frame);
        cvWriteFrame(writer, frame);
        char c = cvWaitKey(33);
        if (c == 27) break;
    }
    cvReleaseVideoWriter(&writer);
    cvReleaseCapture(&capture);
    cvDestroyWindow(title);
    return 0;
}

Image Processing

Once control over the images has been achieved OpenCV provides a lot of ways to manipulate those images.


OpenCV has five types of image smoothing or blurring using the cvSmooth() function.

  • CV_BLUR - Simple blur

  • CV_BLUR_NO_SCALE - Same but with no scaling

  • CV_MEDIAN - Median blur

  • CV_GAUSSIAN - Gaussian blur

  • CV_BILATERAL - Bilateral filter


In OpenCV morphology refers to manipulations on an image based on some reference thing that reminds me of a "brush" in paint programs. There is dilation where the brush increases bright regions and there is erosion where the brush decreases them. The "brush" is called a kernel and can take square or elliptical form or some arbitrary user defined shape. There are all kinds of subtle variants. Look into the cvMorphologyEx function for more details.
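To make the "brush" idea concrete, here is a small plain Python sketch with a fixed 3x3 square kernel: dilation takes the maximum under the kernel and erosion takes the minimum. The function name `morph` and the tiny test image are made up for illustration, and simply ignoring pixels beyond the edge is my own simplification, not necessarily what cvDilate and cvErode do at borders.

```python
def morph(image, op):
    """Apply a 3x3 square kernel; op is max (dilate) or min (erode).
    Pixels outside the image are simply ignored."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neighborhood = [image[j][i]
                            for j in range(max(0, y - 1), min(h, y + 2))
                            for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = op(neighborhood)
    return out

image = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 9, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]
dilated = morph(image, max)   # the single bright pixel grows to a 3x3 block
eroded = morph(dilated, min)  # eroding that block shrinks it back to one pixel
```

Dilating and then eroding like this is the "closing" operation, one of the variants cvMorphologyEx offers.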

Flood Fill

Using the cvFloodFill() function parts of an image can be filled in as with normal image manipulation programs. A mask can also be supplied to restrict where the operation must spend resources and take effect. A seed point is provided as a place to start from. The function also takes a "lo" (their name) differential and an "up" differential. A neighboring pixel is colored if its value lies between the reference value minus "lo" and the reference value plus "up". The reference can be either the seed (all pixels compared to the seed) or the neighbor that has already been filled. The result of what got filled can be returned as the mask.
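The lo/up test is easier to see in code. This is a minimal plain Python sketch of the compare-everything-to-the-seed variant with 4-connectivity; the function name `flood_fill` and its argument names are my own, not the cvFloodFill signature.

```python
from collections import deque

def flood_fill(image, seed, new_value, lo_diff, up_diff):
    """Fill 4-connected pixels whose value v satisfies
    seed_value - lo_diff <= v <= seed_value + up_diff.
    Modifies `image` in place and returns a 0/1 mask of what was filled."""
    h, w = len(image), len(image[0])
    sx, sy = seed
    seed_value = image[sy][sx]
    mask = [[0] * w for _ in range(h)]
    mask[sy][sx] = 1
    queue = deque([(sx, sy)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and not mask[ny][nx]:
                if seed_value - lo_diff <= image[ny][nx] <= seed_value + up_diff:
                    mask[ny][nx] = 1
                    queue.append((nx, ny))
    # Recolor only after the search so comparisons see original values.
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                image[y][x] = new_value
    return mask

image = [[10, 12, 50],
         [11, 13, 50],
         [50, 50, 10]]
mask = flood_fill(image, seed=(0, 0), new_value=99, lo_diff=2, up_diff=3)
```

Note the bottom right pixel has an acceptable value but is never filled, because no chain of acceptable neighbors connects it to the seed.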


The cvResize() function does what the name implies. Choose one of these interpolation functions.

  • CV_INTER_NN - Nearest neighbor

  • CV_INTER_LINEAR - Bilinear

  • CV_INTER_AREA - Pixel area re-sampling

  • CV_INTER_CUBIC - Bicubic interpolation


This is a complex topic to be sure. For example, I keep running across the word "convolution" and looking it up I find, appropriately, "something that is very complicated and difficult to understand". But it also means "a twist or a curve" (as on the cerebrum of fancy mammals). None of those definitions are helpful yet. The idea of image pyramids is that a stack of derived images is created with decreasing resolution. This allows for operations to be performed at the top (cheap) level as a preliminary optimization and then extended to a focused region in the high resolution bottom level. Look into cvPyrSegmentation for details.

Some more about convolution. It seems to involve something that is done to all parts of an image. What that something specifically entails is defined by the "convolution kernel". The convolution kernel is just a grid of values with the "anchor point" located in the middle of the grid. This grid is superimposed, successively, over every point in the input image. The grid is aligned so that the anchor is on the current input image point. Then the values in the kernel are multiplied by the pixels of the input image they superimpose, those products are added, and that sum is set as the new value for the output image. Of course this implies that there is some trickery required at the edges. In other words if a 5x5 kernel was at (100,23) of a 100x100 image, the middle row of the kernel grid would be at (98,23), (99,23), (100,23), (101,23), and (102,23), but there is no 101 or 102. The details of how this gets resolved are found in the borderInterpolate function.
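Here is how I picture that sliding grid, as a plain Python sketch. The names `clamp` and `filter2d` are hypothetical, and repeating the nearest edge pixel is just one possible border rule that I picked for the sketch. Also note that this multiplies the kernel in place without flipping it, which is strictly speaking correlation; for symmetric kernels the two are the same.

```python
def clamp(i, size):
    """Replicate-border rule: out-of-range indices use the nearest edge pixel."""
    return max(0, min(size - 1, i))

def filter2d(image, kernel):
    """Slide an odd-sized kernel, anchored at its center, over every pixel."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ay, ax = kh // 2, kw // 2            # anchor point in the middle
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy = clamp(y + j - ay, h)
                    xx = clamp(x + i - ax, w)
                    acc += kernel[j][i] * image[yy][xx]
            out[y][x] = acc
    return out

box = [[1 / 9.0] * 3 for _ in range(3)]  # 3x3 averaging kernel
flat = [[5] * 4 for _ in range(4)]
blurred = filter2d(flat, box)            # a flat image stays flat

pick_right = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
shifted = filter2d([[1, 2, 3]], pick_right)
```

The second call shows the border rule at work: the last output pixel asks for a neighbor that does not exist and gets the edge value instead.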


The cvThreshold() function takes a source and destination array (image, matrix, whatever) and checks the source against a comparison function and sets the corresponding output pixel accordingly.

Here are the types of threshold behaviors and the function that the destination pixel is set to.

  • CV_THRESH_BINARY - (s>t)?M:0

  • CV_THRESH_BINARY_INV - (s>t)?0:M

  • CV_THRESH_TRUNC - (s>t)?t:s

  • CV_THRESH_TOZERO_INV - (s>t)?0:s

  • CV_THRESH_TOZERO - (s>t)?s:0

Here s is each source pixel. M is the maximum value which seems to basically be just some number you’d like to use; you get to set it. t is the threshold value which you also get to set of course.
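The five behaviors can be written out as a tiny plain Python function, using the same s, t, and M names. The function name `threshold` and the string keys are made up for the sketch. One assumption worth flagging: for the truncating type I follow the OpenCV reference documentation, where values above the threshold are clamped to t rather than set to M.

```python
def threshold(s, t, M, kind):
    """Per-pixel rule for each threshold type; s is the source value."""
    rules = {
        "BINARY":     M if s > t else 0,
        "BINARY_INV": 0 if s > t else M,
        "TRUNC":      t if s > t else s,
        "TOZERO":     s if s > t else 0,
        "TOZERO_INV": 0 if s > t else s,
    }
    return rules[kind]

row = [10, 100, 200]
binary = [threshold(s, 100, 255, "BINARY") for s in row]
trunc = [threshold(s, 100, 255, "TRUNC") for s in row]
tozero = [threshold(s, 100, 255, "TOZERO") for s in row]
```

A pixel exactly equal to t counts as "not above" in every case, since the comparison is strictly greater than.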

And if that is too easy for you, go crazy with cvAdaptiveThreshold() which can dynamically adjust t to be more in line with the local surroundings of s.

Pseudo Derivative Filtering

The Sobel derivative is a type of convolution that calculates the derivative (change in value over distance - but not really because it’s not continuous) of the image. This often has a directionality (e.g. change in X only). The point of this is to detect edges and other "features" in images. This is done with convolution and a kernel to reduce sensitivity to noise. If you want to use a 3x3 convolution kernel in your Sobel filtering, it is recommended to use CV_SCHARR which is a specific optimized value for that purpose.

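import cv2
import numpy as np
grayimg= cv2.imread('myimage.png',cv2.IMREAD_GRAYSCALE)
sobelx= cv2.Sobel(grayimg, cv2.CV_64F,1,0)   # Derivative in X.
sobely= cv2.Sobel(grayimg, cv2.CV_64F,0,1)   # Derivative in Y.
abs_sobelx= np.absolute(sobelx)
scaled_sobel= np.uint8(255*abs_sobelx/np.max(abs_sobelx))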

Another similar function is cvLaplace(). The mathematical Laplace operator is a sum of second derivatives along the x and y axes. Again it’s not really a real derivative, which indulges in the idea of an analogue universe. It’s the same kind of thing as the Sobel, but with slightly different results. Interestingly it can be combined with the Sobel filter for even more pragmatic and effective results in many normal edge detection situations.

Expanding on the previous techniques is the Canny algorithm. This does all kinds of mathematical black magic with derivatives, combining X and Y directions, and arrives at a very high quality edge detection. The Canny algorithm uses a hysteresis threshold to map contours of value that outline detected features. The Canny algorithm only works on greyscale (returns a 1 bit image). Check out the section on "contours" to find out more about how edge detection is done in practice.
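The hysteresis threshold part is the bit I found easiest to grasp in code. Here is a one-dimensional plain Python sketch of just that step, not of the whole Canny algorithm: strong responses are kept outright, and weak ones are kept only if they touch something already kept. The function name `hysteresis_1d` and the sample numbers are invented for illustration.

```python
def hysteresis_1d(strengths, low, high):
    """Keep values >= high, plus values >= low that connect to a kept value."""
    n = len(strengths)
    keep = [s >= high for s in strengths]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if not keep[i] and strengths[i] >= low:
                if (i > 0 and keep[i - 1]) or (i + 1 < n and keep[i + 1]):
                    keep[i] = True
                    changed = True
    return keep

edges = hysteresis_1d([5, 40, 90, 45, 10, 42, 8], low=30, high=80)
```

The 40 and 45 survive because they sit next to the strong 90; the isolated 42 is just as strong but gets dropped as probable noise.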

Fancier Feature Extraction

The first Hough transform was a way to discriminate patterns from images. The Hough transform makes it possible to perform groupings of edge points into object candidates by performing an explicit voting procedure over a set of parameterized image objects. At first the parameters were slope and intercept but this caused some mathematical problems (vertical lines=division by zero); polar coordinates are now used. The important thing is that if you need to find meaningful lines in your somewhat noisy image, look into this. Also circles and ellipses are theoretically possible.
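The voting procedure is simple enough to sketch in plain Python. Each edge point votes for every (rho, theta) line that could pass through it, using rho = x*cos(theta) + y*sin(theta). The function name `hough_lines`, the one degree angle steps, and the test points are my own choices, not how OpenCV's implementation is organized.

```python
import math

def hough_lines(points, n_theta=180, rho_step=1.0):
    """Vote in (rho, theta) space; rho = x*cos(theta) + y*sin(theta)."""
    votes = {}
    for x, y in points:
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            key = (int(round(rho / rho_step)), k)
            votes[key] = votes.get(key, 0) + 1
    return votes

# Five points on the vertical line x = 3, plus one stray point.
points = [(3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (0, 2)]
votes = hough_lines(points)
on_line = votes[(3, 0)]            # the cell for rho = 3, theta = 0
best_count = max(votes.values())
```

A vertical line is exactly the case that breaks slope and intercept, yet here it is just another cell (theta = 0). Neighboring cells collect nearly as many votes, so a real detector would look for local maxima rather than a single winner.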

Remapping and Transform Functions

Often you’ll have an original image and you’ll want a modified version of it. The cvRemap can map an image on to another. The target image is just the target, it is not some other scene that merges in some complex way. A classic simple example would be scaling an image up or down. If scaling up, it is easy to imagine gaps being present in the destination image. This is why the algorithm works from each position of the target image and back figures what from the source must go there. This involves a lot of interpolation, the specific nature of which can be specified (CV_INTER_NN, CV_INTER_LINEAR, etc). The mapping can be complex. If so it is described by map images which specify how the source is transformed. There is one map for X and one for Y. A great practical example of this function is for correcting camera lens distortions into a more useful form. I could also imagine sorting out some Oculus Rift feed with this.
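The "work backwards from the target" idea looks like this in a plain Python sketch. The function name `remap_nearest` is hypothetical and only nearest neighbor lookup is shown, but the two map images play the same role as the X and Y maps described above: for every destination pixel they say where in the source to look.

```python
def remap_nearest(src, map_x, map_y, fill=0):
    """For each destination pixel, look up where in src it comes from."""
    h, w = len(map_x), len(map_x[0])
    src_h, src_w = len(src), len(src[0])
    dst = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = int(round(map_x[y][x]))
            sy = int(round(map_y[y][x]))
            if 0 <= sx < src_w and 0 <= sy < src_h:
                dst[y][x] = src[sy][sx]
    return dst

src = [[1, 2],
       [3, 4]]
# Scale up 2x: destination (x, y) pulls from source (x // 2, y // 2).
map_x = [[x // 2 for x in range(4)] for y in range(4)]
map_y = [[y // 2 for x in range(4)] for y in range(4)]
bigger = remap_nearest(src, map_x, map_y)
# Horizontal flip: destination x pulls from source (width - 1 - x).
flip_x = [[1 - x for x in range(2)] for y in range(2)]
flip_y = [[y for x in range(2)] for y in range(2)]
flipped = remap_nearest(src, flip_x, flip_y)
```

Because every destination pixel is visited exactly once, the scaled up result has no gaps, which is the whole reason for mapping in this direction.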

When mutating the geometric form of images, there are two kinds of transformations, affine and perspective. Affine transforms basically produce parallelograms (from the four edges of the original). The points can be mushed in any way as long as the top and bottom remain parallel and the left and right side remain parallel. Perspective transformations, on the other hand, introduce a zoom or vanishing point and can produce trapezoids. In both, straight lines remain straight. In the first case, cvWarpAffine will mutate the image into the desired form. As with remap, there is some heavy interpolation and how that’s done can be specified in the familiar ways. The details of how the transformation should go are encoded into a 2x3 matrix since the source and destination, both of which are necessarily rectangles, do not indicate that. To figure out what this matrix is, OpenCV has a nice function to compute it, cvGetAffineTransform. This function takes an array of exactly 3 source points and an array of the 3 corresponding destination points. The final parameter is the matrix which the function sets. If instead of mushing your image into squished shapes you are rotating it, there is another way to calculate the affine transform matrix. The function cv2DRotationMatrix does basically the same job as the cvGetAffineTransform function but with a center, angle, and scale input. These two can be combined for an image that is rotated, scaled, and warped.
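To see what lives inside that 2x3 matrix, here is a plain Python sketch that builds the rotation flavor from a center, angle, and scale and applies it to individual points. The formula is the one the OpenCV reference gives for getRotationMatrix2D as far as I can tell; the function names `rotation_matrix_2d` and `transform_point` are my own.

```python
import math

def rotation_matrix_2d(center, angle_deg, scale):
    """2x3 affine matrix for a scaled rotation about `center`.
    Positive angles turn counter-clockwise as seen on screen (y points down)."""
    cx, cy = center
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    return [[a, b, (1 - a) * cx - b * cy],
            [-b, a, b * cx + (1 - a) * cy]]

def transform_point(M, point):
    """Apply the 2x3 matrix to one (x, y) point."""
    x, y = point
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])

M = rotation_matrix_2d((50, 50), 90, 1.0)
center_after = transform_point(M, (50, 50))   # the center stays put
right_after = transform_point(M, (100, 50))   # a point to the right of center
```

The third column is what keeps the chosen center fixed; without it the rotation would be about the image origin in the top left corner.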

The cvWarpAffine works for "dense" images, i.e. an entire x,y grid of points. If the input is just a series of points, i.e. not all points in an x,y area, then this "sparse" set could be more efficiently transformed using the cvTransform function.

Perspective transformations have similar functions. cvWarpPerspective is the main one to transform dense maps. cvGetPerspectiveTransform helps to figure out what the mapping matrix is. Sparse perspective transforms can use cvPerspectiveTransform.

An Example Of Affine Rotation And Perspective Warp
import cv2
import numpy as np
import matplotlib.pyplot as plt
img= cv2.imread('sign.png',cv2.IMREAD_COLOR)
img= cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
h,w= img.shape[:2]
# Rotation using warpAffine
rM= cv2.getRotationMatrix2D((w/2,h/2),10,1) # Center, degrees, scale.
rimg= cv2.warpAffine(img,rM,(w,h))
# Perspective using warpPerspective - morph from start to target.
warpS= np.array([[0,0],[100,100],[0,100],[100,0]],np.float32)
warpT= np.array([[18,0],[100,100],[0,100],[82,18]],np.float32)
pM= cv2.getPerspectiveTransform( warpS, warpT )
pimg= cv2.warpPerspective(img,pM,(w,h))
# Plot
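prows,pcols,fig= 1,3,plt.figure()
# The rest of the plotting is an assumed completion: one subplot per image.
for n,(title,im) in enumerate([('Original',img),('Rotated',rimg),('Perspective',pimg)]):
    ax= fig.add_subplot(prows,pcols,n+1)
    ax.imshow(im)
    ax.set_title(title)
plt.show()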


Two nonintuitive mapping functions are cvCartToPolar and cvPolarToCart. Why one would want to convert an image from Cartesian coordinates to polar or the other way is a bit tricky. It is useful in catching edge detection thresholds after other filtering is performed. Another similar but weirder one is cvLogPolar. This transforms (x,y) into (log radius,angle). If done just right, so the theory goes, this provides a kind of invariance to planar rotation and scaling which might be useful if trying to track an object. If the object scales, its transform will shift on the horizontal. If the object rotates, the transform will shift on the vertical. This might be useful, albeit complicated, for tracking a fixed size moving object from a fixed overhead camera.

OpenCV has cvDFT, which is the Discrete Fourier Transform. This is a "fast" (FFT) O(N log N) version. Apparently, DFT is often overkill and for practical situations a better function is likely to be cvDCT, the Discrete Cosine Transform.


If an image is too dark or washed out, the variation between all the pixels is not ideal in that it does not use the full range of information expressible by the pixel value range. Plotting the histogram of such an image will show most pixels concentrating in a narrow band of value range. By using the cumulative distribution function, the histogram can be remapped and the image revalued to make sure that the resultant histogram uses more of the range. OpenCV has the cvEqualizeHist function for this. While I can see this being useful to improve the aesthetics of a rendered image for humans, I wonder if it could really impart any more information to the image in a way that would allow processing algorithms to actually do a better job. In other words, what’s the difference between equalizing the histogram and just having some feature recognition filter concentrate only on a limited range of values?
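To see what the cumulative distribution remapping actually does, here is a small NumPy sketch of histogram equalization for an 8-bit grayscale array. The function name equalize is made up, and this is the textbook recipe, not necessarily the exact arithmetic inside cvEqualizeHist.

```python
import numpy as np

def equalize(img):
    """Spread an 8-bit image's values over 0-255 using its own CDF."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF at the darkest value present.
    if cdf[-1] == cdf_min:             # Flat image, nothing to stretch.
        return img.copy()
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]                    # Look up every pixel in the table.

# A murky image using only values 100 to 119.
murky = np.arange(100, 120, dtype=np.uint8).repeat(5).reshape(10, 10)
flat = equalize(murky)
print(murky.min(), murky.max(), flat.min(), flat.max())
```

Note that no new information appears. The 20 distinct values are still 20 distinct values, just spaced farther apart, which is consistent with the doubt expressed above.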

But histograms have quite a few clever uses. Histograms of colors, edge gradients and other attributes can be used to determine scene specific content. Hand gesture recognition is one application. They can detect transitions in videos. Maybe a way to train a system to cut out ads. Hmm.

OpenCV has a lot of functions to make working with histograms as easy and trouble-free as possible. Look for the data type CvHistogram and the constructor/destructor functions cvCreateHist, cvSetHistBinRanges, cvClearHist, and cvMakeHistHeaderForArray. The last one uses data you have already organized to bless it as a histogram that OpenCV can use.

Another way to generate less arbitrary and more mission oriented histograms is to automatically create them from images. The cvCalcHist function can take an image and make a typical useful histogram out of it involving a variety of properties.

Once the histogram is complete, accessing the data (or the pointer to the data) can be achieved with functions such as cvQueryHistValue_nD and cvGetHistValue_nD.

Also consider cvNormalizeHist which will replace values with the portion of the total events that goes in that bin. So if you have 25 events and in bin x there are 5, the normalized form will replace the 5 with .2; everything should add up to 100% or 1 (though this can be changed with the factor argument to the function). Another valid approach which is subtly different is to normalize the colors (or whatever) in your source data so that the histogram is ready to be used with no further conditioning.

Another function to process the histogram is cvThreshHist that basically resets bins with very few members. Imagine bins with counts of 1,3,2,0,1,78,26,132,19,2,0,0,3,1. It is likely that thinking of this data as 0,0,0,0,0,78,26,132,19,0,0,0,0,0 would be more useful.
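The normalizing and thresholding steps are simple enough to sketch in plain Python. The function names here are invented for illustration, and the threshold of 4 is just a value that reproduces the example numbers above.

```python
def normalize_hist(bins, factor=1.0):
    """Replace each count with its share of the total (times factor)."""
    total = sum(bins)
    return [factor * b / total for b in bins]

def thresh_hist(bins, threshold):
    """Zero out any bin whose count is below the threshold."""
    return [b if b >= threshold else 0 for b in bins]

# 25 events total, 5 of them in the first bin.
print(normalize_hist([5, 10, 10]))
# The noisy bins from the example above.
print(thresh_hist([1, 3, 2, 0, 1, 78, 26, 132, 19, 2, 0, 0, 3, 1], 4))
```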

The cvCopyHist function does the obvious but in several subtle flavors involving either filling a pre-existing same size target histogram with the source or creating (allocating) a target from nothing.

The cvGetMinMaxHistValue gets minimum and maximum values obviously, but how exactly isn’t entirely clear. I believe that it returns the number of items in the bin with the most items and optionally the index where that bin is. I don’t know how the latter part of that handles ties.

Histograms can be compared with the cvCompareHist function. There are several possible criteria, or "methods", which can be used.

  • Correlation. Perfect match = 1, total mismatch = -1, no correlation = 0.

  • Chi-square. Perfect match = 0, total mismatch = unbounded. Accuracy.

  • Intersection. Perfect match = 1, total mismatch = 0. Speed.

  • Bhattacharyya distance - a way to measure differences in statistical distributions which is sensitive to differences in mean and standard deviation. Perfect match = 0, total mismatch = 1. Accuracy.
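To check the "perfect match" and "total mismatch" values in that list, here is a NumPy sketch of the four measures for histograms that are already normalized to sum to 1. The formulas are the common textbook forms. OpenCV's own versions may differ in normalization details, so this is for intuition only. The function name is made up.

```python
import numpy as np

def compare_hist(h1, h2):
    """Four similarity measures for two normalized histograms."""
    a, b = np.asarray(h1, dtype=float), np.asarray(h2, dtype=float)
    da, db = a - a.mean(), b - b.mean()
    nz = a > 0  # Skip empty bins to avoid dividing by zero.
    overlap = np.sqrt(a * b).sum() / np.sqrt(a.sum() * b.sum())
    return {
        "correlation": (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()),
        "chi_square": (((a - b) ** 2)[nz] / a[nz]).sum(),
        "intersection": np.minimum(a, b).sum(),
        "bhattacharyya": np.sqrt(max(0.0, 1.0 - overlap)),
    }

print(compare_hist([0.2, 0.3, 0.5], [0.2, 0.3, 0.5]))  # Perfect match.
print(compare_hist([1.0, 0.0], [0.0, 1.0]))            # Total mismatch.
```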

There is another histogram comparing technique called Earth Mover’s Distance. This basically treats the items in bins as dirt that must be moved and takes into account how much needs to be moved and how far away to compare two probability distributions or histograms. OpenCV has cvCalcEMD2 which is full of parameters allowing you to do fancy things such as specify your own distance and work metrics.
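In the simplest case, two 1-D histograms with the same total and bins one unit apart, the dirt moving idea reduces to a running sum. Here is a sketch of just that special case (the function name is invented). The general problem with custom distance metrics is a much bigger optimization and is not what this shows.

```python
import numpy as np

def emd_1d(h1, h2):
    """Earth mover's distance for equal-mass 1-D histograms, unit bin spacing."""
    a, b = np.asarray(h1, dtype=float), np.asarray(h2, dtype=float)
    # The running surplus is how much dirt must cross each bin boundary.
    return np.abs(np.cumsum(a - b)).sum()

print(emd_1d([1, 0, 0], [0, 0, 1]))  # All the dirt moves 2 bins.
print(emd_1d([1, 0, 0], [0, 1, 0]))  # All the dirt moves 1 bin.
```

Unlike the bin by bin measures, this one notices that a histogram shifted by one bin is closer than one shifted by two.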

Make sure to use cvNormalizeHist before comparing because comparing unnormalized histograms is usually meaningless.

There is a technique called "back projection" which can determine how well data fit the distribution of a histogram model. For example, if you have a histogram of an object you can see if an image contains regions with a similar histogram using back projection. The function to consider is cvCalcBackProject. A related function is cvCalcBackProjectPatch which will check if an image contains sub regions that are well matched.
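The core of back projection is just a lookup: each pixel is replaced by the value of the model histogram bin that pixel falls into. A single-channel NumPy sketch follows. The function name and the 0 to 256 value range are assumptions for illustration.

```python
import numpy as np

def back_project(values, hist, value_range=(0, 256)):
    """Replace every pixel with the model histogram value of its bin."""
    hist = np.asarray(hist, dtype=float)
    lo, hi = value_range
    idx = ((values.astype(float) - lo) * len(hist) / (hi - lo)).astype(int)
    idx = np.clip(idx, 0, len(hist) - 1)
    return hist[idx]

# Model: the object of interest only has values in the second of 4 bins (64-127).
model = [0.0, 1.0, 0.0, 0.0]
image = np.array([[10, 70], [100, 200]], dtype=np.uint8)
print(back_project(image, model))
```

Bright spots in the output are pixels that look like the model. Clumps of them are candidate locations for the object.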

A similar thing is template matching, which does the same kind of job as the histogram functionality but without histograms per se. This uses cvMatchTemplate to take a patch of image, say a thumbnail sized image of just an apple, and scan a larger image looking for likely similar regions. There are all kinds of similarity metrics, as is typical, and they all have subtle functionality and performance tradeoffs.
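Here is a brute force sketch of the scanning idea using the sum of squared differences as the similarity metric (lower is better). It is slow and only meant to show what is being computed. The function name is made up and the real function offers several other metrics.

```python
import numpy as np

def match_template_ssd(image, templ):
    """Slide templ over image; return the SSD score at every position."""
    ih, iw = image.shape
    th, tw = templ.shape
    t = templ.astype(float)
    result = np.empty((ih - th + 1, iw - tw + 1))
    for y in range(result.shape[0]):
        for x in range(result.shape[1]):
            patch = image[y:y + th, x:x + tw].astype(float)
            result[y, x] = ((patch - t) ** 2).sum()
    return result

templ = np.array([[1, 2], [3, 4]])
image = np.zeros((8, 8), dtype=int)
image[3:5, 5:7] = templ                      # Hide the template at row 3, col 5.
scores = match_template_ssd(image, templ)
print(np.unravel_index(scores.argmin(), scores.shape))
```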


Contours are ways to manage features of images. They are stored as CvSeq type sequences (linked-lists deep down). Contours can be created from images filtered by cvCanny or cvThreshold, etc. using the cvFindContours function. This function can get tricky in the same way a bucket fill tool can get tricky with respect to finding islands on lakes on islands in lakes, etc. (see 69.793° N, 108.241° W for an example). OpenCV calls these things "contours" (islands) and "holes" (lakes). This all apparently does often get quite complex and OpenCV has a fancy data structure called a "contour tree" to organize these complex relationships. In this structure, the world’s continents (sticking with the geography metaphor) would have contours at the top of a tree and all lakes would be children. And islands on those lakes would be (contour) children of their respective lakes, ad infinitum. Back in the world of image processing, the islands often have lakes that match them exactly (like an atoll) because that’s how edge detection filters work. This means that edges tend to have inner and outer edges themselves just as an atoll has an outer beach and an inner beach, but the atoll separates the lagoon from the sea.

The cvFindContours function expects an 8-bit single-channel image which, it is important to note, will be mangled during calculation (make a copy if that’s needed). This function will allocate the CvSeq structures necessary (and free them) but it’s a little confusing how to set it up. The firstContour parameter takes a pointer to a pointer that would point toward the first contour if it existed. But it doesn’t because the function spawns it. But that pointed to pointer is where you’ll find the head of the tree structure that results from the function. The return value is the total number of contours found. The mode and method parameters respectively specify what sort of operation should be calculated and exactly how if there are variants.

There are four different modes which basically specify the topology of the resulting tree of contours.

  • CV_RETR_EXTERNAL - simplistic, there’s one contour, no linked structures.

  • CV_RETR_TREE - island1’s child is a list of lake01 and lake02; lake01’s children are a list of island010 and island011, etc.

  • CV_RETR_CCOMP - a list of just contours, holes are doubly linked to the contours (the holes are in their own lists and their heads are tacked to the contour node).

  • CV_RETR_LIST - Gets all contours and puts them in a single list using h_prev and h_next. This is by far the easiest to use and the default.

After figuring out how you want the contours to be organized internally, the next thing is to specify the technique used to compose the contours themselves. There are several and they’re pretty technical.



  • CV_CHAIN_APPROX_SIMPLE - The default. Maybe best to start here.



To get an idea of what these things are, this might be helpful. Described simply, chain code starts with coordinates of a boundary pixel and then is a stream (or chain) of directions one would need to travel to stay on the boundary. When the original point is arrived at, the region is defined. The specific case of Freeman chains is an 8 direction system with 0 at 12 o’clock going to 7 at 10:30. This encoding doesn’t have to be at the pixel level and can be quite rough making it a very efficient way to describe large areas of input images.
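A chain code is easy to play with in plain Python. This sketch uses the numbering described above (0 at 12 o’clock, increasing clockwise to 7) with image coordinates where y grows downward. The function names are made up and the direction table is my reading of that description, so double check it against the OpenCV reference before relying on it.

```python
# Step for each direction code: 0 is up, then clockwise in 45 degree increments.
# Image coordinates: x grows to the right, y grows downward.
STEPS = [(0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1)]

def encode_chain(points):
    """Turn a list of adjacent boundary points into (start point, codes)."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        codes.append(STEPS.index((x1 - x0, y1 - y0)))
    return points[0], codes

def decode_chain(start, codes):
    """Walk the codes from the start point to recover the boundary."""
    pts = [start]
    for c in codes:
        dx, dy = STEPS[c]
        pts.append((pts[-1][0] + dx, pts[-1][1] + dy))
    return pts

square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
start, codes = encode_chain(square)
print(start, codes)
```

One start point plus one small integer per step is why this is such a compact way to store a boundary.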

Besides cvFindContours, there is another way to do things. There is a cvStartFindContours which creates a "scanner", a CvContourScanner object. You can iterate over the contours with cvFindNextContour. When finished, cvEndFindContours stops that process.

An important utility is to be able to draw contours which can be done with cvDrawContours. This allows all the normal stuff like line color and thickness as well as the levels of the tree to plot.

Once the contour has been found, you can use cvApproxPoly on it to convert it to a contour with fewer points. This is basically a raster to vector operation even though the result is still a contour sequence. The algorithm works by finding maximally distant points on the original contour. Those are the first two of the final points. Then the line between them is checked to see where it is farthest from the contour. That point on the contour is added to the new approximated contour. This continues until every point of the original contour is within the requested precision (a parameter of the function) of the approximation.
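The "add the farthest point and repeat" idea is the classic Douglas-Peucker simplification. Here is a plain Python sketch for an open path, where the two end points play the role of the first two kept points. The function names are invented, and closed contours (which is what OpenCV usually deals with) need the extra step of picking the two starting points.

```python
import math

def _distance_to_line(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)

def approx_poly(points, epsilon):
    """Keep only the points needed to stay within epsilon of the path."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = max(((i, _distance_to_line(points[i], a, b))
                     for i in range(1, len(points) - 1)), key=lambda t: t[1])
    if dmax <= epsilon:
        return [a, b]                       # Everything is close enough.
    left = approx_poly(points[:idx + 1], epsilon)
    right = approx_poly(points[idx:], epsilon)
    return left[:-1] + right                # Don't repeat the split point.

wobbly_L = [(0, 0), (1, 0.1), (2, 0), (3, 0.1), (4, 0),
            (4.1, 1), (4, 2), (4.1, 3), (4, 4)]
print(approx_poly(wobbly_L, 0.5))
```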

A related function is cvFindDominantPoints which seems very similar to what I call "cull shallow angles" in to2d. It differs in that it can look at several points away from just its immediate neighbor. It is the same basic idea though i.e. to do what the function name suggests. The method is selectable, but there is only one choice CV_DOMINANT_IPAN. This IPAN stuff is stupidly named and should be treated as a random label for this technique.

Now that you have simple or complex contour sequences, there are some calculations you can do that can be informative. There is cvArcLength and cvContourPerimeter. The former can provide lengths of just portions of the contour (use slices). The cvContourArea is similar and can also do a portion of the contour or you can set slice = CV_WHOLE_SEQ.

Another useful thing to do with a contour sequence is to get an even rougher idea of where it is by using cvBoundingRect. This is parallel to the X and Y axes. If you want the truly smallest rectangle regardless of orientation, check out cvMinAreaRect2 which returns a type CvBox2D (containing center x and y, size x and y, and angle).

In the same spirit as the bounding box functions, there is also cvMinEnclosingCircle. Related, but with a different approach is cvFitEllipse2. This does not ensure the resulting ellipse contains all points but rather does a kind of least squares fitting function.

After finding the contours and distilling them down to bounding boxes, you can check for collisions and the like. The questionably named cvMaxRect function takes 2 rectangles as input and returns a rect (actually a CvRect) which is the smallest rectangle that will enclose both.

Getting fancier is a function to calculate moments, cvContourMoments, which returns into a special data type, CvMoments. It seems that function is actually a wrapper for the cvMoments function which can provide normalized moments too. This allows for comparing different sized objects in a consistent way. More specialty functions are cvGetCentralMoment, cvGetNormalizedCentralMoment, cvGetHuMoments (rotation invariant). More details about image moments. This kind of thing might be useful to build object recognition profiles, maybe something like OCR where each letter has a set of moments that identify it regardless of position, size or rotation. OpenCV has a function to compare shapes in this way (even calculating the moments on the way), cvMatchShapes.
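For a feel of what the moment numbers are, here is a NumPy sketch computing a few of them directly from a binary image: the plain spatial moments, the centroid they imply, and one normalized central moment. The function name and the dictionary keys are made up, and the normalization is the standard definition, which I am assuming matches what OpenCV reports.

```python
import numpy as np

def simple_moments(img):
    """A few image moments of a 2-D array (nonzero pixels carry the weight)."""
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    w = img.astype(float)
    m00, m10, m01 = w.sum(), (xs * w).sum(), (ys * w).sum()
    cx, cy = m10 / m00, m01 / m00              # Centroid.
    mu20 = ((xs - cx) ** 2 * w).sum()          # Central moment: position removed.
    nu20 = mu20 / m00 ** 2                     # Normalized: size (mostly) removed.
    return {"m00": m00, "cx": cx, "cy": cy, "mu20": mu20, "nu20": nu20}

blob = np.zeros((8, 12), dtype=np.uint8)
blob[2:5, 3:8] = 1                             # Rows 2-4, columns 3-7.
print(simple_moments(blob))
```

The centroid falls out as m10/m00 and m01/m00, which is one way to get the "mean" of a collection of white points.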

Because this isn’t complex enough, there is another more detailed way to compare contours than the summary statistics like moments. Looking at the details of the contour paths themselves is done by constructing a data structure called CvContourTree which is not the same as the data structure that contains a (possibly linked list of a) set of contours. This is a tree that represents a single contour’s shape in a way that is more easily matched by hierarchical geometric features. This can all be thought of in black box terms as just another way to compare shapes based on geometry. The functions provided to do this are cvCreateContourTree, cvContourFromContourTree, and cvMatchContourTree.

Another way to summarize shapes for comparison/identification purposes is to calculate the convex hull and look for the differences, called the convexity defects. OpenCV has cvConvexHull2 and cvConvexityDefects to facilitate this. Also cvCheckContourConvexity can determine if a contour is already convex.

If hulls are not enough, there is yet another matching strategy called pairwise geometrical histograms. This basically takes the Freeman chain codes mentioned above and makes a histogram of the direction changes (a CCH, chain code histogram). See cvCalcPGH for the function to do this.

Motion Detection

It seems that a primary way to scan images or sequences of images is to look at rows of pixels at a time. There is a function cvInitLineIterator that sets this up and a macro CV_NEXT_LINE_POINT that moves from pixel to pixel in the line. I think it works something like this.

CvLineIterator iterator;
int iterator_size;
iterator_size= cvInitLineIterator(rawImage,pt1,pt2,&iterator,8,0);
for (int j=0; j<iterator_size; j++) { CV_NEXT_LINE_POINT(iterator); }

You can sample whole lines at a time saving yourself the point to point loop with cvSampleLine.

Frame Differencing

A simple way that objects can be detected in a scene is to subtract the pixels of a frame now from the values of pixels from a little while ago. The function cvAbsDiff does this helpfully dealing in absolute values so it doesn’t matter which way around you go. This function takes 2 input frames, one now, one before, and an output frame called "frameForeground". I’m not keen on the use of the word "foreground". Imagine pointing a camera out a window whose frame and curtains were in the shot. That window would be the foreground but the points of interest if something moved outside of the window are technically in the background. But just be aware that this is the terminology. When using the cvAbsDiff function, it’s usually sensible to cut off minor fluctuations which are generally noise and to set the rest to 255. Do this with cvThreshold.
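The difference and threshold steps look like this in NumPy. This is a sketch of the arithmetic, not the OpenCV calls. The function name and the threshold of 15 are arbitrary choices for the example.

```python
import numpy as np

def frame_foreground(before, now, threshold=15):
    """Absolute difference of two 8-bit frames, then cut to 0 or 255."""
    # Widen to signed ints first so the subtraction can't wrap around.
    diff = np.abs(now.astype(np.int16) - before.astype(np.int16))
    return np.where(diff > threshold, 255, 0).astype(np.uint8)

before = np.zeros((3, 3), dtype=np.uint8)
now = before.copy()
now[1, 1] = 200      # Something really moved here.
now[0, 0] = 5        # Just sensor noise.
print(frame_foreground(before, now))
```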

There are much fancier ways that can help with things like blowing leaves on a tree in an outdoor scene. You wouldn’t want those regarded as objects of interest in motion. One approach is to use averaging. OpenCV can develop a running average for the values of each pixel and when large deviations from that occur, do something special.

OpenCV has an accumulation function cvAcc that can help accumulate statistics about a series of pixel changes. This function basically adds up the value of the pixel which can be used with the total number of images to get the mean value. Another similar metric is to use cvRunningAvg. The cvSquareAcc can be useful in calculating the variance of pixels. Presumably this will be a metric of how wildly the values are differing which could be very relevant to motion detection. The cvMultiplyAcc is another one that can be used in such applications.
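Here is how the accumulators fit together, sketched with NumPy. A plain sum gives the mean, a sum of squares gives the variance, and the running average is a weighted blend that slowly forgets old frames. The class and its method names are made up, and alpha is the blend weight for the newest frame.

```python
import numpy as np

class PixelStats:
    """Per-pixel statistics accumulated over a series of frames."""
    def __init__(self, shape):
        self.n = 0
        self.total = np.zeros(shape)       # Sum of values.
        self.total_sq = np.zeros(shape)    # Sum of squared values.
        self.running = None                # Exponentially weighted average.

    def add(self, frame, alpha=0.5):
        f = frame.astype(float)
        self.n += 1
        self.total += f
        self.total_sq += f * f
        if self.running is None:
            self.running = f.copy()
        else:
            self.running = (1 - alpha) * self.running + alpha * f

    def mean(self):
        return self.total / self.n

    def variance(self):
        return self.total_sq / self.n - self.mean() ** 2

stats = PixelStats((2, 2))
for level in (10, 20, 30):
    stats.add(np.full((2, 2), level, dtype=np.uint8))
print(stats.mean()[0, 0], stats.variance()[0, 0], stats.running[0, 0])
```

A pixel whose new value is several standard deviations away from its mean is a reasonable candidate for "something moved here", while a leafy pixel with a big variance gets more slack.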

For complex moving backgrounds (windy trees), the ideal thing is to fit a distribution to the data that is present in the previous frames. Since this could imply using a lot of memory, a complex but efficient approach is to use the same kinds of tricks used by compression algorithms; check out codebooks. These focus on the important pixels more than the boring ones. The pro tip here is to use HSV or YUV and not RGB when doing fancy things like this.

Image Repair

A nice function is cvInpaint which can fill in small (thin really) details that have been messed up in an image. It reminds me of the clone tool in Gimp. So if you have a photo with some thin writing on it done in a paint program, for example, this function can really do a good job of blending it away. It seems ideal for camera artifacts and grainy footage.

Mean-Shift Segmentation

The function is cvPyrMeanShiftFiltering and it uses the pyramid structures. The results remind me of "posterization". But the technical description is something like this.

Given a set of multidimensional data points whose dimensions are (x, y, blue, green, red), mean shift can find the highest density “clumps” of data in this space by scanning a window over the space.

Motion Tracking

Lots of heavy math packed into OpenCV for this.

  • Harris corner detection. And friends Shi and Tomasi.

  • SIFT is Scale Invariant Feature Transform and is not included in OpenCV. Just noting it as a related topic.

  • Horn-Schunck calculates a dense optical flow which is a map of all the displacements made by each pixel over time. It assumes smoothness in the flow over the whole image. This is a dense flow mapping and considered kind of inefficient for most cases. Its OpenCV function is cvCalcOpticalFlowHS.

  • Block matching algorithms involve subdividing the image into smaller blocks and trying to find those blocks in subsequent images. If found, track the motion vector. Simple, but perhaps not especially efficient. OpenCV’s function for this is cvCalcOpticalFlowBM.

Feature Detection

OpenCV has a function called cvGoodFeaturesToTrack() which uses the Shi-Tomasi algorithm, computes the second derivatives using Sobel operators, calculates the required eigenvalues, (whew!) and simply, from our point of view, returns a list of points that should be pretty good for tracking. This list contains points that you hope to be able to find again in another frame of the video. For example, an edge can be moving parallel to its orientation and the motion may be undetectable. A good feature, like a corner, is noticeable whenever it moves in any direction. Highlighting a few especially amenable points allows for "sparse", but sensible, optical flow mapping.

If for some reason you require more precision in feature detection than the grid of pixels would imply, explore cvFindCornerSubPix. This will do some fancy dot product trickery to try and isolate corners even more precisely than the bitmap would seem to allow. This seems applicable in calibration operations.


Lucas-Kanade (also "LK") is a sparse flow mapping technique. This was used in the worm motility project. Requires consistent brightness, small motions frame to frame, and "spatial coherence" (which I do not exactly understand).

OpenCV deals with Lucas-Kanade implementations in two ways. There is the cvCalcOpticalFlowLK function which just calculates the flow field where it can be calculated (0 where it can not be). And there is pyramid based processing with cvCalcOpticalFlowPyrLK. Basically you need to supply the points you want to track in featuresA (from cvGoodFeaturesToTrack usually) and call the function. When it returns, check the status array to see which points were actually tracked successfully and then check featuresB to see where they are now.

Mean-Shift and CAMshift Tracking

Mean shift is a technique for locating the maxima of a density function. It seems to work by specifying a window enclosing some points, calculating the center of mass of the points, recentering the window there, and iterating until the window no longer needs to move. Of course choosing this window wisely is a tricky detail. OpenCV has cvMeanShift to facilitate this process.
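The recenter-and-repeat loop is short enough to sketch in NumPy. This version works on a bare list of 2-D points with a circular window, which is an assumption made to keep it small. cvMeanShift itself works with a rectangular window on an image, so this only shows the iteration, and the function name is invented.

```python
import numpy as np

def mean_shift(points, center, radius, max_iter=100, eps=1e-3):
    """Move a window to the center of mass of the points inside it until it settles."""
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    for _ in range(max_iter):
        inside = points[np.linalg.norm(points - center, axis=1) <= radius]
        if len(inside) == 0:
            break                                  # Empty window, give up.
        new_center = inside.mean(axis=0)
        if np.linalg.norm(new_center - center) < eps:
            return new_center                      # Window stopped moving.
        center = new_center
    return center

cluster = [(9, 9), (9, 11), (11, 9), (11, 11), (10, 10), (0, 0)]
print(mean_shift(cluster, center=(8, 8), radius=3))
```

Starting off to one side, the window first catches two points, slides toward them, catches the rest of the clump, and settles on its middle. The stray point at the origin never gets a vote, which also shows why the initial window choice matters.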

CAM is "continuously adaptive mean-shift". This allows for the window to be resized as necessary to accommodate things like a subject getting nearer and farther away from the camera. Its function is cvCamShift.

The features that these algorithms track are usually colors, but they can really track the distribution of any kind of feature. For example, my first thought was to use feature detection and track the distribution of "pointy bits" or straight edges. That may or may not work but it is theoretically possible with this framework.

Motion Templates

This technique to track motion relies on an initial silhouette being specified. This could be done with chroma key or difference calculations with a stationary camera or some other obvious technique. You can also try some fancy technique like segmentation. A motion history image (mhi) is created by setting the value of the output to the current time stamp. Subsequent frames continue this and the older images leave a fading trail of previous location ghosts. The function cvUpdateMotionHistory helps with this.
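The update rule for the motion history image is tiny. Moving pixels get stamped with the current time and stale stamps get erased. Here is a NumPy sketch of that rule as I understand it from the OpenCV reference. The function name is made up, and duration is how long a trail is allowed to linger.

```python
import numpy as np

def update_motion_history(silhouette, mhi, timestamp, duration):
    """Stamp moving pixels with the time; clear stamps older than duration."""
    mhi = mhi.copy()
    moving = silhouette != 0
    mhi[moving] = timestamp
    mhi[~moving & (mhi < timestamp - duration)] = 0
    return mhi

# A one pixel "object" sliding right across a 1x3 image, one step per second.
mhi = np.zeros((1, 3))
for t, position in enumerate([0, 1, 2], start=1):
    silhouette = np.zeros((1, 3), dtype=np.uint8)
    silhouette[0, position] = 1
    mhi = update_motion_history(silhouette, mhi, timestamp=t, duration=1.5)
print(mhi)
```

The surviving stamps increase in the direction of travel, which is why taking the gradient of this image gives motion vectors.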

By taking the gradient of this motion history map, perhaps by using Scharr or Sobel techniques, the motion vectors can be ascertained. OpenCV has cvCalcMotionGradient to get the gradients and cvCalcGlobalOrientation to find the overall vector of motion, i.e. the sum of the gradient vectors. Thinking of things like tracking a single rotation invariant object like a billiard ball from a stationary overhead camera, this is enough. But if you want to track many balls, you’ll need to segment the motion profile with something like cvSegmentMotion. This kind of complex technology could be useful for gesture recognition, for example.

Estimators and Kalman Filtering

A Kalman filter is a mathematical process that forms ever better predictive models based on continual (though possibly discrete) input updates. It can be thought of as a type of sensor fusion as it can handle multiple indicators about a state. For example, GPS and odometry can be used as inputs and the more stable of these will contribute more (or something like that). In motion tracking this can take many input indicators of motion like the motion vectors calculated as above and integrate them to form a more stable and correct impression of the actual motion of the entire object of interest (not just its corners in isolation).

OpenCV provides a CvKalman data structure. It is created and released with cvCreateKalman and cvReleaseKalman. The iterative process of the Kalman cycle is executed with the functions cvKalmanPredict and cvKalmanCorrect (those two functions provide the best two word description of this complex technique).
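The predict and correct cycle is easiest to see in one dimension. This plain Python sketch tracks a single value that is assumed not to change (a constant state model). The noise settings q and r and the starting guesses are made up numbers. The real CvKalman handles vectors and matrices, but the two steps are the same.

```python
def kalman_1d(measurements, q=1e-5, r=0.01, x0=0.0, p0=1.0):
    """Filter noisy readings of one (assumed constant) value."""
    x, p = x0, p0            # Estimate and how uncertain it is.
    estimates = []
    for z in measurements:
        # Predict: the state stays put but uncertainty grows a little.
        p = p + q
        # Correct: blend in the measurement, weighted by the gain k.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

readings = [5.0] * 50
est = kalman_1d(readings)
print(est[0], est[-1])
```

The gain k is the "more stable input contributes more" part. When the estimate is shaky, k is near 1 and the measurement wins. As confidence builds, k shrinks and new readings only nudge the estimate.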

The Kalman filter assumes that the uncertainty in the feedback is Gaussian. That need not be the case. If there is a known bias, the probability distribution can be represented as a density map (more dense "particles" represent greater likelihood). This map with an arbitrarily complex probability profile can be given to the cvCreateConDensation function which works much like the Kalman functions. There is a CvConDensation struct like the CvKalman. The tricky bit to using this is that this confidence map of particles needs to be continuously updated to reflect known conditions. There is no automatic way to do this.

Camera Calibration

I did extensive analysis into best practices for calibrating camera distortion. See my full post on the topic for details and a calibration program which exhaustively searches calibration images for detected points. The general strategy follows.

Cameras are not perfect and dealing sensibly with distortions caused by lens design and defects is critical to converting pixels received into meaningful knowledge about the real world’s geometry. The two types of systemic distortions that OpenCV helps with are radial and tangential. Radial distortions occur because the light deflection near the center of the lens is different than near the perimeter. Tangential distortion is a property of the image plane onto which the lens projects being not exactly aligned with the lens' proper axis.
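For reference, the radial and tangential terms can be written out in a few lines. This NumPy sketch applies the distortion model given in the OpenCV calibration documentation to normalized image points (coordinates relative to the optical axis, before the intrinsics matrix is applied). The function name is made up and the coefficient values in the example are arbitrary.

```python
import numpy as np

def distort(points, k1=0.0, k2=0.0, p1=0.0, p2=0.0, k3=0.0):
    """Apply radial (k1,k2,k3) and tangential (p1,p2) distortion to points."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    r2 = x * x + y * y                              # Squared distance from axis.
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return np.stack([xd, yd], axis=1)

pts = [[0.0, 0.0], [0.5, 0.0], [0.5, 0.5]]
print(distort(pts))             # All zeros: nothing changes.
print(distort(pts, k1=0.1))     # Points far from the center move the most.
```

Calibration is the reverse problem: given where known points actually showed up, find the coefficients.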

OpenCV uses a "camera intrinsics matrix" and a "distortion vector" to make adjustments. Figuring these out seems the tricky bit since once they are established OpenCV can make corrections. The cvCalibrateCamera2 helps figure out the correction data. It looks at a reference specimen of known points from multiple views. It also rotates to cross check. The reference specimen is usually a chessboard pattern and cvFindChessboardCorners can help with that kind of target. It seems that after using that function to find the chessboard points, it’s good to further refine the model with cvFindCornerSubPix. To check you’ve got the right thing, there is cvDrawChessboardCorners.

Once the chessboard is seen properly, a planar homography matrix is needed that can convert between images and real locations which may be in the shot at an angle or rotated. This can be done with cvFindHomography. Once this is done, actual camera calibration (finding the distortions of the camera itself) is possible. This is when cvCalibrateCamera2 is called.

Once you have the camera intrinsics, you just need to compute the location of the objects in the scene. The cvFindExtrinsicCameraParams2 function can take the intrinsic matrix and the distortion coefficients that you have previously computed and return a rotation and translation vector. Again this is all looking at a known chessboard. So basically, you can use multiple views of a chessboard to figure out what distortions are present in the optics of the camera and from then on, you can look at a known fixed chessboard in a scene and figure out where your camera is in space.

Once you have the intrinsic matrix and distortion coefficients, you probably want to correct the images. The functions cvInitUndistortMap, cvUndistort2, and cvUndistortPoints all take this data and rework an image to correct for the optical errors.

Simple Illustrative Example Programs

This test program should open the hard coded image and display it in a box.
#include <iostream>
#include <cv.h>
#include <highgui.h>

using namespace cv;
using namespace std;

int main(){
    Mat image;
    // Read the file
    image = imread("monkey.jpg", CV_LOAD_IMAGE_UNCHANGED);
    // Check for invalid input
    if(! image.data ){
        cout <<  "Could not open or find the image" << endl;
        return -1;
    }
    // Create a window for display.
    namedWindow( "Display window", CV_WINDOW_AUTOSIZE );
    // Show our image inside it.
    imshow( "Display window", image );
    // Wait for a keystroke so the window stays up (assumed addition).
    waitKey(0);
    return 0;
}

Here’s another similar minimal test doing things slightly differently.
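#include "highgui.h"
int main( int argc, char** argv) {
    IplImage* img= cvLoadImage( argv[1] );
    cvNamedWindow("Example1", CV_WINDOW_AUTOSIZE );
    cvShowImage("Example1", img ); // Put the image in the window.
    cvWaitKey(0);                  // Wait for any key press.
    cvReleaseImage( &img );        // Free the image memory.
}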

The same kind of thing for video.
#include "highgui.h"
int main(int argc, char** argv){
    cvNamedWindow("Example2", CV_WINDOW_AUTOSIZE);
    CvCapture* capture= cvCreateFileCapture(argv[1]);
    IplImage* frame;
    while (1){
        frame= cvQueryFrame( capture );
        if (!frame) break;
        cvShowImage( "Example2", frame);
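        char c = cvWaitKey(33); // Wait 33ms, roughly 30 frames per second.
        if (c == 27) break;     // ESC quits.
    }
    cvReleaseCapture( &capture );
    cvDestroyWindow("Example2");
}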

Camera Input

Using a camera is very similar to using a file. It just needs a camera number instead of a file name. Here’s an example of using a camera if no file is supplied.
#include "highgui.h"
int main(int argc, char** argv){
    CvCapture* capture;
    const char* title;
    if (argc==1) {
        // No file name given, so use the first camera.
        capture= cvCreateCameraCapture(0);
        title= "Camera 0";
    } else {
        capture= cvCreateFileCapture(argv[1]);
        title= argv[1];
    }
    cvNamedWindow(title, CV_WINDOW_AUTOSIZE);
    IplImage* frame;
    while (1){
        frame= cvQueryFrame( capture );
        if (!frame) break;
        cvShowImage( title, frame);
        char c = cvWaitKey(33);
        if (c == 27) break; // ESC quits.
    }
    cvReleaseCapture( &capture );
    cvDestroyWindow(title);
}

Filtered Video Feed With Controls

Here is a program that will open and display a camera feed, some control slider bars, and a feed of the resultant modified video. This allows one to adjust settings in real time to capture the best parameters to use in filtering work. This is handy in isolating particular objects of interest. See the video at the top. Some good parameters for the car (i.e. yellow highlighter) are H=27-41, S=58-128, V=199-255.
#include <iostream>
#include <cv.h>
#include <highgui.h>
#include "opencv2/imgproc/imgproc.hpp"

using namespace cv;
using namespace std;

int main(int argc, char* argv[]) {
    // Open camera 0 for reading.
    VideoCapture cap_ob(0);
    if (!cap_ob.isOpened()) {
        cout << "Cannot open the video source." << endl;
        return -1;
    }
    // Create a window for the controls.
    namedWindow("Control", CV_WINDOW_AUTOSIZE);
    // Define control ranges.
    int iLowH= 0; int iHighH= 179; // Hue (0-179)
    int iLowS= 0; int iHighS= 255; // Saturation (0-255)
    int iLowV= 0; int iHighV= 255; // Value (0-255)
    // Create trackbars in Control window.
    cvCreateTrackbar("Hue (min)","Control",&iLowH,179);
    cvCreateTrackbar("Hue (max)","Control",&iHighH,179);
    cvCreateTrackbar("Sat. (min)","Control",&iLowS,255);
    cvCreateTrackbar("Sat. (max)","Control",&iHighS,255);
    cvCreateTrackbar("Val. (min)","Control",&iLowV,255);
    cvCreateTrackbar("Val. (max)","Control",&iHighV,255);

    while (true) {
        Mat imgOriginal;
        // Read new frame from video (call assumed; it was missing here).
        bool bSuccess= cap_ob.read(imgOriginal);
        if (!bSuccess) {
            cout << "Cannot read the frame from video file." << endl;
            break;
        }
        Mat imgHSV;
        // Convert from BGR to HSV (call assumed; it was missing here).
        cvtColor(imgOriginal, imgHSV, COLOR_BGR2HSV);

        // Keep only pixels inside the slider ranges (bounds assumed).
        Mat imgFiltered;
        inRange( imgHSV,
                 Scalar(iLowH, iLowS, iLowV),
                 Scalar(iHighH, iHighS, iHighV),
                 imgFiltered );
        // Show thresholded image.
        imshow("Filtered Image",imgFiltered);
        imshow("Original Image",imgOriginal);
        // The 10 is msec to wait between frames unless ESC (27) is pressed.
        if (waitKey(10) == 27) {
            cout << "ESC pressed. Bye." << endl;
            break;
        }
    }
    return 0;
}


Look at convexHull which would work well to take the white points and calculate a bounding hull. Also see the bounding box method just above this one.

Actually, look at minarea. The Python demo makes this clearly a candidate.

How to get the centroid of that thing? Maybe there’s a way to get the centroid from a collection of white points.

Is there something that calculates the mean of a collection of points (centroid but called "mean")?

The program is pretty excellent too if the marker can be long and narrow.


Other things that may need including.
#include <iostream>
#include <cv.h>
#include <highgui.h>
#include "opencv2/core/core.hpp"
#include "opencv2/flann/miniflann.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/photo/photo.hpp"
#include "opencv2/video/video.hpp"
#include "opencv2/features2d/features2d.hpp"
#include "opencv2/objdetect/objdetect.hpp"
#include "opencv2/calib3d/calib3d.hpp"
#include "opencv2/ml/ml.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/contrib/contrib.hpp"
#include "opencv2/core/core_c.h"
#include "opencv2/highgui/highgui_c.h"
#include "opencv2/imgproc/imgproc_c.h"


Sorry about this mess. I had started taking a completely new set of notes as I read through the ORA Learning OpenCV book and they were growing quite extensive. And then I moved and I lost the notes! Dang! I just found them again and rather than update this nicely, I’m going to just dump the whole mess here to be sorted out properly later.


A plausible Unix build command line for OpenCV programs.

gcc -v example2_2.cpp \
-I/usr/local/include/ -L/usr/lib/ -lstdc++ -L/usr/local/lib \
-lopencv_highgui -lopencv_core -o example2_2

Other possible libraries which may be required to link to.

  • -lopencv_imgcodecs

  • -lopencv_imgproc

  • -lopencv_videoio

  • -lopencv_video

  • -lopencv_videostab

LD Path



g++ t.o /opt/lib/ \
    /opt/lib/ \
    /opt/lib/ -o t


For the definitive place to learn more, check out this file: opencv/cxcore/include/cxtypes.h

The template class cv::Vec<> is used for small "fixed vector classes". I think it’s stuff like [x,y,z] kind of vectors and not STL type of vectors.

This has typedefs such as

  • cv::Vec2i - 2 integer members

  • cv::Vec3i - 3 integer members

  • cv::Vec4d - 4 double members

  • cv::Vec{2,3,4,6}{b,w,s,i,f,d} - any of these is valid

The fixed vector template cv::Vec<> is a cv::Matx<> whose number of columns is one.

The same setup is true for cv::Matx<>, which is for small fixed matrices (3x3 kind of things) whose size is set at compile time.

  • cv::Matx{1,2,3,4,6}{1,2,3,4,6}{f,d}

cv::Matx33f::all(.5); // Set all 9 (3x3) values to .5.
cv::Matx33f::zeros(); // Set all 9 (3x3) values to 0.
cv::Matx33f::ones(); // Set all 9 (3x3) values to 1.
cv::Matx33f::eye(); // 1,0,0,0,1,0,0,0,1 Identity
cv::Matx33f::randu(min,max); // Uniform random values.
cv::Matx33f::randn(mean,stddev); // Normally distributed random values.
m91f = m33f.reshape<9,1>(); // Cast to different dimensions.
m44f.get_minor<2, 2>( i, j ); // Sub matrix extraction.
m44f.row(i); // Isolate a row. Also `col`.
m44f.diag(); // Diagonal.
m44f.t(); // Transpose.
m44f.inv(); // Invert.
m1.mul(m2) // m1[0][0]*m2[0][0], etc

Points could use the Vec templates, but they have their own class.

  • cv::Point{2,3}{i,f,d}

Unlike Vec, Points can use myP.x and myP.y, etc.

cv::Scalar holds four doubles.

cv::Size has members width and height rather than x and y but is similar to Point. cv::Rect has all four. There is cv::Size2{i,f} too and cv::Size is really Size2i. There is cv::RotatedRect that contains angle info too. sizeObj.area() is handy; works with Rect too. rectObj.contains(pointObj) to check if point is in ROI.

Point Class

Some things that can be done with point classes.

cv::Point P;
cv::Point3f copyOfP(P);
cv::Point P2(x,y);
cv::Point3i P3(x,y,z); // Three coordinates need a 3D point type.
float dotProduct= P.dot(P2);
double doubleDotProduct= P.ddot(P2);
float crossProduct= P.cross(P2);

There is also a complex number class which is a lot like the STL complex<> template.

Range class

cv::Range class - contains start and end but end isn’t included. So (0,5) means 0,1,2,3,4 but not 5. The static method cv::Range::all() gives a special range meaning the whole thing. r.size() is the number of items provided (5 here).
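The half-open convention is easy to get wrong by one, so here is a small plain C++ sketch of it. HalfOpenRange and indices are hypothetical names that only mimic the behavior described above; they are not the OpenCV class.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in that mimics the half-open [start, end) convention.
struct HalfOpenRange {
    int start, end;
    int size() const { return end - start; }   // end itself is not counted.
};

// Expand the range into the indices it covers.
std::vector<int> indices( const HalfOpenRange& r ) {
    assert( r.start <= r.end );
    std::vector<int> out;
    for( int i = r.start; i < r.end; ++i ) out.push_back( i );
    return out;
}
```

With start 0 and end 5 this yields 0,1,2,3,4 and a size of 5, matching the description.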

Smart pointers

The smart pointer does reference counting for you. Create them like this.

  • cv::Ptr<cv::Matx33f> p( new cv::Matx33f )

  • cv::Ptr<cv::Matx33f> p= cv::makePtr<cv::Matx33f>()

  • cv::Ptr<IplImage> img_p( cvLoadImage("xed.png" ) )

  • cv::Ptr<IplImage> img_p = cvLoadImage( "xed.png" );

The magic here is that if a value of such a pointer object is assigned to another variable, some behind the scenes bookkeeping goes on. For example, if you have a smart pointer a and then you do b=a and then somehow get rid of a, it knows that b still needs it. Then if b gets dropped, it knows nothing cares about the referent and it then calls destructors.

These cv::Ptr objects are like the std::shared_ptr<> template in C++11. It’s also similar to the Boost shared_ptr<> smart pointer.
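The a and b story can be watched in action with the standard library's std::shared_ptr, used here as a stand-in on the assumption that cv::Ptr does the same bookkeeping. Tracked and shareAndRelease are made-up names for this sketch.

```cpp
#include <cassert>
#include <memory>

// Counts live Tracked objects so destruction can be observed.
struct Tracked {
    static int alive;
    Tracked()  { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Walk through the a/b story and report how many objects survive it.
int shareAndRelease() {
    std::shared_ptr<Tracked> a = std::make_shared<Tracked>();
    std::shared_ptr<Tracked> b = a;      // Two owners, one object.
    assert( a.use_count() == 2 );
    a.reset();                           // Get rid of a; b keeps the object alive.
    assert( Tracked::alive == 1 );
    b.reset();                           // Last owner gone, destructor runs.
    return Tracked::alive;
}
```

The function returns 0 because the destructor runs only once the last owner lets go.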

Primitive Templates

Trailing underscores generally, but not always, indicate a template in OpenCV. Think of it as "FillInThe_".

These primitives are really just products of templates that can actually be based on whatever types you want. So instead of using ints in a cv::Point type you could use cv::Point_<int> or whatever type you want as long as it supports basic operations. For example, the complex number base could be used. This is mostly good to know to understand error messages and how things work. Beyond that its usage is exotic.

Utility Functions

  • cv::alignPtr() Align pointer to given number of bytes

  • cv::alignSize() Align buffer size to given number of bytes

  • cv::allocate() Allocate a C-style array of objects

  • cv::deallocate() Deallocate a C-style array of objects

  • cv::fastFree() Deallocate a memory buffer

  • cv::fastMalloc() Allocate an aligned memory buffer

  • cvCeil() Round float number x to nearest integer not smaller than x

  • cv::cubeRoot() Compute the cube root of a number

  • cv::fastAtan2() Calculate two-dimensional angle of a vector in degrees

  • cvFloor() Round float number x to nearest integer not larger than x

  • cvIsInf() Check if a floating-point number x is infinity

  • cvIsNaN() Check if a floating-point number x is “Not a Number”

  • cvRound() Round float number x to the nearest integer

  • cv::format() Create an STL string using sprintf-like formatting

  • CV_Assert() Macro that throws an exception if a given condition is not true

  • CV_Error() Macro that makes a cv::Exception from a fixed string and throws it

  • CV_Error_() Macro that makes a cv::Exception from a formatted string and throws it

  • cv::error() Indicate an error and throw an exception

CPU, Threads, and Clock Ticks

  • cv::getCPUTickCount() Get tick count from internal CPU timer

  • cv::getNumThreads() Count number of threads currently used by OpenCV

  • cv::getThreadNum() Get index of the current thread

  • cv::getTickCount() Get tick count from system

  • cv::getTickFrequency() Get number of ticks per second (see cv::getTickCount())

  • cv::setNumThreads() Set number of threads used by OpenCV

  • cv::setUseOptimized() Enables SSE2 and other CPU optimizations

  • cv::useOptimized() Check whether the CPU optimizations are in effect
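The tick functions are mostly useful as a pair for timing things: subtract two tick counts and divide by the frequency. Here is that arithmetic as a plain C++ sketch. The function name elapsedSeconds is made up, and the arguments stand in for values that would come from cv::getTickCount() and cv::getTickFrequency().

```cpp
#include <cassert>
#include <cstdint>

// Convert a pair of tick counts into elapsed seconds.
// In OpenCV the counts would come from cv::getTickCount() and the
// frequency from cv::getTickFrequency(); here they are plain arguments.
double elapsedSeconds( std::int64_t startTicks, std::int64_t endTicks,
                       double ticksPerSecond ) {
    assert( ticksPerSecond > 0.0 );
    return static_cast<double>( endTicks - startTicks ) / ticksPerSecond;
}
```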


The overwhelming majority of functions in the OpenCV library are members of the cv::Mat class, take a cv::Mat as an argument, or return cv::Mat as a return value; quite a few are and do all three. This class is for "dense" matrices where every value is important. Images are almost always dense. The data that each cell of one of these objects holds can be chosen to suit the purpose. For example, for a mask, it might be a single 8-bit channel; for a color image it could be 3 channels of 8-bit values.

Mat Constructors

cv::Mat; // Plain constructor.
cv::Mat( int rows, int cols, int type ); // Two-dimensional arrays with type.
cv::Mat( const Mat& mat, const cv::Rect& roi ); // ROI initialization.
cv::Mat( const Mat& mat, const cv::Range* ranges ); // ROI with ranges.

Mat Query

Here is an example of a 10x10 identity matrix with 32bit floating point numbers in one channel (grey maybe).

cv::Mat m= cv::Mat::eye( 10, 10, CV_32FC1 );

Here is a way to access the value at row 3, column 3 (counting from zero). It is on the diagonal so it should be 1.00000.

printf( "Element (3,3) is %f\n", m.at<float>(3,3) );

If you want multichannel, the elements will be of Vec type so you’ll get channels at a certain point like this: m.at<cv::Vec3f>(3,3)[0]

Here is an example that finds the longest element in a Mat. Actually, what it computes is the sum of the squares of the channels (the squared length), so take a square root at the end if you want the real length.

int sz[3] = { 4, 4, 4 };
cv::Mat m( 3, sz, CV_32FC3 ); // Cube 4 positions long.
cv::randu( m, -1.0f, 1.0f );  // Fill random numbers (-1.0 to 1.0).

float max = 0.0f;              // minimum possible value of L2 norm
float len2;                    // Squared length of the current element.

cv::MatConstIterator_<cv::Vec3f> it= m.begin<cv::Vec3f>();
while( it != m.end<cv::Vec3f>() ) {
  len2 = (*it)[0]*(*it)[0]+(*it)[1]*(*it)[1]+(*it)[2]*(*it)[2];
  if( len2 > max ) max = len2;
  it++;
}

Ways to extract subsets of Mat objects. Note that these really create objects which point to the correct locations of the same actual data. To get a new manifestation of the actual data look into the copy function.

  • m.row(r)

  • m.col(c)

  • m.rowRange(start,end) - The end is exclusive, so pass the last row you want plus one.

  • m.colRange(start,end) - Can also use cv::Range objects.

  • m.diag(offset) - The diagonal of m. The offset is optional and defaults to 0, the main diagonal.

  • m( cv::Rect(x,y,w,h) ); - ROI using Rect, can also use two cv::Range objects.

Matrix operations

  • m+n; m-n - addition and subtraction

  • -m - matrix negation

  • m+s; m-s - add or subtract from all elements

  • m * sf - scale by a scale factor

  • m0.mul( m1 ); m0/m1 - per element multiplication and division

  • m*n - matrix multiplication

  • m.inv() - invert matrix (optional methods for fancy people)

  • m.t() - transpose matrix

  • m>n; m>=n; m==n; m<=n; m<n - Per element comparison, returns uchar matrix b&w (0 or 255)

  • m&n; m|n; m^n; ~m - bitwise operations. Also against single values too

  • min(m,n); max(m,n) - Single values work too

  • cv::abs(m) - Absolute value of elements

  • m.cross(n); m.dot(n); - Vector products (cross must be 3x1)

  • cv::Mat::eye( r, c, t ) - Make identity matrix with #rows and #cols of type t

  • cv::Mat::zeros( r, c, t ) - Make new matrix with all zeros

  • cv::Mat::ones( r, c, t ) - Make new matrix with all ones

  • cv::norm() - Calculates the norm of a matrix (L2 by default), or of the difference between two.

  • cv::mean() - Scalar summary statistic, mean.

  • cv::sum() - Scalar summary statistic, sum.

  • n= m.clone() - deep copy a matrix

  • m.copyTo(n) - deep copy a matrix, almost identical to clone()

  • m.copyTo(n,mask) - deep copy a matrix only in masked locations

  • m.convertTo(n,type,scale,offset) - change type, or enlarge,shrink, translate, etc.

  • m.setTo(s[,mask]) - set all or mask ROI to scalar value s

  • m.reshape(channels,rows) - make i by j array something else

  • m.push_back(s) - extend a 1d array with s (append), can tack rows on to 2d objects too

  • m.pop_back(n) - pop n items off the end (note return is weirdly void)

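  • m.locateROI(size,offset) - for a view into a bigger matrix, reports the size of the parent and where this ROI sits in it

  • m.adjustROI(t,b,l,r) - grow (or shrink, with negative values) the ROI at the top, bottom, left, and right

  • m.total() - count of elements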

  • m.elemSize() - size in bytes of each element including all channels

  • m.elemSize1() - size in bytes of each element in each channel

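  • m.type() - what type is the Mat holding, including the channel count (e.g. CV_32FC3)

  • m.depth() - data type of a single channel (e.g. CV_32F); m.channels() gives the number of channels

  • m.size() - dimensions of m as a cv::Size (width and height)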

  • m.empty() - no elements are present.


The cv::Mat is closely organized like C arrays because it is assumed that if you have 100 places, you’ll have 100 data points. But you may have a "sparse" situation where you have 100000 possible locations for 100 data points. The SparseMat class works almost like the Mat, but uses hash lookups instead of contiguous accounting. Some people think this has helpful functionality in vision problems in histograms.
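The hash lookup idea can be sketched with a plain C++ std::unordered_map. This is only an illustration of why sparse storage is cheap, not how cv::SparseMat is actually implemented; ToySparse and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical toy sparse array: only cells that were written take up memory.
class ToySparse {
public:
    explicit ToySparse( std::size_t length ) : length_( length ) {}
    void set( std::size_t i, float v ) { assert( i < length_ ); cells_[i] = v; }
    // Cells never written read back as zero without being stored.
    float get( std::size_t i ) const {
        assert( i < length_ );
        auto it = cells_.find( i );
        return it == cells_.end() ? 0.0f : it->second;
    }
    std::size_t stored() const { return cells_.size(); }
private:
    std::size_t length_;
    std::unordered_map<std::size_t, float> cells_;
};
```

A ToySparse with 100000 possible locations and two values set stores only those two entries.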

Saturation casting - used to prevent over and underflows when doing math to values. For example if you double some int Mat and the values were already 200, they’ll overflow. In theory, as I understand it, this will set them to 255 and act like no problem.
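Here is what that clamping amounts to for 8-bit values, as a plain C++ sketch. The helper names saturateToUchar and doubled are made up; in OpenCV the job belongs to cv::saturate_cast<>(), which is not used here.

```cpp
#include <cassert>

// Clamp an int into the 0..255 range of an 8-bit unsigned channel.
unsigned char saturateToUchar( int v ) {
    if( v < 0 )   return 0;
    if( v > 255 ) return 255;
    return static_cast<unsigned char>( v );
}

// Double an 8-bit value the way the text describes: 200 * 2 pins at 255.
unsigned char doubled( unsigned char v ) {
    unsigned char r = saturateToUchar( 2 * static_cast<int>( v ) );
    assert( r >= v );   // Saturation never wraps around to a smaller value.
    return r;
}
```

Without the clamp, 400 stored in an unsigned 8-bit value would wrap around to 144, which is the kind of overflow this is meant to prevent.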

Note on template vs. inheritance - while the primitive types are derived from their templates, the large array templates are instead derived from the basic class. There is some flexibility here but apparently it’s more proper to get a template to make you a class you need than to use templates for each thing in the class.

Helper Functions

  • cv::abs() - Return absolute value of all elements in an array.

  • cv::absdiff() - Return absolute value of differences between two arrays.

  • cv::add() - Perform element-wise addition of two arrays.

  • cv::addWeighted() - Perform element-wise weighted addition of two arrays (alpha blending).

  • cv::bitwise_and() - Compute element-wise bit-level AND of two arrays.

  • cv::bitwise_not() - Compute element-wise bit-level NOT of an array.

  • cv::bitwise_or() - Compute element-wise bit-level OR of two arrays.

  • cv::bitwise_xor() - Compute element-wise bit-level XOR of two arrays.

  • cv::calcCovarMatrix() - Compute covariance of a set of n-dimensional vectors.

  • cv::cartToPolar() - Compute angle and magnitude from a two-dimensional vector field.

  • cv::checkRange() - Check array for invalid values.

  • cv::compare() - Apply selected comparison operator to all elements in two arrays.

  • cv::completeSymm() - Symmetrize matrix by copying elements from one half to the other.

  • cv::convertScaleAbs() - Scale array, take absolute value, then convert to 8-bit unsigned.

  • cv::countNonZero() - Count nonzero elements in an array.

  • cv::cvarrToMat() - Convert pre–version 2.1 array types to cv::Mat.

  • cv::dct() - Compute discrete cosine transform of array.

  • cv::determinant() - Compute determinant of a square matrix.

  • cv::dft() - Compute discrete Fourier transform of array.

  • cv::divide() - Perform element-wise division of one array by another.

  • cv::eigen() - Compute eigenvalues and eigenvectors of a square matrix.

  • cv::exp() - Perform element-wise exponentiation of array.

  • cv::extractImageCOI() - Extract single channel from pre–version 2.1 array type.

  • cv::flip() - Flip an array about a selected axis.

  • cv::gemm() - Perform generalized matrix multiplication.

  • cv::getConvertElem() - Get a single-pixel type conversion function.

  • cv::getConvertScaleElem() - Get a single-pixel type conversion and scale function.

  • cv::idct() - Compute inverse discrete cosine transform of array.

  • cv::idft() - Compute inverse discrete Fourier transform of array.

  • cv::inRange() - Test if elements of an array are within values of two other arrays.

  • cv::invert() - Invert a square matrix.

  • cv::log() - Compute element-wise natural log of array.

  • cv::magnitude() - Compute magnitudes from a two-dimensional vector field.

  • cv::LUT() - Convert array to indices of a lookup table.

  • cv::Mahalanobis() - Compute Mahalanobis distance between two vectors.

  • cv::max() - Compute element-wise maxima between two arrays.

  • cv::mean() - Compute the average of the array elements.

  • cv::meanStdDev() - Compute the average and standard deviation of the array elements.

  • cv::merge() - Merge several single-channel arrays into one multichannel array.

  • cv::min() - Compute element-wise minima between two arrays.

  • cv::minMaxLoc() - Find minimum and maximum values in an array.

  • cv::mixChannels() - Shuffle channels from input arrays to output arrays.

  • cv::mulSpectrums() - Compute element-wise multiplication of two Fourier spectra.

  • cv::multiply() - Perform element-wise multiplication of two arrays.

  • cv::mulTransposed() - Calculate matrix product of one array.

  • cv::norm() - Compute the norm of an array, or of the difference between two arrays.

  • cv::normalize() - Normalize elements in an array to some value.

  • cv::perspectiveTransform() - Perform perspective matrix transform of a list of vectors.

  • cv::phase() - Compute orientations from a two-dimensional vector field.

  • cv::polarToCart() - Compute two-dimensional vector field from angles and magnitudes.

  • cv::pow() - Raise every element of an array to a given power.

  • cv::randu() - Fill a given array with uniformly distributed random numbers.

  • cv::randn() - Fill a given array with normally distributed random numbers.

  • cv::randShuffle() - Randomly shuffle array elements.

  • cv::reduce() - Reduce a two-dimensional array to a vector by a given operation.

  • cv::repeat() - Tile the contents of one array into another.

  • cv::saturate_cast<>() - Convert primitive types (template function).

  • cv::scaleAdd() - Compute element-wise sum of two arrays with optional scaling of the first.

  • cv::setIdentity() - Set all elements of an array to 1 for the diagonal and 0 otherwise.

  • cv::solve() - Solve a system of linear equations.

  • cv::solveCubic() - Find the (only) real roots of a cubic equation.

  • cv::solvePoly() - Find the complex roots of a polynomial equation.

  • cv::sort() - Sort elements in either the rows or columns in an array.

  • cv::sortIdx() - Serve same purpose as cv::sort(), except array is unmodified and indices are returned.

  • cv::split() - Split a multichannel array into multiple single-channel arrays.

  • cv::sqrt() - Compute element-wise square root of an array.

  • cv::subtract() - Perform element-wise subtraction of one array from another.

  • cv::sum() - Sum all elements of an array.

  • cv::theRNG() - Return a random number generator.

  • cv::trace() - Compute the trace of an array.

  • cv::transform() - Apply matrix transformation on every element of an array.

  • cv::transpose() - Transpose all elements of an array across the diagonal.

Drawing

Drawing Properties

Colors

  • Normally 3 channels of cv::Scalar objects

  • 1st channel used when applied to 1 channel images

  • 4th channel tolerated but no alpha blending currently supported


Note that "style" is not an OpenCV word per se.

  • thickness - pixels thick as an integer. Or cv::FILLED for closed shapes.

  • lineType - Either 4 or 8 or cv::LINE_AA. 4 and 8 are like Minecraft placement where you can either only move orthogonally (4 possibilities) or also on diagonals (8 possibilities). AA is anti-aliasing.

Drawing Functions

  • cv::rectangle() - Ordinary rectangle

    void rectangle(
      cv::Mat&          img,                  // Image to be drawn on
      cv::Point         pt1,                  // First corner of rectangle
      cv::Point         pt2,                  // Opposite corner of rectangle
      const cv::Scalar& color,                // Color, BGR form
      int               thickness = 1,        // Line thickness, or cv::FILLED
      int               lineType = 8,         // Connectedness, 4 or 8
      int               shift    = 0          // Fractional bits in the corner coordinates
    );
    void rectangle(
      cv::Mat&          img,                  // Image to be drawn on
      cv::Rect          r,                    // Rectangle to draw
      const cv::Scalar& color,                // Color, BGR form
      int               thickness = 1,        // Line thickness, or cv::FILLED
      int               lineType = 8,         // Connectedness, 4 or 8
      int               shift    = 0          // Fractional bits in the corner coordinates
    );
  • cv::circle() - Ordinary round circles

  • cv::ellipse() - Ellipses, whole or arcs, aligned or tilted, specified with normal details or cv::RotatedRect

  • cv::fillConvexPoly() - Filled versions of simple polygons. Fast. Input points are sequential. Can’t figure 8 or otherwise cross.

  • cv::fillPoly() - filled versions of arbitrary polygons. Can’t see the difference between this and thickness= cv::FILLED.

  • cv::line() - Ordinary line

    void line(
      cv::Mat&          img,                  // Image to be drawn on
      cv::Point         pt1,                  // First endpoint of line
      cv::Point         pt2,                  // Second endpoint of line
      const cv::Scalar& color,                // Color, BGR form
      int               thickness = 1,        // Line thickness in pixels
      int               lineType = 8,         // Connectedness, 4 or 8
      int               shift    = 0          // Fractional bits in the point coordinates
    );
  • cv::polylines() - Multiple polygonal curves. Each is drawn open unless isClosed is true

    void polylines(
      cv::Mat&                img,            // Image to be drawn on
      const cv::Point* const* pts,            // C-style array of arrays of points
      const int*              npts,           // Number of points in each 'pts[i]'
      int                     ncontours,      // Number of arrays in 'pts'
      bool                    isClosed,       // If true, connect last and first pts
      const cv::Scalar&       color,          // Color, BGR form
      int                     thickness = 1,  // Line thickness in pixels
      int                     lineType = 8,   // Connectedness, 4 or 8
      int                     shift    = 0    // Fractional bits in the point coordinates
    );
  • cv::clipLine() - Determine if a line is inside a given box

  • cv::ellipse2Poly() - Approximation function for breaking down true mathematical ellipses into something plottable.



void cv::putText(
  cv::Mat&      img,                      // Image to be drawn on
  const string& text,                     // write this (often from cv::format)
  cv::Point     origin,                   // Bottom-left corner of the text
  int           fontFace,                 // Font (e.g., cv::FONT_HERSHEY_PLAIN)
  double        fontScale,                // size (a multiplier, not "points"!)
  cv::Scalar    color,                    // Color, BGR form
  int           thickness = 1,            // Thickness of line
  int           lineType  = 8,            // Connectedness, 4 or 8
  bool          bottomLeftOrigin = false  // true='image origin at lower left'
);

All fonts are based on Hershey vector fonts.


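cv::FONT_HERSHEY_SIMPLEX
Normal size sans-serif

cv::FONT_HERSHEY_PLAIN
Small size sans-serif

cv::FONT_HERSHEY_DUPLEX
Normal size sans-serif; more complex than cv::FONT_HERSHEY_SIMPLEX

cv::FONT_HERSHEY_COMPLEX
Normal size serif; more complex than cv::FONT_HERSHEY_DUPLEX

cv::FONT_HERSHEY_TRIPLEX
Normal size serif; more complex than cv::FONT_HERSHEY_COMPLEX

cv::FONT_HERSHEY_COMPLEX_SMALL
Smaller version of cv::FONT_HERSHEY_COMPLEX

cv::FONT_HERSHEY_SCRIPT_SIMPLEX
Handwriting style

cv::FONT_HERSHEY_SCRIPT_COMPLEX
More complex variant of cv::FONT_HERSHEY_SCRIPT_SIMPLEX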


cv::getTextSize() - calculates how big text would be. It returns a cv::Size, so both width and height, and reports the baseline offset through an output argument.

Random Numbers

OpenCV provides special functor objects, one of which provides random number generation services (PCA, Principal Component Analysis, and SVD, Singular Value Decomposition, are others).

Every thread gets its own random number generator object. You can access it like this. Taking a reference matters, because a copy would leave the thread's own generator untouched.

cv::RNG& rng = cv::theRNG();

Then make use of it with.

cout << "An integer: " << (int)rng   << endl;
cout << "A float:    " << (float)rng << endl;

Besides matrix methods that incorporate these objects, there are direct methods to do things. Some examples.

  • cv::RNG::uniform

  • cv::RNG::gaussian

  • cv::RNG::fill

HighGUI - In OpenCV3 it is now broken up.

  • imgcodecs.hpp

  • videoio.hpp

  • highgui.hpp (includes imgcodecs.hpp and videoio.hpp)

Also has some XML/YML features for saving parameters and data, etc.

Load Images With Read

cv::Mat cv::imread( const string& filename, int flags);

The default flag is cv::IMREAD_COLOR, which gives three 8-bit channels.







The cv::IMREAD_ANYDEPTH flag can allow greater than 8-bit channels. The flag cv::IMREAD_UNCHANGED combines ANYCOLOR and ANYDEPTH to try to exactly match whatever is in the file no matter what it is.

imread looks at the file itself and does not trust any filename conventions like jpegimagesendwith.jpg.

If the read fails there is no error but the resulting image will be empty. You can check with cv::Mat::empty()==true.

Save Images With Write

bool cv::imwrite(const string& fn, cv::InputArray img, const vector<int>& params=vector<int>());

This takes the file name string and an image array and writes the file. The final parameter is optional and is used to control things like PNG compression (default 3) or JPEG quality (default 95).

Reading Video Streams

This opens a video file.

cv::VideoCapture::VideoCapture( const string& fn );

This opens a camera device. The first is 0.

cv::VideoCapture::VideoCapture( int device );

I think that a device of 200 is the first V4L (video4linux) device, 201, the 2nd, etc. 300 is the first firewire device. 500 is the first QuickTime device (the QT in cv::CAP_QT is QuickTime, not the Qt toolkit). But 0 is the first of any found (cv::CAP_ANY).

You can check that it opened properly and is ready to go with this.

bool cv::VideoCapture::isOpened() const;


To read a frame.

bool cv::VideoCapture::read(cv::OutputArray image);

The image will be empty and the return value false if the read did not go well or you have come to the end of the file, i.e. past the last frame. The >> operator is an alias for this, but that seems more, not less, complex to me. Speaking of complex, the read operation can also be broken down into a cv::VideoCapture::grab and a cv::VideoCapture::retrieve sequence. This can be handy to "grab" from multiple cameras as close to simultaneously as possible, then do the more time consuming decoding afterwards. This keeps the frames close together in time, which minimizes error in triangulation and stereo analysis.

Video Metadata

The cv::VideoCapture::get() and cv::VideoCapture::set() functions allow access to the metadata contained in some video file formats. Handy things to query could be…



  • cv::CAP_PROP_FPS

  • cv::CAP_PROP_FOURCC - Four character code of the codec involved


There are many others, not all of which are reliably available for every codec.

Writing Video

First you need a writer object. This gives you a chance to set what the output properties will be like.

cv::VideoWriter::VideoWriter( const string& filename, int fourcc, double fps,
                              cv::Size frame_size, bool is_color=true );

Here are typical settings showing how to update the object too.

cv::VideoWriter out;
out.open( "vid.mpg", CV_FOURCC('D','I','V','X'), 30.0, cv::Size( 640, 480 ), true );
if( out.isOpened() ) { out.write( image ); } // image is the cv::Mat frame to append.

This opens a writer using an MPEG-4 codec at 30 fps and 640x480 size, expecting only color. The CV_FOURCC macro handles bit packing to form the weird codec codes used in this business. As with the read, the cv::VideoWriter::write() function has an analogue with << which I also think is overly complex. But good to recognize.
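The bit packing itself is simple: four characters go into one 32-bit integer, first character in the lowest byte. The sketch below is plain C++ and the names packFourcc and fourccChar are made up; it mirrors what the CV_FOURCC macro is understood to do rather than calling it.

```cpp
#include <cassert>
#include <cstdint>

// Pack four characters into one 32-bit code, first character in the low byte.
std::uint32_t packFourcc( char c1, char c2, char c3, char c4 ) {
    return  ( static_cast<std::uint32_t>( c1 ) & 255u )
         | (( static_cast<std::uint32_t>( c2 ) & 255u ) << 8 )
         | (( static_cast<std::uint32_t>( c3 ) & 255u ) << 16 )
         | (( static_cast<std::uint32_t>( c4 ) & 255u ) << 24 );
}

// Recover character i (0..3) from a packed code.
char fourccChar( std::uint32_t code, int i ) {
    assert( i >= 0 && i < 4 );
    return static_cast<char>( ( code >> ( 8 * i ) ) & 255u );
}
```

Unpacking is handy when cv::CAP_PROP_FOURCC hands back a number and you want to know which codec it names.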

Object Serialization And Persistence

OpenCV also has ways to write various non image objects to files in key/value formats. This could be handy for parameter sets. Basically look into this sort of thing.

cv::FileStorage fs("test.yml", cv::FileStorage::WRITE);

This will open an object fs which will write objects as YAML. It works something like this.

fs << "Contrast" << mycontrast;

Retrieve the data in these files with something like this. (Didn’t check it.)

cv::FileStorage mysettingsfile( "test.yml", cv::FileStorage::READ );
int contrast;
mysettingsfile["Contrast"] >> contrast;

XML is supported too; the format is picked from the file name extension.

GUI Windows

GUI windows pretty much are for showing images. OpenCV does not return a window object as one would expect. This is how the lifecycle of a window is handled.

void cv::namedWindow( const string& name, int flags );
void cv::imshow( const string& name, cv::InputArray image );
void cv::destroyWindow( const string& name );
void cv::destroyAllWindows( void );

Instead of handling the window with an object, you refer to it by its name string. I think this is designed to be awkward enough so that people don’t start building truly fancy things with OpenCV. The flags can be 0 which means let users resize or it can be cv::WINDOW_AUTOSIZE which conforms to a loaded image automatically.

When the image is applied to the window it gets its own buffer so that changes to the window’s base image are not reflected until a new call to imshow.

int cv::waitKey( int delay );

If and only if a window is successfully open, the waitKey function will wait for a UI keypress event. The delay is the time in milliseconds until it gives up waiting and just proceeds onto the next command (0 means do not time out). If the function does time out it returns a -1.

Here is a simple example that shows the file provided as an argument until the user presses escape (character 27). The file name is used to refer to the window and is displayed in its title bar.

int main( int argc, char** argv ) {
    cv::namedWindow( argv[1], 1 );
    cv::Mat img = cv::imread( argv[1] );
    cv::imshow( argv[1], img );
    cv::moveWindow( argv[1], 0, 0); // Set upper left to 0,0.
    while( true ) { if( cv::waitKey( 100 ) == 27 ) break; }
    cv::destroyWindow( argv[1] );
    return 0;
}

For proper operating systems there is also this mysterious command.

int  cv::startWindowThread( void );

This, in theory, allows OpenCV to start a separate thread for a window (each extant window?) allowing it to respond better even while other things are going on.

Mouse Events

Working with mouse events requires a callback. The function must look something like this prototype.

void mouse_cb( int event, int x,  int y, int flags, void* param );

Here event and flags are codes referring to these actions and circumstances.

event codes
















The x and y coordinates are where the mouse action happened. I believe it is in the coordinate system of the image matrix, not the entire user desktop, but I’m not 100% sure. I also don’t know why the events and flags seem to have redundant coverage of the mouse buttons. The param argument is a pointer that allows this callback to pass back any kind of thing as a pointer.

Once you have a mouse action to perform ready in the callback function, you need to register that callback to hook it up to the proper window.

void cv::setMouseCallback( const string& windowName,
     cv::MouseCallback mouse_cb, void* param = NULL );


int cv::createTrackbar( const string& trackbarName,
    const string& windowName, int* value,int count,
    cv::TrackbarCallback onChange=NULL, void* param=NULL );
  • trackbarName - Handle used to identify trackbar

  • windowName - Handle used to identify window

  • value - Slider starting setting and pointer which will be updated by trackbar events.

  • count - Total counts for slider at far right, or max value.

  • onChange - Callback function (optional)

  • param - Additional params for callback fn (if any).

If you want to use a trackbar event callback, here is the prototype.

void a_tb_cb( int tb_pos, void* param=NULL );

Here are two functions that allow you to query and change the state of the trackbar (supply an identifying window name and trackbar name).

int cv::getTrackbarPos( const string& tb,const string& win);
void cv::setTrackbarPos( const string& tb,const string& win,int pos);


HighGUI basically reinvents functionality for every platform it supports. Another way that’s gaining popularity is to use QT for HighGUI’s backend. This means that the cross platform part is handled by QT developers. Here’s a summary of the situation.

  • Native HighGUI - Custom coded to the architecture.

  • QT backend HighGUI - OpenCV HighGUI extended to architectures with QT backend. To suppress any fancy features of this and emulate classic native HighGUI, add the cv::GUI_NORMAL flag to the cv::namedWindow() call. You may need cv::GUI_EXTENDED for those fancy features. Seems not consistent across platforms.

  • QT - OpenCV does not use its own GUI at all. GUI requirements are all met by QT and OpenCV just manipulates the data.

With extended or normal GUI, if you’re using the QT backend you can also put a text overlay on images. This is the function to use.

int cv::displayOverlay( const string& winname, const string& text, int delay );

Use delay of 0 for indefinite, otherwise the overlay text is temporary.

There is also a "statusbar" for QT backend GUI windows. This is not the title bar. This is just a line of text under the image. I think it kind of wastes space, but here it is.

int cv::displayStatusBar( const string& winname, const string& text, int delay );

Same rules as the overlay.

The QT GUI backend has another fancy feature where every application (not each window) has a "properties" system that can be accessed with the icon that looks (to me) like a whisk broom. Track bars can be put here so they are normally out of the way. To put trackbars into this system, just leave the name blank (i.e. "") when specifying the window to attach the trackbar to in the createTrackbar call.

Unlike native HighGUI, QT HighGUI also features buttons.

int cv::createButton( const string& buttonName, cv::ButtonCallback buttonCB, void* params = NULL, int buttonType = cv::PUSH_BUTTON, int initialState = 0 );

The buttonName will label the button; leave it blank and it will get labeled 0,1,2, etc.

The buttonType is either cv::PUSH_BUTTON, cv::CHECKBOX, or cv::RADIOBOX. The callback that the button press triggers looks like this.

void buttonCB( int state, void* params );

For fancy fonts using QT HighGUI, look into the font object returned from the cv::fontQt() function (a CvFont in OpenCV 2.x, a cv::QtFont in later versions). Then add fancy text with cv::addText().

QT HighGUI also allows some window properties to be queried and even modified. This seems important when considering windowed mode vs. full screen. See cv::getWindowProperty and cv::setWindowProperty. You can also save and then later reload the state of all your window locations and GUI element settings. See cv::saveWindowParameters and cv::loadWindowParameters. The magic of how this happens is opaque to the user/programmer, but note that it depends on the executable name so if you change that, be prepared.

QT HighGUI can also register a callback to draw OpenGL buffers into the OpenCV GUI elements. Do this with cv::setOpenGlDrawCallback (called cv::createOpenGLCallback in early 2.x releases).

Image Processing

Image Border Padding

To apply an NxN kernel to an image, it's clear that when the kernel is centered on an edge pixel, N/2 (rounded down) rows or columns of the kernel will hang off the edge. For a 3x3 kernel that is 1 row or column, for a 5x5 it is 2. This can be handled by cropping down the final image. Or you can pad out the original image with a synthetic border. There are many modes in OpenCV to do this.

void cv::copyMakeBorder( cv::InputArray srcimg, cv::OutputArray dstimg, int t_pad, int b_pad, int l_pad, int r_pad, int borderType, const cv::Scalar& value=cv::Scalar() );

The value is for constant (e.g. all black or white or greenscreen) borders. The borderType can be one of the following.

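Table 1. borderType

cv::BORDER_CONSTANT
Solid color frame effect.

cv::BORDER_WRAP
Tile the image, so the border continues from the opposite edge.

cv::BORDER_REPLICATE
Use value of edge pixel. Corners take the value of the nearest corner pixel.

cv::BORDER_REFLECT
Frame is a reflection of edge. 654321 becomes 456-654321-123

cv::BORDER_REFLECT_101
Reflect without double wide edge pixel. 345-654321-234

cv::BORDER_DEFAULT
Same as cv::BORDER_REFLECT_101.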
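
The reflection examples in the table are easier to trust once you see the index arithmetic. Here is a minimal sketch, in plain C++ with no OpenCV, of how an out-of-range index could be mapped back into the row for the replicate, reflect, and reflect-101 ideas. The names PadMode, borderIndex, and padRow are made up for this illustration; OpenCV's own implementation is not shown here.

```cpp
#include <cassert>
#include <string>

// Labels for this sketch only; they mirror the ideas behind
// cv::BORDER_REPLICATE, cv::BORDER_REFLECT and cv::BORDER_REFLECT_101.
enum class PadMode { Replicate, Reflect, Reflect101 };

// Map an index i that may fall outside [0, n) onto a valid index.
// Assumes the pad is narrower than the row itself.
int borderIndex(int i, int n, PadMode mode) {
    assert(n > 1 && i >= -(n - 1) && i <= 2 * n - 2);
    if (i >= 0 && i < n) return i;
    switch (mode) {
    case PadMode::Replicate:  return i < 0 ? 0 : n - 1;
    case PadMode::Reflect:    return i < 0 ? -i - 1 : 2 * n - 1 - i;
    case PadMode::Reflect101: return i < 0 ? -i : 2 * n - 2 - i;
    }
    return 0;  // unreachable
}

// Pad a row of characters by `pad` on each side. Dashes mark where the
// original row starts and ends, matching the notation in the table.
std::string padRow(const std::string& row, int pad, PadMode mode) {
    const int n = static_cast<int>(row.size());
    std::string out;
    for (int i = -pad; i < n + pad; ++i) {
        if (i == 0 || i == n) out += '-';
        out += row[borderIndex(i, n, mode)];
    }
    return out;
}
```

Calling padRow("654321", 3, PadMode::Reflect) gives "456-654321-123" and PadMode::Reflect101 gives "345-654321-234", the same strings as in the table.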


To selectively keep or suppress pixels based on a simple per-pixel comparison, use the cv::threshold function. It takes a source and destination array, the threshold value, and the maximum value to assign.

double cv::threshold( cv::InputArray S, cv::OutputArray D,
                      double threshValue, double maxValue, int thresholdType  );
Table 2. thresholdType

cv::THRESH_BINARY
Dij = (Sij > threshValue) ? maxValue : 0

cv::THRESH_BINARY_INV
Dij = (Sij <= threshValue) ? maxValue : 0

cv::THRESH_TOZERO
Dij = (Sij > threshValue) ? Sij : 0

cv::THRESH_TOZERO_INV
Dij = (Sij <= threshValue) ? Sij : 0

cv::THRESH_TRUNC
Dij = (Sij > threshValue) ? threshValue : Sij
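
The five rules in Table 2 translate almost directly into code. This is only a sketch of the per-pixel logic in plain C++; the Rule enum and thresholdPixel are names invented here, not part of OpenCV.

```cpp
// Labels for this sketch only; each mirrors one rule from Table 2.
enum class Rule { Binary, BinaryInv, ToZero, ToZeroInv, Trunc };

// Apply one thresholding rule to a single pixel value s.
double thresholdPixel(double s, double threshValue, double maxValue, Rule rule) {
    const bool above = s > threshValue;
    switch (rule) {
    case Rule::Binary:    return above ? maxValue : 0.0;
    case Rule::BinaryInv: return above ? 0.0 : maxValue;
    case Rule::ToZero:    return above ? s : 0.0;
    case Rule::ToZeroInv: return above ? 0.0 : s;
    case Rule::Trunc:     return above ? threshValue : s;
    }
    return s;  // unreachable
}
```

With threshValue 100 and maxValue 255, a pixel of 150 becomes 255, 0, 150, 0, and 100 under the five rules in order, while a pixel of 50 becomes 0, 255, 0, 50, and 50.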

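The thresholdType can also be combined with the cv::THRESH_OTSU flag (for example cv::THRESH_BINARY | cv::THRESH_OTSU). This makes the threshold() function ignore threshValue and try to find the optimal threshold value itself, which it then returns. The optimum is the value that maximizes the variance between the two separated sets of pixels. This is a relatively slow function.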
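
To make "maximizes the variance between the separated sets" concrete, here is a rough sketch of Otsu's idea in plain C++: try every possible threshold on an 8-bit histogram and keep the one with the largest between-class variance. The function name otsuThreshold is made up for this example, and it is not a copy of OpenCV's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Return the threshold t (0..255) that maximizes the between-class variance
// when pixels <= t form one class and pixels > t form the other.
int otsuThreshold(const std::vector<std::uint8_t>& pixels) {
    assert(!pixels.empty());
    std::vector<double> hist(256, 0.0);
    for (std::uint8_t p : pixels) hist[p] += 1.0;

    const double total = static_cast<double>(pixels.size());
    double sumAll = 0.0;
    for (int v = 0; v < 256; ++v) sumAll += v * hist[v];

    double countLow = 0.0, sumLow = 0.0, bestVar = -1.0;
    int bestT = 0;
    for (int t = 0; t < 256; ++t) {
        countLow += hist[t];
        sumLow += t * hist[t];
        const double countHigh = total - countLow;
        if (countLow == 0.0 || countHigh == 0.0) continue;  // one class is empty
        const double meanLow = sumLow / countLow;
        const double meanHigh = (sumAll - sumLow) / countHigh;
        const double diff = meanLow - meanHigh;
        // Between-class variance, up to a constant factor of 1/total^2.
        const double betweenVar = countLow * countHigh * diff * diff;
        if (betweenVar > bestVar) { bestVar = betweenVar; bestT = t; }
    }
    return bestT;
}
```

For a clearly bimodal set of pixels, the chosen threshold lands somewhere in the gap between the dark cluster and the bright cluster.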

There is also the adaptive threshold function. This is the same technique that boosts the performance so much in cv::findChessboardCorners.

void cv::adaptiveThreshold( cv::InputArray src, cv::OutputArray dst,
     double maxValue, int adaptiveMethod, int thresholdType, int blockSize,
     double Constant );

The threshold value is computed on the fly for each pixel as a weighted average of the blockSize x blockSize region around it, minus Constant. The blockSize must be odd (3, 5, 7, ...). The adaptiveMethod parameter is either "mean" or "Gaussian" (cv::ADAPTIVE_THRESH_MEAN_C or cv::ADAPTIVE_THRESH_GAUSSIAN_C). Gaussian weights each neighbor by its distance from the center pixel while mean weights every pixel in the region equally.
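
Here is a simplified sketch of the "mean" flavor, working on a single row instead of a 2D block so it stays short. The function name adaptiveThresholdRow is invented for this example, and clipping the window at the row ends is a shortcut of this sketch rather than what OpenCV does.

```cpp
#include <cassert>
#include <vector>

// Mean-based adaptive threshold over one row (binary style output).
// Each pixel is compared against the mean of its own neighborhood minus
// `constant`. Windows are clipped at the row ends, which is a simplification.
std::vector<int> adaptiveThresholdRow(const std::vector<int>& src, int maxValue,
                                      int blockSize, double constant) {
    assert(blockSize >= 3 && blockSize % 2 == 1);
    const int n = static_cast<int>(src.size());
    const int half = blockSize / 2;
    std::vector<int> dst(src.size(), 0);
    for (int i = 0; i < n; ++i) {
        const int lo = i - half < 0 ? 0 : i - half;
        const int hi = i + half >= n ? n - 1 : i + half;
        double sum = 0.0;
        for (int j = lo; j <= hi; ++j) sum += src[j];
        const double localThresh = sum / (hi - lo + 1) - constant;
        dst[i] = src[i] > localThresh ? maxValue : 0;
    }
    return dst;
}
```

Take the row {100, 100, 60, 100, 100, 200, 200, 160, 200, 200} with blockSize 3 and constant 5. Both dark marks (the 60 and the 160) come out as 0 even though the 160 is brighter than the background of the darker half. No single global threshold could catch both.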


There are at least 5 different kinds of smoothing operation in OpenCV. The simplest is cv::blur().

void cv::blur( cv::InputArray  src, cv::OutputArray dst,
           cv::Size kernelSize, cv::Point anchorPt=cv::Point(-1,-1),
           int borderType=cv::BORDER_DEFAULT);

This just takes a kernelSize kernel and computes the mean value within it. This is the value assigned to the new image at the location the kernel was centered on. Since the kernel is rectangular and every cell is weighted equally, this is a "box filter". The cv::boxFilter() function is a more general case of cv::blur which allows for unnormalized mode and explicit control over output depth (blur reasonably assumes normalizing and same depth).

Here is another similar kind of blur which uses the median of the kernel’s contents rather than the mean as with simple blur. It seems to me to lose most detail, but if you want major component segmentation, it might be ideal.

void cv::medianBlur( cv::InputArray src, cv::OutputArray dst, int kernelSize );
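
The difference between mean and median smoothing shows up clearly on a row with one bad pixel. This is a small sketch in plain C++ working on one row; smoothRow and the Smooth enum are names made up for the illustration, and repeating the edge pixel at the ends is just one possible border choice.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class Smooth { Mean, Median };

// Smooth one row with a sliding window of odd width ksize.
// Row ends are handled by repeating the edge pixel.
std::vector<int> smoothRow(const std::vector<int>& src, int ksize, Smooth mode) {
    assert(!src.empty() && ksize >= 1 && ksize % 2 == 1);
    const int n = static_cast<int>(src.size());
    const int half = ksize / 2;
    std::vector<int> dst(src.size());
    std::vector<int> window(ksize);
    for (int i = 0; i < n; ++i) {
        for (int k = -half; k <= half; ++k) {
            window[k + half] = src[std::max(0, std::min(n - 1, i + k))];
        }
        if (mode == Smooth::Median) {
            std::sort(window.begin(), window.end());
            dst[i] = window[half];  // middle element of the sorted window
        } else {
            int sum = 0;
            for (int v : window) sum += v;
            dst[i] = sum / ksize;  // integer mean, truncated
        }
    }
    return dst;
}
```

On {10, 10, 10, 250, 10, 10, 10} with a width of 3, the median version removes the spike completely (all 10s) while the mean version smears it into three pixels of 90.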

The GaussianBlur is a much nicer looking function which produces results that are what you’d expect from wearing the wrong glasses. Detail can be preserved while noise can be muted.

void cv::GaussianBlur( cv::InputArray  src, cv::OutputArray dst, cv::Size kernelSize,
                       double sigmaX, double sigmaY=0.0, int borderType=cv::BORDER_DEFAULT);

The sigmaX and sigmaY are the standard deviations (the square root of the variance) of the Gaussian in each direction. Bigger values spread the weight further from the center and blur more. If sigmaY is left at 0, it is set equal to sigmaX. If both are 0, they are computed from the kernel size, and this will, in theory, use optimized code. Also 3, 5, and 7 square kernels get special hand tuned optimization boosts. The kernel dimensions must be odd. See Image Border Padding for how to handle borderType.
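
To see what sigma actually does, here is a sketch that builds a normalized 1D Gaussian kernel. The function name gaussianKernel1D is made up. The fallback formula for a non-positive sigma is the one given in the OpenCV documentation for cv::getGaussianKernel, but the rest is my own illustration and skips whatever special cases OpenCV handles internally.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Build a normalized 1D Gaussian kernel of odd size ksize.
// If sigma <= 0 it is derived from ksize (formula from the OpenCV docs
// for cv::getGaussianKernel).
std::vector<double> gaussianKernel1D(int ksize, double sigma) {
    assert(ksize >= 1 && ksize % 2 == 1);
    if (sigma <= 0.0) sigma = 0.3 * ((ksize - 1) * 0.5 - 1.0) + 0.8;
    const int half = ksize / 2;
    std::vector<double> k(ksize);
    double sum = 0.0;
    for (int i = 0; i < ksize; ++i) {
        const double x = i - half;  // distance from the kernel center
        k[i] = std::exp(-(x * x) / (2.0 * sigma * sigma));
        sum += k[i];
    }
    for (double& w : k) w /= sum;  // weights add up to 1
    return k;
}
```

The weights are symmetric, largest in the middle, and sum to 1. A 2D Gaussian blur can be done as one pass with this kernel along rows and another along columns, which is why separate sigmaX and sigmaY make sense.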

The other type of smoothing is bilateral filtering. You can think of this as Gaussian smoothing that weighs similar pixels more highly than less similar ones, keeping high-contrast edges sharp.

void cv::bilateralFilter( cv::InputArray src, cv::OutputArray dst,
      int d, double sigmaColor, double sigmaSpace,
            int borderType=cv::BORDER_DEFAULT );

The d is the diameter of the pixel neighborhood to consider (sort of like a kernel size); the OpenCV reference calls it exactly that. Bigger values get expensive quickly. The sigmaSpace is like the sigmas for Gaussian blur; it controls how quickly a neighbor's influence drops off with distance. The sigmaColor affects what will be included in the smoothing; the larger it is, the broader the range of intensities that get averaged together and the more extreme a discontinuity must be to be preserved.
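
Here is a sketch of the bilateral idea on a single row, in plain C++. Each neighbor's weight is the product of two Gaussians: one on spatial distance (sigmaSpace) and one on intensity difference (sigmaColor). The function name bilateralRow is invented here and samples that fall off the ends are simply skipped, so treat it as an illustration and not OpenCV's algorithm.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// 1D bilateral filter sketch over a neighborhood of diameter d.
std::vector<double> bilateralRow(const std::vector<double>& src, int d,
                                 double sigmaColor, double sigmaSpace) {
    assert(d >= 1 && sigmaColor > 0.0 && sigmaSpace > 0.0);
    const int n = static_cast<int>(src.size());
    const int radius = d / 2;
    std::vector<double> dst(src.size());
    for (int i = 0; i < n; ++i) {
        double weightSum = 0.0, valueSum = 0.0;
        for (int k = -radius; k <= radius; ++k) {
            const int j = i + k;
            if (j < 0 || j >= n) continue;  // skip samples off the ends
            const double diff = src[j] - src[i];
            const double w =
                std::exp(-(k * k) / (2.0 * sigmaSpace * sigmaSpace)
                         - (diff * diff) / (2.0 * sigmaColor * sigmaColor));
            weightSum += w;
            valueSum += w * src[j];
        }
        dst[i] = valueSum / weightSum;  // weightSum >= 1 since k == 0 has weight 1
    }
    return dst;
}
```

On a noisy step like {10, 12, 10, 11, 200, 202, 201, 200} with sigmaColor 10, the pixels next to the step stay on their own side of it. Crank sigmaColor up to 1000 and the step starts to smear like an ordinary Gaussian blur.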

Derivatives and Gradients

A derivative is the rate of change of a process. So if I have a value of 9 and then 9 and 9 and 9, the derivative is 0 because it’s not changing at all. If I have a value of 3, then 6, then 9 then 12, the derivative is 3. This is of interest in image processing because if you have pixels with the following values, [7,9,8,10,9,54,53,54,53,55], it’s easy to see that in the middle of that sequence something important happened. In this sequence the derivatives are [2,-1,2,-1,45,-1,1,-1,2] and by plotting this, you get a representation of the edges. This is a simplification; in practice the derivative functions often will be fitting polynomials and other fancy things in the internals.
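
The arithmetic above is just differences between neighbors. A tiny sketch in plain C++ (forwardDifferences and strongestEdge are names made up for this example):

```cpp
#include <cstdlib>
#include <vector>

// Difference between each value and the one before it.
std::vector<int> forwardDifferences(const std::vector<int>& values) {
    std::vector<int> diffs;
    for (std::size_t i = 1; i < values.size(); ++i) {
        diffs.push_back(values[i] - values[i - 1]);
    }
    return diffs;
}

// Index of the largest absolute jump, i.e. the strongest "edge".
std::size_t strongestEdge(const std::vector<int>& diffs) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < diffs.size(); ++i) {
        if (std::abs(diffs[i]) > std::abs(diffs[best])) best = i;
    }
    return best;
}
```

Feeding in the pixel values from the paragraph gives the same list of derivatives, and strongestEdge picks index 4, the jump of 45 between the 9 and the 54.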


The least fancy edge detection is the Sobel derivative.

void cv::Sobel( cv::InputArray src, cv::OutputArray dst, int ddepth,
  int xorder, int yorder, int kernelSize=3,
  double scale=1, double delta=0, int borderType=cv::BORDER_DEFAULT);

The ddepth sets the output’s depth (CV_8U, for example). Derivatives can be negative and CV_8U clips those to zero, so a signed or floating point depth such as CV_16S or CV_32F is the safer choice if you care about both edge directions. The kernel size should be odd but not exceed 31. The order is the derivative order, presumably 1st derivative (1) for normal gradients. Second order (2) is also an option. Zero means no derivative is taken in that direction, so xorder=1 with yorder=0 gives the plain x derivative. Scale and delta are applied to the computed derivative before it is stored in the output, to help put the result in a usable (visible maybe) range.
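
For a feel of what the 3x3 x-derivative does, here is a sketch in plain C++ that applies the standard Sobel x kernel by hand. The Image alias and sobelX are names made up for this example, and border pixels are simply left at 0 to keep it short.

```cpp
#include <cassert>
#include <vector>

using Image = std::vector<std::vector<int>>;

// x-derivative (xorder=1, yorder=0) with the 3x3 Sobel kernel.
// Border pixels are left at 0 to keep the sketch short.
Image sobelX(const Image& src) {
    assert(!src.empty() && !src[0].empty());
    static const int kernel[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    const int rows = static_cast<int>(src.size());
    const int cols = static_cast<int>(src[0].size());
    Image dst(rows, std::vector<int>(cols, 0));
    for (int r = 1; r + 1 < rows; ++r) {
        for (int c = 1; c + 1 < cols; ++c) {
            int sum = 0;
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    sum += kernel[dr + 1][dc + 1] * src[r + dr][c + dc];
                }
            }
            dst[r][c] = sum;
        }
    }
    return dst;
}
```

A dark-to-bright vertical edge of height 10 gives a response of 40 next to the edge, and the same edge going bright-to-dark gives -40. That sign is exactly what gets lost if the output depth is unsigned.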

Scharr and Laplacian Variants

The Sobel filter is more accurate the closer the edge aligns with horizontal or vertical. To help with that, for example to improve the accuracy of histogram of oriented gradients approaches, you can use the special kernelSize value of cv::FILTER_SCHARR (CV_SCHARR in older versions; the value is -1). This will create a (fast) 3x3 kernel that is more balanced for edges in any orientation.

The cv::Laplacian() function (not to be confused with the Laplacian pyramid) is basically the sum of the cv::Sobel() second derivatives (order set to 2) in x and in y. This can be useful for edge detection. It looks like it favors strong distinctions and ignores milder gradient changes.
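
The "ignores milder gradient changes" behavior falls out of the second derivative. A one-dimensional sketch in plain C++ (secondDifferences is a made-up name, and this is the simple [1, -2, 1] stencil, not OpenCV's 2D kernel):

```cpp
#include <vector>

// Second difference v[i-1] - 2*v[i] + v[i+1] for every interior sample.
std::vector<int> secondDifferences(const std::vector<int>& v) {
    std::vector<int> out;
    for (std::size_t i = 1; i + 1 < v.size(); ++i) {
        out.push_back(v[i - 1] - 2 * v[i] + v[i + 1]);
    }
    return out;
}
```

A steady ramp like {0, 10, 20, 30, 40} gives all zeros, because its slope never changes. A step like {5, 5, 5, 50, 50, 50} gives {0, 45, -45, 0}: a positive and negative pair right at the edge, with the edge itself at the zero crossing between them.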


Morphological operations are important for removing noise and clarifying object definition. The two basic transforms are dilation and erosion. Dilation adds active pixels around currently active pixels while erosion does the opposite. Dilation tends to fill holes and concave features while erosion tends to smooth out jagged protrusions. Both functions take basically the same kinds of parameters.

void cv::dilate/cv::erode( cv::InputArray src, cv::OutputArray dst, cv::InputArray kernel, cv::Point anchor=cv::Point(-1,-1), int iterations=1, int borderType=cv::BORDER_CONSTANT, const cv::Scalar& borderValue=cv::morphologyDefaultBorderValue() );

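Both src and dst can be the same. The kernel supplied with an uninitialized cv::Mat() will give the default 3x3. These functions are really just different directions of focus, either on lightening or darkening: dilate replaces each pixel with the maximum value under the kernel and erode replaces it with the minimum.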
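
Here is a sketch of both operations with a 3x3 square kernel in plain C++, treating dilation as a local maximum and erosion as a local minimum. The Image alias and morph3x3 are names made up for this example, and neighbors outside the image are simply ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Image = std::vector<std::vector<int>>;

// 3x3 square kernel: dilate takes the local maximum, erode the local minimum.
// Neighbors outside the image are ignored.
Image morph3x3(const Image& src, bool dilate) {
    assert(!src.empty() && !src[0].empty());
    const int rows = static_cast<int>(src.size());
    const int cols = static_cast<int>(src[0].size());
    Image dst = src;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            int best = src[r][c];
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    const int rr = r + dr, cc = c + dc;
                    if (rr < 0 || rr >= rows || cc < 0 || cc >= cols) continue;
                    best = dilate ? std::max(best, src[rr][cc])
                                  : std::min(best, src[rr][cc]);
                }
            }
            dst[r][c] = best;
        }
    }
    return dst;
}
```

A single active pixel in the middle of a 5x5 image dilates into a 3x3 block of nine pixels, and erodes away to nothing. Eroding the dilated block brings back the single pixel.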

These morphology functions are usually fine for 1 bit image masks. For fancier applications on more complex image arrays, you might want a more complex general morphology functionality. This is the purpose of the cv::morphologyEx() function. Some odd terminology in this topic is "top hat" and "black hat" which basically are techniques to isolate patches that are brighter or dimmer than their surroundings, respectively, mostly in grayscale work. You can also make your own weird fancy kernel shapes using cv::getStructuringElement but that’s pretty exotic. In fact, you can delve into the internals of these 2d filters (cv::filter2D()) and use subcomponents of OpenCV’s mainstream functions to compose your own optimized solution.
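
The "top hat" is less mysterious in code. An opening is an erode followed by a dilate, and the top hat is the source minus its opening, which leaves only the small bright bits the opening wiped out. Here is a one-row sketch in plain C++; Row, slide, and topHat are names invented for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Row = std::vector<int>;

// Sliding minimum (erode) or maximum (dilate) over a window of odd width.
// Samples off the ends of the row are ignored.
Row slide(const Row& src, int ksize, bool takeMax) {
    assert(ksize >= 1 && ksize % 2 == 1);
    const int n = static_cast<int>(src.size());
    const int half = ksize / 2;
    Row dst(src.size());
    for (int i = 0; i < n; ++i) {
        int best = src[i];
        for (int j = std::max(0, i - half); j <= std::min(n - 1, i + half); ++j) {
            best = takeMax ? std::max(best, src[j]) : std::min(best, src[j]);
        }
        dst[i] = best;
    }
    return dst;
}

// Opening = erode then dilate. Top hat = source minus its opening.
Row topHat(const Row& src, int ksize) {
    const Row opened = slide(slide(src, ksize, false), ksize, true);
    Row out(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) out[i] = src[i] - opened[i];
    return out;
}
```

On {20, 20, 20, 90, 20, 20, 20} with a width of 3, the top hat is {0, 0, 0, 70, 0, 0, 0}: the flat background is gone and only the bright spike remains. The black hat is the mirror image, a closing (dilate then erode) minus the source.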

Image Transforms

Probably the most essential general image transform is cv::resize which does the obvious.

void cv::resize( cv::InputArray src, cv::OutputArray dst, cv::Size dsize,
  double fx=0, double fy=0, int interpolation=cv::INTER_LINEAR);

The fx and fy are ignored if set to 0, but if dsize is set to cv::Size(0,0) and these scale factors are nonzero, then the image is scaled accordingly. The interpolation method can be one of these.


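cv::INTER_NEAREST
Nearest neighbor

cv::INTER_LINEAR
Bilinear interpolation (the default)

cv::INTER_AREA
Pixel area resampling

cv::INTER_CUBIC
Bicubic interpolation

cv::INTER_LANCZOS4
Lanczos interpolation over 8 × 8 neighborhood.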
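
To show what the first two interpolation choices mean, here is a sketch that resamples a single row in plain C++. The function names are made up, and the coordinate mapping (sample centers at half-pixel positions for the linear case) is an assumption of this sketch, so don't expect it to match cv::resize bit for bit.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Nearest neighbor: copy the source sample each destination index maps to.
std::vector<double> resizeNearest(const std::vector<double>& src, int dstSize) {
    assert(!src.empty() && dstSize > 0);
    const int n = static_cast<int>(src.size());
    std::vector<double> dst(dstSize);
    for (int i = 0; i < dstSize; ++i) dst[i] = src[i * n / dstSize];
    return dst;
}

// Linear: blend the two source samples on either side of the mapped position.
std::vector<double> resizeLinear(const std::vector<double>& src, int dstSize) {
    assert(!src.empty() && dstSize > 0);
    const int n = static_cast<int>(src.size());
    const double scale = static_cast<double>(n) / dstSize;
    std::vector<double> dst(dstSize);
    for (int i = 0; i < dstSize; ++i) {
        // Map the center of destination sample i back into source coordinates.
        double x = (i + 0.5) * scale - 0.5;
        x = std::max(0.0, std::min(x, n - 1.0));
        const int left = static_cast<int>(std::floor(x));
        const int right = std::min(left + 1, n - 1);
        const double t = x - left;
        dst[i] = (1.0 - t) * src[left] + t * src[right];
    }
    return dst;
}
```

Stretching {0, 10} to four samples gives {0, 0, 10, 10} with nearest neighbor (blocky) and {0, 2.5, 7.5, 10} with the linear version (a ramp).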

This function creates a new image with the desired dimensions. There is also a confusingly named cv::Mat::resize() function that can operate on matrices (and therefore images). That one only changes the number of rows, operates in place, and does no form of interpolation.

Image Pyramids

OpenCV seems to provide pretty comprehensive support for a technique called "image pyramids". This is basically a stack of reduced images. Some thoughts about it.

  • The base layer is G0 and is the original image.

  • The next layer is G1 and is, ordinarily, half the size of G0.

  • The G comes from a Gaussian convolution used to achieve the downsample.

  • There is a Laplacian pyramid which seems to contain the information needed to reverse the downsample process.

Functions involved in this.

Make the next downsample image to build your own pyramid.

void cv::pyrDown( cv::InputArray  src, cv::OutputArray dst,
                  const cv::Size& dstsize = cv::Size());

Make them all in one go.

void cv::buildPyramid( cv::InputArray src, cv::OutputArrayOfArrays dst,
                       int nofpyrlvls  );

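The cv::OutputArrayOfArrays seems to be like an STL vector<cv::OutputArray> or perhaps even vector<cv::Mat>. There is also a cv::pyrUp which doubles the image size. It is not an exact inverse of cv::pyrDown; the difference between a level and the upsampled version of the next level is what the Laplacian pyramid stores.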
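
Here is a sketch of one pyramid step on a single row, in plain C++: smooth with the 5-tap kernel [1 4 6 4 1]/16 (the 1D version of the Gaussian kernel the OpenCV docs describe for pyrDown) and then keep every other sample. The function names are invented for this example, and repeating the edge pixel at the row ends is a simplification of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One pyramid step on a row: smooth with [1 4 6 4 1]/16, then keep every
// other sample. Row ends repeat the edge pixel.
std::vector<double> pyrDownRow(const std::vector<double>& src) {
    assert(!src.empty());
    static const double kernel[5] = {1 / 16.0, 4 / 16.0, 6 / 16.0, 4 / 16.0, 1 / 16.0};
    const int n = static_cast<int>(src.size());
    std::vector<double> dst;
    for (int i = 0; i < n; i += 2) {
        double sum = 0.0;
        for (int k = -2; k <= 2; ++k) {
            sum += kernel[k + 2] * src[std::max(0, std::min(n - 1, i + k))];
        }
        dst.push_back(sum);
    }
    return dst;
}

// Build levels G0, G1, ... up to maxLevel by repeated downsampling.
std::vector<std::vector<double>> buildPyramidRow(const std::vector<double>& g0,
                                                 int maxLevel) {
    std::vector<std::vector<double>> levels;
    levels.push_back(g0);
    for (int level = 1; level <= maxLevel && levels.back().size() > 1; ++level) {
        levels.push_back(pyrDownRow(levels.back()));
    }
    return levels;
}
```

An 8 sample row with maxLevel 3 gives four levels of 8, 4, 2, and 1 samples. Because the kernel weights add up to 1, a constant row stays at the same value all the way up.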

Geometric Transforms

An affine transform is one where a rectangle can be turned into any kind of parallelogram. This can be done with a 2x3 transformation matrix (a 2x2 part for rotation, scale, and shear plus a column for translation). The other kind of transform is perspective which requires a 3x3 matrix and can produce any 4 sided shape from a rectangle.
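
A quick sketch of the rectangle-to-parallelogram claim in plain C++. The matrix here is written as six numbers in row-major order, [a b tx; c d ty], meaning a 2x2 linear part plus a translation column. Point2, Affine, and applyAffine are names made up for this example.

```cpp
#include <array>

struct Point2 { double x, y; };

// Row-major affine matrix [a b tx; c d ty].
using Affine = std::array<double, 6>;

// Apply the affine matrix to one point.
Point2 applyAffine(const Affine& m, const Point2& p) {
    return {m[0] * p.x + m[1] * p.y + m[2],
            m[3] * p.x + m[4] * p.y + m[5]};
}
```

With the matrix [1 0.5 10; 0 1 20] (a shear plus a shift), the rectangle corners (0,0), (4,0), (4,2), (0,2) land on (10,20), (14,20), (15,22), (11,22). Opposite sides are still parallel and equal in length, so it is a parallelogram, but the right angles are gone.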

Affine trans

To be continued…