Partitioning disks is a confusing business. There is much about the topic that I don’t understand and, if we’re lucky, technology can move right along making the understanding of such things irrelevant. But for now, understanding a few of the underlying concepts can make dealing with partitioning problems much easier.

A hard drive is a physical device that contains a number. That number is quite long, often 1 terabit these days. So there are no magical dots or divots on the physical hard drive surface. It is just a number. The nice thing about a hard drive is that you can change the digits of the number any place it is convenient: 31415926535 can become 31010920505 easily enough.

It is actually possible to take a number and just send it directly to an unformatted hard drive. You can even get that number back. The operating system kernel should have low level hardware drivers designed to physically write stuff to the device. In Linux, the kernel makes those drivers available in the form of a "device" file. For example, a hard drive might be /dev/hda (or /dev/sda on newer systems). This represents a direct connection to the raw bit reading/writing capabilities of this device. You could, for example, do this (but don’t!):

dd if=/dev/urandom of=/dev/hda bs=1000 count=1

This would fill the first 1000 bytes of the output file, your hard drive, with the first 1000 bytes of the input file. In this case, the input file is another fake file that the kernel offers which just constantly produces random numbers. In summary, this is an excellent way to toast your whole system. You’d be playing with the low level workings of what gets stored to the hard drive. This is not an easy thing to manage and indeed there are other ways.
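If you want to try this without toasting anything, a plain file can stand in for the drive. The following is only a sketch: the temporary file and its 1 MB size are my own assumptions, not anything your system requires.

```shell
# Stand-in for a hard drive: a temporary 1,000,000-byte file full of zeros.
disk=$(mktemp)
dd if=/dev/zero of="$disk" bs=1000 count=1000 2>/dev/null

# Overwrite only the first 1000 bytes with random data.
# conv=notrunc stops dd from cutting the file down to those 1000 bytes.
dd if=/dev/urandom of="$disk" bs=1000 count=1 conv=notrunc 2>/dev/null

# The "number" is as long as it was before; only its leading digits changed.
size=$(wc -c < "$disk" | tr -d ' ')
echo "image is still $size bytes"

# 16 bytes from the start (random) and 16 from just past the write (zeros).
echo "start:     $(od -An -tx1 -N16 "$disk")"
untouched=$(od -An -tx1 -j1000 -N16 "$disk" | tr -d ' \n')
echo "untouched: $untouched"

rm -f "$disk"
```

On a real device file there is nothing to truncate, so the conv=notrunc option only matters for the stand-in.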

Why File Systems?

First let’s look at why it’s not easy to manage. You may think that if you have a file, it would be OK to send it to a spare hard drive like this:

dd if=/home/joe/secretlist.txt of=/dev/hdb

This would indeed store the file secretlist.txt at the beginning of hdb. But what if you want that back? You can’t just say:

dd of=/home/joe/secretlist.txt if=/dev/hdb

The reason is that the dd program (which moves bytes around without asking too many questions) won’t have a clue where to stop. In this case, it would make a secretlist.txt file that is exactly the size of the entire hdb drive.

Ah, so this is why we need file systems. With a file system, I can specify that I’d like file secretlist and the file system driver will do a lot of work to, in effect, come up with a command like this:

dd of=/home/joe/secretlist.txt if=/dev/hdb bs=5361 count=1 skip=3230483

The details are unimportant, but this is a dd command that is pulling a very specific subset of data off of a very specific location on the disk. So filesystems are important in order to make practical use of your storage resources.
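Here is a rough sketch of that bookkeeping done by hand, again using a scratch file in place of /dev/hdb. The file contents, the 100000-byte "drive" and the offset of 4096 are all made up for illustration. The point is that two numbers, an offset and a length, are the least a file system has to remember for each file.

```shell
work=$(mktemp -d)
disk="$work/disk.img"               # stand-in for /dev/hdb
secret="$work/secretlist.txt"
printf 'milk\neggs\nworld domination\n' > "$secret"

# A 100000-byte "drive" full of zeros.
dd if=/dev/zero of="$disk" bs=1000 count=100 2>/dev/null

# Store the file 4096 bytes in, and remember where it went and how long it is.
offset=4096
length=$(wc -c < "$secret" | tr -d ' ')
dd if="$secret" of="$disk" bs=1 seek="$offset" conv=notrunc 2>/dev/null

# Naive read-back: dd keeps copying until the drive runs out.
dd if="$disk" of="$work/naive.txt" 2>/dev/null
naive_size=$(wc -c < "$work/naive.txt" | tr -d ' ')

# Informed read-back: skip to the offset and stop after exactly length bytes.
dd if="$disk" of="$work/restored.txt" bs=1 skip="$offset" count="$length" 2>/dev/null

if cmp -s "$secret" "$work/restored.txt"; then restored=yes; else restored=no; fi
echo "naive copy is $naive_size bytes; precise copy matches original: $restored"

rm -rf "$work"
```

Note that seek positions the output and skip positions the input, and both count in blocks of bs, which is why the sketch uses bs=1 to work in plain bytes.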

Why Partitions?

Then we take a step back and ask why partitions are important. Couldn’t a file system be created on the whole disk? Yes it could. As a backup drive, this might work out. But for normal situations, there is just a convention that drives are partitioned. To whom is this convention important? The most important player is your BIOS. When you power on your hardware, the BIOS needs to load some stuff into memory and get the ball rolling. BIOS usually depends on a hard drive having a certain organization. Specifically, the BIOS is counting on the first sector of the hard drive to contain the Master Boot Record (MBR) which will give clues about what to do next. If you’ve gone and put a file system on the raw disk, then whatever the BIOS finds where it expects the MBR will just confuse it.

So partitions are required in practice, if not in theory. The reason such a system was set up is that it was an all or nothing proposition. Either the BIOS would understand a partitioning scheme or it would understand a single monolithic file system on the raw device. With the former, you can achieve the latter, but not the other way around.

Why would anyone want partitions? There are lots of reasons.

  • Perhaps you want different file system types (FAT, ext3, swap, vendor utility) on one physical disk.

  • Perhaps you want different operating systems on one physical disk.

  • Perhaps you want to be able to segregate data from the activities related to some other part of the disk.

For many people none of these reasons are important, but the system is universally set up like this just in case the need arises.

Partitioning Strategy

How does partitioning work? Let’s say you start with a blank hard drive; this will clear your whole hard drive to zeros:

dd if=/dev/zero of=/dev/hda

Now let’s say you want two file systems on a hard drive which is 50 bytes. This is one way to do it:

0000000000000000000000X000000000000000000000000000

The first partition is everything up to the X and the second is after. But this means that you can’t do this:

00Chris X Edwards00000X000000000000000000000000000

We could choose something that we would never use:

00Chris X Edwards00000~000000000000000000000000000

but with computer data, it’s difficult to count on never seeing a magical combination of characters (at the very least, bad guys would synthesize it).
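The trouble with a marker character is easy to see in a few lines of shell. This sketch treats the 50-byte toy disk as a string and splits it at the first X, which is my assumption about how a naive marker scheme would behave.

```shell
# The 50-byte toy disk, with a name stored inside the first partition.
disk='00Chris X Edwards00000X000000000000000000000000000'

# Split at the first X, the way a naive marker scheme would.
first=${disk%%X*}     # everything before the first X
second=${disk#*X}     # everything after the first X

echo "partition 1: '$first' (${#first} bytes, 22 intended)"
echo "partition 2: '$second'"
```

The first partition comes back as just 8 bytes, and the rest of the name, along with the real marker, lands in the second partition.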

A better way is to index it:

23Partition 1 is here!Partition Two is here0000000

Here is a (massively simplified) model for an indexed disk with two partitions. The first 2 bytes specify the offset of where the 2nd partition begins. What if that was the system I was given to work with, but I wanted 3 partitions? One scheme would be to realize that, since the disk is only 50 bytes long, the numbers 50-99 will always go unused, so we can add 50 to our offset to signal something special. What will this signal? We can signal that when the BIOS finds an offset like this, it should go there (minus the 50, which is just a signal) and then treat that location like a Master Boot Record.

73Partition 1 is here!10Part. 2!Partition Three000

Of course we could keep going:

73Partition 1 is here!60Part. 2!08Part-3Part. 4000

Clearly this is confusing. And that’s life. It turns out that normal PC hard drives work using a sort of compromise technique. Instead of an infinite regression into special indexes, the MBR index can contain a reference to one special partition which will be an "extended" partition. This partition contains all the information about a more complex setup. Once you send the BIOS to an extended partition, things can get arbitrarily complex with no problems. So once again, here’s a system with two simple primary partitions:

23Partition 1 is here!Partition Two is here0000000

And here’s a complex system with the extended partition concept:

64Primary one 061218Log. 1Log. 2Logical Three00000
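To make the toy concrete, here is a sketch that walks a 50-character string modeled on that layout. The reading of the numbers is my own assumption: an index of 50 or more means "subtract 50 and look there for an extended index", and the extended index holds three two-digit offsets counted from zero at the start of the extended area.

```shell
# 50-character toy disk modeled on the extended layout.
disk='64Primary one 061218Log. 1Log. 2Logical Three00000'

# The first two characters are the index; 50 or more flags an extended area.
index=$(printf '%s' "$disk" | cut -c1-2)
ext_start=$(( ${index#0} - 50 ))      # 64 - 50 = 14, counted from zero

# Primary partition: everything between the index and the extended area.
primary=$(printf '%s' "$disk" | cut -c3-"$ext_start")
# Extended area: from ext_start to the end (cut counts from 1, hence the +1).
ext=$(printf '%s' "$disk" | cut -c"$(( ext_start + 1 ))"-)

# The extended area opens with its own index of three two-digit offsets.
# Stripping a leading zero keeps the shell from reading them as octal.
o1=$(printf '%s' "$ext" | cut -c1-2); o1=${o1#0}
o2=$(printf '%s' "$ext" | cut -c3-4); o2=${o2#0}
o3=$(printf '%s' "$ext" | cut -c5-6); o3=${o3#0}

# Each logical partition runs from its own offset up to the next one.
log1=$(printf '%s' "$ext" | cut -c"$(( o1 + 1 ))"-"$o2")
log2=$(printf '%s' "$ext" | cut -c"$(( o2 + 1 ))"-"$o3")
log3=$(printf '%s' "$ext" | cut -c"$(( o3 + 1 ))"-)

echo "primary:   '$primary'"
echo "logical 1: '$log1'"
echo "logical 2: '$log2'"
echo "logical 3: '$log3'"
```

The last logical partition simply runs to the end of the disk, so its unused tail of zeros comes along with it.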

Now these simple systems I’m showing have had one slot for designating a partition index. In real life, there are four. Don’t ask why. There are four. These are your primary partitions. You can establish them as primary partitions or you can flag one (and only one) to be an extended partition. If you make an extended partition, then it can be filled with any number of "logical partitions".
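Those four slots can be poked at directly. In the classic MBR layout the partition table starts at byte 446 of the first sector, each of the four entries is 16 bytes long with a type code in its fifth byte, and the sector ends with the signature bytes 55 aa. The sketch below fakes such a sector in a scratch file; the poke helper is a name I made up, and a real entry would also carry a start sector and a length, which are left at zero here.

```shell
disk=$(mktemp)                        # stand-in for /dev/hda
dd if=/dev/zero of="$disk" bs=512 count=1 2>/dev/null    # one blank sector

# Hypothetical helper: write a single byte (given in octal) at a byte offset.
poke() { printf "\\$2" | dd of="$disk" bs=1 seek="$1" conv=notrunc 2>/dev/null; }

poke 450 203      # slot 1, type byte: 0x83 (Linux)
poke 498 005      # slot 4, type byte: 0x05 (extended)
poke 510 125      # boot signature, first byte:  0x55
poke 511 252      # boot signature, second byte: 0xaa

# Read the type byte back out of each of the four slots.
types=""
for slot in 0 1 2 3; do
    t=$(od -An -tx1 -j"$(( 446 + 16 * slot + 4 ))" -N1 "$disk" | tr -d ' \n')
    echo "slot $(( slot + 1 )): type 0x$t"
    types="$types$t "
done

signature=$(od -An -tx1 -j510 -N2 "$disk" | tr -d ' \n')
echo "signature: $signature"

rm -f "$disk"
```

A type of 00 marks an empty slot, so this fake disk has one ordinary primary partition, two unused slots, and an extended partition in slot 4.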

Back to more useful diagrams that might actually represent a real hard drive layout; here’s a simple one:

|MBR|  Primary 1     |   Primary 2               |

Still simple:

|MBR|  Primary 1 | Primary 2 | Primary 3 | P 4   |

The big jump to complex:

|MBR|  Primary 1 | [       The Extended*           ] |

* The Extended contains: [ | Logical 1 | Logical 2 | L 3 | L4 | L5 | ]

Put together:

|MBR|  Primary 1 | [ | Logical 1 | Logical 2 | L 3 | L4 | L5 | ] |

Usually if you want 5 partitions you would do this:

|MBR|  P1 | P2 | P3 | [ | L1 | L2 | ] |

As if this weren’t confusing enough, it gets even weirder when you consider the numbering. This is how the partitions are labeled:

|MBR|  hda1 | hda2 | hda3 | [ | hda5 | hda6 | ] |

Where is hda4? It turns out that hda4 is the "extended partition". It never gets accessed itself, but to the OS, it is a real partition. In theory, one could wipe out all logical drives in one go with:

dd if=/dev/zero of=/dev/hda4

BIOS Issues

There are some subtle things to consider about partitions. First, it’s the job of the BIOS to figure out the partitioning scheme and start the computer processing the right stuff. Therefore if your partitioning is suitably exotic, you might have problems with your BIOS. For example, if I want my system to boot from logical drive 7 on a SATA-based system, maybe its BIOS can’t figure out how to do that. You’re always safe booting from primary partitions.

Second, proprietary operating systems only barely understand partitioning. They assume that they will be the only thing on your drive and will often actively seek out and destroy, without warning, any other partitions they find. It’s best to avoid proprietary operating systems. If you cannot do that, install them first in their own primary partition (number 1 is best) and let your better operating systems politely work around them.

What is LVM?

Linux and other sensible operating systems have been hard at work trying to get around this whole confusing mess. In doing so, another confusing mess has been created. This is Logical Volume Management, or LVM. The nice thing about this mess is that while it introduces complexity and severely reduces compatibility, it provides excellent flexibility in partition management. With LVM your partitions live in a different layer (inside a primary partition generally) that has a lot more information and flexibility. So instead of this:

|MBR|  P1 | P2 | P3 | [ | L1 | L2 | ] |

You might have one primary partition like this:

|MBR| { | LVM1 | LVM2 | LVM3 | LVM4 | LVM5 | } |

Here the partitions are easily edited and worked with even after they’re filled with data.

Logical Volume Management is growing in popularity and it seems reasonable that such a system will eventually replace the very old-fashioned BIOS-oriented scheme for common use. Hopefully this has helped provide a bit of theoretical background to explain why disk partitioning is so clumsy-looking.