Other Options

What Is Cumulus

Cumulus is a hardcore tool for safely managing data. It was written by my friend Mike Vrable, which is why it solves more problems than you’re likely to ever be aware of. Fundamentally it is a way to encode incremental changes to the state of a filesystem in an efficient, convenient, secure, and transparent way.

The classic use case is keeping a backup of a filesystem that is in use. Cumulus will start by taking all of the files of the filesystem and rearranging them into convenient files of roughly the same size. These intermediate containers of data can be easily encrypted, making the data that goes to any backup scheme completely private. After the first containerization, subsequent snapshots will produce containers containing only the differences. Everything is done in a space-efficient way. This makes it an ideal way to package your data for sending to cloud data storage (hence the name) like Amazon S3. In theory you could send the container files to a Gmail account.
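
The container idea is easy to mimic with ordinary tools just to see the shape of it. To be clear, this is only a sketch of the concept, not how Cumulus actually works internally; all of the paths and the chunk size here are made up for illustration.

```shell
# Make some sample data. (All paths here are invented for the demo.)
mkdir -p /tmp/demo/src /tmp/demo/containers
printf 'some data\n' > /tmp/demo/src/a.txt
printf 'more data\n' > /tmp/demo/src/b.txt

# Pack the filesystem into one stream, then cut it into same-size pieces.
# This is roughly what "convenient files of roughly the same size" means.
tar -C /tmp/demo/src -cf - . | split -b 512 - /tmp/demo/containers/chunk.

# Each chunk could now be encrypted and shipped anywhere.
ls /tmp/demo/containers
```

Cumulus does the clever part this toy version lacks: on the next run it only emits containers for blocks that changed.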

Set Up


First find its current web site. Download and unpack it. I just ran make on it and it all compiled smoothly into an executable called cumulus.


You need to set up some directories for Cumulus to use. You need a main one where the important data goes and a less important one where the list of what is in the archive is kept. The latter is for efficiency and can be recomputed in an emergency, so it’s not critical to safeguard it with too much effort. The two directories should look something like this:

$ mkdir -p backup/cumulus backup/cumulus.db

Existing Data Block Database

Cumulus maintains a local index of previously stored data blocks which it uses to construct incremental snapshots. This is stored as a comprehensible SQLite database. To get started you need to set up the schema. Use the schema.sql file that comes with Cumulus.

$ sqlite3 cumulus.db/localdb.sqlite ".read ../cumulus-0.9/schema.sql"

Did that work? This queries the localdb.sqlite file just to see if it is somewhat ok.

$ sqlite3 cumulus.db/localdb.sqlite ".schema block_index"
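
A couple more sanity checks I find handy, done with stock sqlite3 dot-commands. Note that the CREATE TABLE here is only a stand-in so the snippet runs on its own; with a real archive you would skip it, because schema.sql already made the tables (with more columns than shown).

```shell
# Stand-in schema so this snippet is self-contained; with a real archive,
# localdb.sqlite is the one you built from schema.sql above.
mkdir -p cumulus.db
sqlite3 cumulus.db/localdb.sqlite \
    "CREATE TABLE IF NOT EXISTS block_index (id INTEGER PRIMARY KEY);"

# List every table present.
sqlite3 cumulus.db/localdb.sqlite ".tables"

# The block index starts out empty before the first snapshot.
sqlite3 cumulus.db/localdb.sqlite "SELECT count(*) FROM block_index;"
```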


Here is an example of running Cumulus without any special filter. The default filter is bzip2 -c.

    cumulus --dest=/cumulus --localdb=~/xed/backup/cumulus.db --scheme=xedtest \
        /home/xed/public_html/test

Mike’s Recommended Launcher

     export LBS_GPG_HOME=/cumulus.db/gpg
     export LBS_GPG_ENC_KEY=<encryption key>
     export LBS_GPG_SIGN_KEY=<signing key>
     cumulus --dest=/cumulus --localdb=/cumulus.db --scheme=test \
         --filter="cumulus-filter-gpg --encrypt" --filter-extension=.gpg \
         --signature-filter="cumulus-filter-gpg --clearsign" \
         /etc /home /other/paths/to/store

Simple Execution

    ~/xfile/project/programming/c/cumulus/cumulus-0.9/cumulus -v \
        --dest=/home/xed/backup/cumulus --localdb=/home/xed/backup/cumulus.db \
        --scheme=xedtest /home/xed/public_html/test

Cloud Crap

Maybe you want to put some clever archive in an off-site place. Here are some notes about cloud things in general.

Google Cloud Storage Buckets

I have used Google Cloud Storage but would be hesitant to do so with my own money. Here are my notes on how I did that.


Put a JSON key here and point this variable at it.

export GOOGLE_APPLICATION_CREDENTIALS=/home/xed/.google_cloud_key.json
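
That variable is what Google’s client libraries look for. To actually get the Cumulus containers off-site, something like gsutil works. This is only a sketch; the bucket name is made up, and it assumes you already have gsutil installed and authenticated.

```shell
# Make a bucket (name is hypothetical; bucket names are globally unique).
gsutil mb -l us-west1 gs://xed-cumulus-example/

# Mirror the local container directory into the bucket. rsync only
# uploads what changed, which suits incremental snapshots nicely.
gsutil -m rsync -r /home/xed/backup/cumulus gs://xed-cumulus-example/cumulus
```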

Mount Bucket

Use a FUSE tool to mount the bucket.

gcsfuse segmentation-dataset /home/xed/goo/

I think that the gcsfuse program is the big hint there. Official gcsfuse GitHub site. Because of course you can trust a giant tech company which hosts its own software on its competitor’s web site.
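
When you’re done, unmount it like any other FUSE filesystem (fusermount ships with FUSE itself).

```shell
# Detach the bucket from the mount point used above.
fusermount -u /home/xed/goo/
```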

Log In

Clear any stupidity with SSH agents.

export SSH_AUTH_SOCK=0

This gets the whole login rolling, generating and planting keys, etc.

gcloud compute --project "learned-grammar-169922" ssh --zone "us-west1-b" "tensorflow-1-vm"

Here is the more proper way.

ssh -vl xed -i /home/xed/.ssh/google_compute_engine