Options

-w

Ask for confirmation at every step (interactive mode).

-f tarfile.tar

This option specifies the archive file or device.

Note
The archive file name must immediately follow the -f (so in a bundle like -cvf, the f goes last), and it’s good form to end the file name with .tar.

Compression/Decompression

-z

Run it through the gzip program; you must add the .gz manually.

-Z

Run it through the compress program; you must add the .Z manually. If compress is not on your Linux system, try the ncompress package.

-j

Run it through the bzip2 program; you must add the .bz2 manually.

-J

Run it through the xz program; you must add the .xz manually. Linux kernel source releases, for example, are distributed as .tar.xz files.
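A quick way to see that tar uses the -f name exactly as given is to leave the extension off on purpose. This is a sketch assuming GNU tar and gzip; the stuff/ directory and the name backup are made up for the demo.

```shell
# Work in a scratch directory (hypothetical names throughout).
cd "$(mktemp -d)"
mkdir stuff
echo hello > stuff/a.txt

# -z compresses, but tar writes to exactly the name after -f: no .gz is added.
tar -czf backup stuff
ls            # lists backup and stuff; there is no backup.gz

# The file is still gzip data; decompress it by hand and list the tar inside.
gzip -dc < backup | tar -tf -
```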

Note
It is also good form to archive things in a subdirectory so they won’t explode all over your $PWD when the archive is unpacked.
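Going the other way, you can check whether someone else’s archive will explode before you unpack it by listing its distinct top-level names. A minimal sketch, assuming GNU tar; myproj is a hypothetical directory. Note that an archive built from ./* reports "." as its only top-level name even though it will scatter files.

```shell
cd "$(mktemp -d)"
mkdir -p myproj/src
echo 'int main(void) { return 0; }' > myproj/src/main.c
echo notes > myproj/README

# Archive the directory itself, so every member path starts with "myproj/".
tar -cf myproj.tar myproj

# One name in this list means the archive unpacks into a single subdirectory.
tar -tf myproj.tar | cut -d/ -f1 | sort -u     # prints: myproj
```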

Bundling Up Many Files Into One

Here is an example of making a tar archive of files without a subdirectory: just a file composed of other files, with no directory structure. In this example I have several sound files that I want to archive and deal with as a single file. This makes one file, noise.tar, out of any and all .wav files. Make sure there are no directories whose names match *.wav, or they will be archived too!

tar -cvf noise.tar *.wav

This variation adds gzip compression (z) and a confirmation prompt for each file (w).

tar -cvwzf noise.tar.gz *.wav
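If you can’t be sure that no directory matches *.wav, you can let find pick only regular files and feed the names to tar. This is a sketch assuming GNU find and GNU tar (for -print0, --null and -T); trap.wav is a made-up directory that the plain glob would have swept in.

```shell
cd "$(mktemp -d)"
touch a.wav b.wav
mkdir trap.wav                 # a directory whose name matches *.wav
touch trap.wav/unrelated.txt

# find selects regular files only; -print0/--null keep odd file names intact,
# and -T - tells tar to read the list of names from standard input.
find . -maxdepth 1 -type f -name '*.wav' -print0 | tar -cvf noise.tar --null -T -
```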

If you have an entire directory structure that you’d like to archive, use something like this from the directory you want to be the top level (note that * does not match hidden dot files):

tar -cvzf package.tgz *
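If you’d rather follow the subdirectory advice, you can run tar from anywhere and use -C to switch into the parent first. A sketch assuming GNU tar; work/ and project/ are hypothetical names.

```shell
cd "$(mktemp -d)"
mkdir -p work/project
echo data > work/project/file.txt
echo secret > work/project/.hidden

# -C changes into work/ before adding "project", so member names start with
# "project/" and the archive unpacks into its own subdirectory. Naming the
# directory (rather than using *) also picks up dot files.
tar -czf package.tgz -C work project
tar -tzf package.tgz
```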

If for some reason you want more control over the zipping (say piping to a special filter) use something like this:

tar -cvf - theprojdir | gzip -c > theprojdir.tgz

This assumes you’re in the parent directory of the directory theprojdir.
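Here is a small round trip of that pattern, with gzip -9 standing in for the "special filter" (any program that reads stdin and writes stdout would do). A sketch assuming GNU tar; theprojdir and restore are throwaway names.

```shell
cd "$(mktemp -d)"
mkdir theprojdir
seq 1 1000 > theprojdir/numbers.txt

# "-f -" sends the archive to stdout, so a filter can sit in the middle.
tar -cf - theprojdir | gzip -9 > theprojdir.tgz

# Unpacking reverses the pipe: decompress to stdout, tar reads stdin.
mkdir restore
gzip -dc < theprojdir.tgz | tar -xf - -C restore
```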

Errors With DOS

Even as late as 2018, 128GB flash drives are often formatted with VFAT (FAT32), a file system whose maximum file size is 4GB (4294967295 bytes, one byte short of 4GiB). This means that when using tar to make a backup of something that you want normal people to be able to access, you might get this error:

gzip: stdout: File too large

One answer is to format the drive as NTFS, which the unfortunate masses can generally use these days. Another is to use split. Something like this might do the trick:

tar cvzf - dirofinterest/ | split --bytes=4294967295 - dirofinterest.tar.gz.

Here’s SE on the topic.
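To get the data back, concatenate the pieces in order and hand the stream to tar. This sketch assumes GNU split and GNU tar, and uses tiny 100000-byte pieces so it runs quickly; dirofinterest and blob.bin are made up.

```shell
cd "$(mktemp -d)"
mkdir dirofinterest
head -c 300000 /dev/urandom > dirofinterest/blob.bin

# Tiny pieces for the demo; on a FAT drive you would use --bytes=4294967295.
tar -czf - dirofinterest | split --bytes=100000 - dirofinterest.tar.gz.

# The glob expands in sorted order (.aa, .ab, ...), so cat rejoins the pieces
# correctly; tar then reads the reassembled stream from stdin.
mkdir restore
cat dirofinterest.tar.gz.* | tar -xzf - -C restore
```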

Here’s an important factor: if you want your archive to store symlinks just as they are in the filesystem, do nothing different, since that is tar’s default. If, however, you are making an archive of things that borrow from other places and you’d like the linked information to become "solid", use the -h (--dereference) parameter to replace each symlink with the actual file it points to. Here’s an example:

tar -chvf coolstuff.tar ./*

In this case, if the current directory contained symlinks to other places, the archive will contain the data they reference rather than the links themselves.
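You can see the difference in a verbose listing, where the first character of each line is the entry type. A sketch assuming GNU tar; elsewhere/ and here/ are hypothetical directories.

```shell
cd "$(mktemp -d)"
mkdir elsewhere here
echo "real data" > elsewhere/notes.txt
ln -s ../elsewhere/notes.txt here/notes.txt

tar -cf plain.tar here    # default: the symlink is stored as a symlink
tar -chf solid.tar here   # -h: the file the link points to is stored instead

# In the verbose listing, a leading "l" marks a symlink, "-" a regular file.
tar -tvf plain.tar
tar -tvf solid.tar
```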

Unpacking Many Files From One

tar -xvf noise.tar

or for prompting (w) and gunzipping (z):

tar -xvwzf noise.tar.gz

Unpacking One File From An Archive Of Many

Simply specify its exact path as stored in the tar archive. You can find this out with the -t option. For example, imagine this situation:

$ tar -tvzf chembl_22_1_mysql.tar.gz
0 2016-11-14 06:23 chembl_22_1_mysql/
9044737177 2016-11-14 04:09 chembl_22_1_mysql/chembl_22_1_mysql.sql
1001 2016-11-14 06:23 chembl_22_1_mysql/INSTALL

You want to feed that SQL file into a database. To unpack just the SQL file (tar will recreate the chembl_22_1_mysql directory to hold it), do this:

$ tar -xvzf chembl_22_1_mysql.tar.gz chembl_22_1_mysql/chembl_22_1_mysql.sql
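The same idea at toy scale, so you can check that only the named member comes out. A sketch assuming GNU tar; dump/ and its files are made up to mimic the chembl layout.

```shell
cd "$(mktemp -d)"
mkdir dump
echo 'SELECT 1;' > dump/dump.sql
echo 'read me first' > dump/INSTALL
tar -czf dump.tar.gz dump
rm -r dump                      # start clean, as if we had only the archive

# Name the member exactly as "tar -tzf" prints it; tar recreates dump/ for it.
tar -xvzf dump.tar.gz dump/dump.sql
```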

Use A File Without Extracting

In the previous chembl example, maybe you don’t have room to unpack that huge, stupidly packed SQL file. Maybe the archive lives on a network file system and your MySQL server is remote too. You don’t need to unpack the archive, or any part of it, to disk in order to send the contents of a file somewhere.

Use the -O option to send the contents of the extracted files to standard output instead of writing the files to the file system.

tar -xvOzf chembl_22_1_mysql.tar.gz chembl_22_1_mysql/chembl_22_1_mysql.sql \
| mysql -u sqluser --password="${PW}" -h localhost chembl
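To try -O without a database handy, pipe into something harmless instead. In this sketch (assuming GNU tar) wc stands in for mysql, and pkg/big.sql is a made-up member.

```shell
cd "$(mktemp -d)"
mkdir pkg
printf 'line1\nline2\nline3\n' > pkg/big.sql
tar -czf pkg.tar.gz pkg
rm -r pkg

# -O writes the member's bytes to stdout; nothing lands on disk, so
# wc just counts the lines it receives.
tar -xOzf pkg.tar.gz pkg/big.sql | wc -l
```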

And if that’s not quite working out, there’s even a fancy GNU tar option, --to-command=COMMAND, which pipes each extracted file into COMMAND and passes properties of the archive member (name, size, mode, and so on) in TAR_* environment variables so your target command can use them. See the official documentation for more info.
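A small taste of --to-command, assuming GNU tar on Linux (with GNU wc, which prints a bare count on stdin). TAR_FILENAME and TAR_SIZE are variables GNU tar documents; d/ and its files are made up.

```shell
cd "$(mktemp -d)"
mkdir d
printf 'abc' > d/a.txt
printf 'hello world' > d/b.txt
tar -cf d.tar d
rm -r d

# GNU tar runs the command once per regular file, with the member's contents
# on stdin and its metadata in TAR_* environment variables. wc -c reads all
# of stdin, so tar never writes into a closed pipe.
tar -xf d.tar --to-command='printf "%s %s %s\n" "$TAR_FILENAME" "$TAR_SIZE" "$(wc -c)"'
```

Each output line shows the member name, the size tar recorded, and the byte count the command actually received on stdin.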

Adding Files to a Tar Archive

To add the file cuckoo.au to the noise.tar archive:

tar -rvf noise.tar cuckoo.au

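To add a file to a gzipped archive (tar can’t append to a compressed archive directly, so decompress it first, and stop if any step fails):

gunzip noise.tar.gz && tar -rvf noise.tar cuckoo.au && gzip noise.tar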

Checking the Contents of a Tar Archive

To see what files are contained in noise.tar use:

tar -tf noise.tar

To see what files are contained in noise.tar.gz use:

tar -tzf noise.tar.gz
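In a script you often just want to know whether one particular file is in there. A sketch assuming GNU tar and grep; the file names are made up.

```shell
cd "$(mktemp -d)"
touch a.wav cuckoo.au
tar -czf noise.tar.gz a.wav cuckoo.au

# grep -x matches whole lines only, so a partial name like "cuckoo"
# would not count as a hit.
if tar -tzf noise.tar.gz | grep -qx 'cuckoo.au'; then
    echo "cuckoo.au is in noise.tar.gz"
fi
```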

Using tar And ssh To Transfer Files

This is handy if rsync isn’t going to be available for some reason.

This copies all files in the current directory, recursively, to /mnt/clone on the target host (that directory must already exist there). Note that * skips dot files.

tar -cjf - * | ssh target.host.com tar -C /mnt/clone -xjf -
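The same two-tar pipe works locally, which is a handy way to test it before adding ssh. In this sketch (assuming GNU tar) ssh is removed and -j is dropped, since compression only pays off over a network; src/ and clone/ are made up.

```shell
cd "$(mktemp -d)"
mkdir -p src/sub clone
echo one > src/a.txt
echo two > src/sub/b.txt
echo dot > src/.hidden

# One tar writes an archive to stdout, the other reads it from stdin and
# unpacks it under the -C directory. "." instead of * brings dot files too.
(cd src && tar -cf - .) | tar -C clone -xf -
```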

Sometimes you inherit a gazillion files in a directory and need to move them somewhere sensible, but there are so many that globbing them fails with "Argument list too long". Naming the directory itself sidesteps that. Here’s how to get a directory of a zillion files on a remote machine into a local tar file:

$ ssh filestorm.example.edu tar -czf - /home/xed/toomany | pv > /tmp/toomany.tgz

The pv (pipe viewer) stage is optional; it just shows progress and throughput.

Automatically Pack Large Collections Of Files

#!/bin/bash
# The purpose of this script is to package up data into
# sensible archive files. This allows the storage system
# to conserve inodes (fewer files) and space (much
# smaller files). The general sequence of operations is as
# follows:
#  * Find directories in `data/received` older than N days (generally 7).
#  * Create a tar.bz2 file for it in `data/archived`.
#  * Check that the archive matches the data.
#  * If so, delete the original data.
#  * If not, delete the archive (try again tomorrow?).

OLDERTHAN=7
BASEDATADIR=/home/user/data
RECDIR=${BASEDATADIR}/received
ARCDIR=${BASEDATADIR}/archived
LOGFILE=/home/user/data/health-report

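function log {
    return # Comment out this line to enable logging; leave it for silence.
    date >> "${LOGFILE}"
    echo "$1" >> "${LOGFILE}"
}

log "Searching for files to pack..."
# Bail out if the directory is missing; otherwise find and rm -r would run elsewhere.
cd "${RECDIR}" || exit 1
# Only immediate subdirectories: %f prints just the basename, which is a valid
# path relative to ${RECDIR} only at depth 1. -mindepth 1 also skips "." itself.
find . -mindepth 1 -maxdepth 1 -type d -mtime +${OLDERTHAN} -printf '%f\n' | \
while IFS= read -r D
do
    DTBZ=${ARCDIR}/${D}.tar.bz2
    log "Creating the tarfile: ${DTBZ}"
    tar --create --bzip2 --file="${DTBZ}" "${D}"
    log "Created the tarfile: ${DTBZ}"
    log "Check the archive's fidelity."
    if tar --diff --bzip2 --file="${DTBZ}" --directory="${RECDIR}" "${D}"
    then
        # Remove original files if the archive matches them.
        rm -r "${RECDIR:?}/${D}"
        log "Removed: ${RECDIR}/${D}"
    else
        # Keep the originals. The bad archive is left for inspection;
        # uncomment the rm below to delete it automatically.
        log "Problem creating archive for ${RECDIR}/${D}"
        log "Consider removing bad archive ${DTBZ}"
        #rm "${DTBZ}"
    fi
done
log "$0 finished running."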
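The directory selection is the riskiest part of the script, since its output drives rm -r. This sketch (assuming GNU find and GNU touch) rebuilds a tiny received/ tree to show what the depth-limited find picks; old, new and nested are made-up names.

```shell
cd "$(mktemp -d)"
mkdir -p received/old/nested received/new
# Backdate two directories; "new" keeps its current mtime.
touch -d '10 days ago' received/old/nested received/old
cd received

# With the depth limits only "old" qualifies. Without them, "nested" would be
# printed too, as a bare name that is not a valid path from here.
find . -mindepth 1 -maxdepth 1 -type d -mtime +7 -printf '%f\n'    # prints: old
```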