Options
-w
    confirmation at every step.
-f tarfile.tar
    this option specifies the archive file or device.
Note: The archive file must immediately follow the -f, and it’s good form to end the file name with .tar.
Compression/Decompression
-z
    run it through the gzip program; you must add the .gz manually.
-Z
    run it through the compress program; you must add the .Z manually. If this is not on your Linux system, explore the ncompress package.
-j
    run it through the bzip2 program; you must add the .bz2 manually.
-J
    run it through the xz program; you must add the .xz manually. Newer Linux kernel releases are distributed with this compression scheme.
Note: It is also good form to archive things in a subdirectory so they won’t explode all over your $PWD when you unpack it.
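Here is a minimal sketch of the difference, using throwaway files under a mktemp directory. The names proj, flat.tar, and tidy.tar are made up for illustration.

```shell
#!/bin/bash
# Scratch area with a made-up project directory holding two files.
work=$(mktemp -d)
mkdir "$work/proj"
echo one > "$work/proj/a.txt"
echo two > "$work/proj/b.txt"

# Flat archive: members have no leading directory, so they unpack
# straight into whatever directory you happen to be in.
(cd "$work/proj" && tar -cf "$work/flat.tar" a.txt b.txt)

# Tidy archive: made from the parent, so everything unpacks under proj/.
(cd "$work" && tar -cf tidy.tar proj)

# Compare the member names stored in each archive.
tar -tf "$work/flat.tar"
tar -tf "$work/tidy.tar"
```

Every member of the second archive carries the proj/ prefix, so unpacking it creates one directory instead of scattering files.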
Bundling Up Many Files Into One
Here is an example of making a tar archive of files without a subdirectory. This is just a file comprised of other files with no directory structures. In this example I have several sound files that I want to archive and deal with as only one file. This makes a single file, noise.tar, out of any and all .wav files. Make sure there are no directories whose names match *.wav!
tar -cvf noise.tar *.wav
This setup adds automatic compressing and confirmations.
tar -cvwzf noise.tar.gz *.wav
If you have an entire directory structure that you’d like to archive, use something like this from the directory you want to be the top level:
tar -cvzf package.tgz *
If for some reason you want more control over the zipping (say piping to a special filter) use something like this:
tar -cvf - theprojdir | gzip -c > theprojdir.tgz
This assumes you’re in the parent directory of the directory theprojdir.
Errors With DOS
Even as late as 2018, 128GB flash drives come formatted with a file system, vFAT, which has a maximum file size of 4GB. This means that when using tar to make a backup of something that you want normal people to be able to access, you might get this error.
gzip: stdout: File too large
One answer is to use NTFS which the unfortunate masses can generally use these days. Or use split. Something like this might do the trick.
tar cvzf - dirofinterest/ | split --bytes=4294967295 - dirofinterest.tar.gz.
Here’s SE on the topic.
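Getting the data back means concatenating the pieces in order and feeding them to tar. Here is a rough sketch of the round trip. It uses a made-up payload and 4096-byte pieces (instead of the 4GB limit) so it runs in a moment, and it assumes GNU split.

```shell
#!/bin/bash
# Scratch directory with a small made-up payload; random data so that
# gzip cannot shrink it below the piece size.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir dirofinterest
head -c 20000 /dev/urandom > dirofinterest/blob.bin

# Same pipeline as above, but with 4096-byte pieces so the demo splits.
tar czf - dirofinterest/ | split --bytes=4096 - dirofinterest.tar.gz.

# The pieces are suffixed .aa, .ab, ... so the glob returns them in order.
mkdir restored
cat dirofinterest.tar.gz.* | tar xzf - -C restored

# Compare the restored tree with the original.
diff -r dirofinterest restored/dirofinterest && echo "pieces reassembled cleanly"
```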
Symbolic Links
Here’s an important factor - if you want your archive to consist of symlinks just as they are in the filesystem, then do nothing different. If, however, you are making an archive of things that borrow from other places and you’d like to have the information that is linked become "solid" you can use the -h parameter to replace symlinks with the actual file they point to. Here’s an example:
tar -chvf coolstuff.tar ./*
In this case, if the current directory contained references to other places, this archive will actually contain the information referenced.
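To see the difference for yourself, here is a small sketch with one made-up file and one symlink pointing at it. The directory names elsewhere and stuff are hypothetical.

```shell
#!/bin/bash
# Scratch area: one real file outside the archived directory and one
# symlink to it inside.
work=$(mktemp -d)
mkdir "$work/elsewhere" "$work/stuff"
echo "real data" > "$work/elsewhere/data.txt"
ln -s ../elsewhere/data.txt "$work/stuff/link.txt"
cd "$work/stuff" || exit 1

# Default: the symlink is stored as a symlink.
tar -cf ../aslink.tar ./*
# With -h: the file the link points to is stored in its place.
tar -chf ../solid.tar ./*

# In the verbose listing the first character of each line is 'l' for a
# link and '-' for a regular file.
tar -tvf ../aslink.tar
tar -tvf ../solid.tar
```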
Unpacking Many Files From One
tar -xvf noise.tar
or for prompting (w) and gunzipping (z):
tar -xvwzf noise.tar.gz
Unpacking One File From An Archive Of Many
Simply specify its exact path as known to the tar archive. You can find this out with the -t option. For example, imagine this situation.
$ tar -tvzf chembl_22_1_mysql.tar.gz
0 2016-11-14 06:23 chembl_22_1_mysql/
9044737177 2016-11-14 04:09 chembl_22_1_mysql/chembl_22_1_mysql.sql
1001 2016-11-14 06:23 chembl_22_1_mysql/INSTALL
You want to feed that sql file into a database. To just unpack the sql file (and it will need to go into a ./chembl_22_1_mysql directory) do this.
$ tar -xvzf chembl_22_1_mysql.tar.gz chembl_22_1_mysql/chembl_22_1_mysql.sql
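If you don’t have a multi-gigabyte archive lying around, the same list-then-extract routine can be tried on a toy archive. This is only a sketch; pkg, big.sql, and INSTALL are stand-in names.

```shell
#!/bin/bash
# Scratch archive with a made-up layout: pkg/big.sql and pkg/INSTALL.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir pkg
echo "SELECT 1;" > pkg/big.sql
echo "read me" > pkg/INSTALL
tar -czf pkg.tar.gz pkg
rm -r pkg

# List first to learn the exact member path...
tar -tzf pkg.tar.gz

# ...then extract only that member; its directory is recreated as needed.
tar -xzf pkg.tar.gz pkg/big.sql
ls pkg
```

Only the named member is written out; the rest of the archive is read past but left alone.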
Use A File Without Extracting
In the previous chembl example, maybe you don’t have room to unpack that huge stupidly packed SQL file. Maybe it’s over a network and your MySQL server is too. You don’t need to actually physically unpack the whole archive or any of it to send the contents of a file somewhere.
Use the -O option to send the contents of the extracted files to standard output instead of writing the files to the file system.
tar -xvOzf chembl_22_1_mysql.tar.gz chembl_22_1_mysql/chembl_22_1_mysql.sql \
| mysql -u sqluser --password="${PW}" -h localhost chembl
And if that’s not quite working out, there’s even a fancy tar option, --to-command=cmd, which can access all kinds of properties about the tar archive so your target command can use them. See the official documentation for more info.
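As a rough idea of what that looks like with GNU tar, this sketch pipes each archived file to a tiny command that reports the member name and byte count. TAR_FILENAME is one of the environment variables GNU tar sets for the command; the archive and file names here are made up.

```shell
#!/bin/bash
# Scratch archive with one made-up file.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir demo
echo "hello" > demo/a.txt
tar -cf demo.tar demo

# GNU tar runs the command once per regular file, with the file's
# contents on stdin and details such as TAR_FILENAME in the environment.
# Single quotes keep the variable from being expanded before tar runs it.
tar -xf demo.tar --to-command='printf "%s: " "$TAR_FILENAME"; wc -c' > report.txt
cat report.txt
```

The command reads the file contents from its standard input, so nothing is written to the file system by tar itself.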
Adding Files to a Tar Archive
To add the file cuckoo.au to the noise.tar archive:
tar -rvf noise.tar cuckoo.au
To add a file to a gzipped archive:
gunzip noise.tar.gz && tar -rvf noise.tar cuckoo.au && gzip noise.tar
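Here is the whole cycle as a self-contained sketch on throwaway files (beep.wav is a made-up stand-in for the existing contents), followed by a listing to confirm the new member arrived.

```shell
#!/bin/bash
# Scratch gzipped archive with made-up sound file names.
work=$(mktemp -d)
cd "$work" || exit 1
echo "beep" > beep.wav
echo "cuckoo" > cuckoo.au
tar -czf noise.tar.gz beep.wav

# Decompress, append, recompress; && stops the chain if any step fails.
gunzip noise.tar.gz && tar -rf noise.tar cuckoo.au && gzip noise.tar

# Confirm the new member is listed after the original one.
tar -tzf noise.tar.gz
```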
Checking the Contents of a Tar Archive
To see what files are contained in noise.tar use:
tar -tf noise.tar
To see what files are contained in noise.tar.gz use:
tar -tzf noise.tar.gz
Using tar And ssh To Transfer Files
This is handy if rsync isn’t going to be available for some reason. This takes all files in the current directory (recursive) and puts them under /mnt/clone on the target host.
tar -cjf - * | ssh target.host.com tar -C /mnt/clone -xjf -
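The pipe pattern can be rehearsed locally before trusting it over the network. This sketch drops ssh and the compression so it runs anywhere with tar; the clone directory is a made-up stand-in for /mnt/clone.

```shell
#!/bin/bash
# Scratch source tree and an empty destination (stand-in for /mnt/clone).
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/clone"
echo "top" > "$work/src/top.txt"
echo "nested" > "$work/src/sub/nested.txt"
cd "$work/src" || exit 1

# Left side writes the archive to stdout; right side reads it from stdin.
# Over ssh the right side would be: ssh target.host.com tar -C /mnt/clone -xf -
tar -cf - * | tar -C "$work/clone" -xf -

# Check that the copy is identical to the source.
diff -r "$work/src" "$work/clone" && echo "copy matches"
```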
Sometimes you inherit a gazillion files in a directory and need to move them somewhere sensible but there are so many that problems with the argument list become annoying. Here’s how to get a directory of a zillion files on a remote machine into a local tar file.
$ ssh filestorm.example.edu tar -czf - /home/xed/toomany | pv > /tmp/toomany.tgz
Note the optional use of pv, pipe viewer.
Automatically Pack Large Collections Of Files
#!/bin/bash
# The purpose of this script is to package up data into
# sensible archive files. This allows the storage system
# to conserve inodes (fewer files) and space (much
# smaller files). The general sequence of operations is as
# follows:
# * Find directories in `data/received` older than N days (generally 7).
# * Create a tar.bz2 file for it in `data/archived`.
# * Check that the archive matches the data.
# * If so, delete the original data.
# * If not, delete the archive (try again tomorrow?).

OLDERTHAN=7
BASEDATADIR=/home/user/data
RECDIR=${BASEDATADIR}/received
ARCDIR=${BASEDATADIR}/archived
LOGFILE=/home/user/data/health-report

function log {
    return # Comment out to log. Return for silence.
    date >> "${LOGFILE}"
    echo "$1" >> "${LOGFILE}"
}

log "Searching for files to pack..."
cd "${RECDIR}" || exit 1
# Only top-level directories; without the depth limits find would also
# report '.' and nested directories.
find . -mindepth 1 -maxdepth 1 -mtime +${OLDERTHAN} -type d -printf '%f\n' | \
while read -r D
do
    DTBZ=${ARCDIR}/${D}.tar.bz2
    log "Creating the tarfile: ${DTBZ}"
    tar --create --bzip2 --file="${DTBZ}" "${D}"
    log "Created the tarfile: ${DTBZ}"
    log "Check the archive's fidelity."
    if tar --diff --bzip2 --file="${DTBZ}" --directory="${RECDIR}" "${D}"
    then # Remove original files if archive is ok.
        rm -r "${RECDIR:?}/${D}"
        log "Removed: ${RECDIR}/${D}"
    else # Remove the incorrect archive, keeping originals.
        log "Problem creating archive for ${RECDIR}/${D}"
        log "Consider removing bad archive ${DTBZ}"
        #rm "${DTBZ}"
    fi
done
log "$0 finished running."
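The safety of that script hangs on tar --diff returning a non-zero exit status when the archive and the files on disk disagree. Here is a small sketch of that behavior with GNU tar, using a made-up batch directory and no compression so it does not depend on bzip2 being installed.

```shell
#!/bin/bash
# Scratch data directory with one made-up file.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir batch
echo "original" > batch/reading.txt
tar --create --file="$work/batch.tar" batch

# Untouched data: --diff exits 0, so the originals would be safe to delete.
if tar --diff --file="$work/batch.tar" --directory="$work" batch
then
    echo "archive matches"
fi

# Change the data after archiving: --diff reports it and exits non-zero.
echo "changed after archiving" > batch/reading.txt
if ! tar --diff --file="$work/batch.tar" --directory="$work" batch
then
    echo "archive no longer matches"
fi
```

Because the deletion only happens in the success branch, a mismatch leaves the original data in place.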