Fancy ways to use the Unix find command

-exec

The -exec directive is very powerful: it runs a command of your choosing on exactly the files that match the rest of the find expression, which makes it a convenient way to act on a very specific subset of files. The basic format is:

$ find . -name "*temp" -exec rm -v '{}' \;

This command removes every file in the current directory or its subdirectories whose name ends in "temp" (-v makes rm print each file it removes). The {} is replaced by the path of each file find matches, and \; marks the end of the -exec command. The backslash stops the shell from treating the semicolon as its own command separator.
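With \; the command runs once per matched file. Standard find also accepts + in place of \;, which packs as many matched paths as fit onto a single command line. The sketch below shows the difference in a throwaway directory; the directory and file names are made up for the demo.

```shell
#!/bin/bash
# Work in a throwaway directory so the demo never touches real files.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/sub"
touch "$demo/a.temp" "$demo/sub/b.temp" "$demo/keep.txt"

# With '\;', echo runs once per match: one output line per file.
per_file=$(find "$demo" -name "*temp" -exec echo '{}' \; | wc -l)
# With '+', all matches go to a single echo call: one output line.
batched=$(find "$demo" -name "*temp" -exec echo '{}' + | wc -l)
echo "per-file: $per_file, batched: $batched"

# The original rm command, batched the same way.
find "$demo" -name "*temp" -exec rm -v '{}' +
```

Here per-file is 2 and batched is 1. Batching matters when thousands of files match, because it avoids starting a new process for each one.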

Avoiding -exec

Here’s a way to send each result off to a Bash loop:

find . -type f | while IFS= read -r X; do wc -c "$X"; done | sort -n

This pipes the output of find (the names of the files it found) to a Bash while loop, which reads each name into the variable X so you can do whatever you want with it. In this case, wc -c counts the bytes in each file and the results are sorted numerically. This is one way to find large files. IFS= and -r stop read from trimming blanks or swallowing backslashes, and the quotes around "$X" keep names with spaces in one piece.
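A newline-delimited loop still breaks on the rare file name that contains a newline. A sketch of a fully NUL-delimited version, assuming bash (for read -d '') and a find that supports -print0 (GNU and BSD both do); the scratch files are invented for the demo:

```shell
#!/bin/bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'abc' > "$demo/small file.txt"     # space in the name on purpose
printf 'abcdefghij' > "$demo/big.txt"

# -print0 ends each name with a NUL byte and read -d '' splits on NUL,
# so no character that can appear in a file name confuses the loop.
sizes=$(find "$demo" -type f -print0 |
    while IFS= read -r -d '' X; do
        wc -c "$X"          # prints "<bytes> <path>"
    done | sort -n)
printf '%s\n' "$sizes"
```

The 3-byte "small file.txt" sorts first and the 10-byte big.txt last.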

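Here’s another way, which uses xargs. It is adapted from a Gentoo update script; -print0 and xargs -0 separate the names with NUL bytes so that names containing spaces survive the trip.

find . -type d -print0 | xargs -0 chmod 0755; find . -type f -print0 | xargs -0 chmod 0644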
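To check the xargs approach before pointing it at real data, here is a sketch that runs the NUL-delimited chmod pair in a throwaway directory containing names with spaces, then uses find with ! -perm to list anything that did not end up with the expected mode. The directory and file names are hypothetical.

```shell
#!/bin/bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir "$demo/my dir"
touch "$demo/my dir/a file"
chmod 0700 "$demo/my dir"
chmod 0600 "$demo/my dir/a file"

find "$demo" -type d -print0 | xargs -0 chmod 0755
find "$demo" -type f -print0 | xargs -0 chmod 0644

# Anything printed here did not get the expected mode.
bad=$(find "$demo" \( -type d ! -perm 0755 \) -o \( -type f ! -perm 0644 \))
echo "wrong modes: ${bad:-none}"
```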

Dates

This post is a very good explanation of how to use find with dates.

Summary
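  • +n - Match files whose age, counted in whole 24-hour periods with any fraction discarded, is more than n (so +7 means at least 8 days old).

  • -n - Match files whose age is less than n 24-hour periods.

  • n - Match files whose age, in whole 24-hour periods, is exactly n (so 7 means at least 7 but less than 8 days old).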

Here’s a little script to rotate files (i.e. delete old ones):

#!/bin/bash
DAYS_TO_KEEP=7
find /data/surveillance -name '*.jpg' -and \
                        -ctime +${DAYS_TO_KEEP} \
                        -exec rm '{}' \;

Run that from a daily cron job to keep unwanted files from building up.

To make this concrete, here are some more examples.

$ date
Mon Mar 28 22:16:03 UTC 2016
$ find . -mtime +7 -and -type d -exec ls -d '{}' \; | sort | tail -n5
./20160315
./20160316
./20160317
./20160318
./20160319
$ find . -mtime 7 -and -type d -exec ls -d '{}' \;
./20160320
$ find . -mtime -7 -and -type d -exec ls -d '{}' \; | sort | head -n6
.
./20160321
./20160322
./20160323
./20160324
./20160325

If you want everything from the day exactly a week ago and everything before it, adjust the number down to +6. To get everything from the day exactly a week ago and everything since, adjust the number up to -8.
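The whole-day rounding is easiest to see with files of known age. This sketch assumes GNU find and GNU touch (typical on Linux): touch -d accepts "N hours ago", and GNU find discards the fractional part of a file's age as described above. Other find implementations may round differently. The file names are made up, and each age sits half a day away from a boundary so the result does not depend on exactly when the script runs.

```shell
#!/bin/bash
# Assumes GNU find and GNU touch.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch -d "156 hours ago" "$demo/age6.5"   # 6.5 days old, counts as 6
touch -d "180 hours ago" "$demo/age7.5"   # 7.5 days old, counts as 7
touch -d "204 hours ago" "$demo/age8.5"   # 8.5 days old, counts as 8

# Names matched by a single -mtime test, sorted and joined on one line.
match() {
    find "$demo" -type f -mtime "$1" -exec basename '{}' \; | LC_ALL=C sort | paste -sd' ' -
}

for n in +7 7 -7 +6 -8; do
    printf '%-3s -> %s\n' "$n" "$(match "$n")"
done
```

+7 matches only age8.5, 7 only age7.5 and -7 only age6.5, while +6 adds age7.5 to the older side and -8 adds it to the newer side.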

If you need to match files based on their age relative to another file in the file system, consider -newer, -anewer, and -cnewer. For shorter time spans, -amin, -cmin, and -mmin work like -atime, -ctime, and -mtime but count minutes instead of days.
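A sketch of -newer with a reference file next to -mmin, assuming GNU touch for the "... ago" date syntax. The names old, stamp, and fresh are invented for the demo; stamp plays the role of the reference file.

```shell
#!/bin/bash
# Assumes GNU touch for the -d "... ago" syntax.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch -d "2 days ago" "$demo/old"
touch -d "1 hour ago" "$demo/stamp"   # reference file
touch "$demo/fresh"

names() { sed 's|.*/||' | LC_ALL=C sort | paste -sd' ' -; }

# Modified more recently than the reference file.
by_ref=$(find "$demo" -type f -newer "$demo/stamp" | names)
# Modified less than 90 minutes ago.
by_min=$(find "$demo" -type f -mmin -90 | names)
echo "-newer stamp: $by_ref"
echo "-mmin -90:    $by_min"
```

-newer picks out only fresh, while -mmin -90 also catches the hour-old stamp. A reference file is handy when "since the last run" matters more than a fixed number of minutes.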

Intricate Find Logic

Often you need some weird conjunction of properties which must be true. The find command has a full set of logical operators. Here’s an example.

find ../chemicals -iname "*sdf" \
-and \( -iname "*bb*" -or -iname "*block*" \)

The same style works with -not or \! to invert a test. If you want to be more cryptic, you can always omit -and: two tests written side by side must both be satisfied. However, in scripts and for clarity, why skimp on the operators?
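The parentheses in the example matter because -and binds more tightly than -or. This sketch, using made-up file names in a throwaway directory, compares the grouped expression with the same tests ungrouped, and shows ! inverting a single test.

```shell
#!/bin/bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch "$demo/a.sdf" "$demo/bb.sdf" "$demo/block.SDF" "$demo/block.txt"

# Print matched names only, in a fixed order, on one line.
names() { sed 's|.*/||' | LC_ALL=C sort | paste -sd' ' -; }

# Grouped: must be .sdf AND contain "bb" or "block".
grouped=$(find "$demo" -type f -iname "*sdf" \( -iname "*bb*" -or -iname "*block*" \) | names)
# Ungrouped: parsed as (file AND .sdf AND bb) OR block, so block.txt sneaks in.
ungrouped=$(find "$demo" -type f -iname "*sdf" -iname "*bb*" -or -iname "*block*" | names)
# ! (or -not) inverts the test that follows it.
negated=$(find "$demo" -type f -iname "*sdf" ! -iname "*bb*" | names)

echo "grouped:   $grouped"
echo "ungrouped: $ungrouped"
echo "negated:   $negated"
```

The grouped form returns bb.sdf and block.SDF; dropping the parentheses adds block.txt.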

Finding Large Files

If you’re trying to find out where all your disk space went, the find command can help you find the biggest files.

$ find . -size +2000000c -exec du -b '{}' \; | sort -nr | head -n10

In this example, 2000000c means 2,000,000 bytes, or about 2 MB. The command finds every file bigger than that and runs du -b on each one, which prints the file's size in bytes followed by its path (-b is a GNU du option). That output is piped to sort, which sorts it numerically in reverse order, and then to head, so you only see the 10 largest files. If fewer than 10 files are bigger than 2 MB, you'll see them all. The search starts in the current directory because of the . right after find; to start somewhere else, put that path there instead.
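A scaled-down sketch of the same pipeline, assuming GNU du for -b. It uses a 2000-byte threshold on three invented files so it runs instantly, and uses -exec ... + so du is started once for all matches rather than once per file.

```shell
#!/bin/bash
# Assumes GNU du, whose -b flag prints the apparent size in bytes.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
head -c 3000 /dev/zero > "$demo/big"
head -c 2500 /dev/zero > "$demo/medium"
head -c 100  /dev/zero > "$demo/small"

# Files over 2000 bytes, largest first, at most 10 of them.
top=$(find "$demo" -type f -size +2000c -exec du -b '{}' + | sort -nr | head -n10)
printf '%s\n' "$top"
```

Only big (3000 bytes) and medium (2500 bytes) pass the -size test, in that order.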

Finding Files With Duplicated Contents

Using the technique of piping to a while loop, the following will find duplicated files on a file system.

find /path/to/start/from -type f -print0 \
    | while IFS= read -r -d '' X; do
          # hash only the first 100,000 bytes of each file
          printf '%s %s\n' "$(head -c 100000 "$X" | md5sum | cut -d' ' -f1)" "$X"
      done \
    | sort -k1,1 \
    | awk 's==$1{print p;p=$0;print p}{s=$1;p=$0}' | uniq

This goes through a file system and prints every file that has at least one duplicate. It only hashes the first 100,000 bytes of each file, which saves a lot of time on big collections of MP3s or videos. The catch is that files which differ only after that point are reported as duplicates, so confirm a match (with cmp, for example) before deleting anything. Note that I found many (different) MP3s in my collection were identical for the first 10 kB or so. Strange, but true. (Ah, Andrey finds the reason!)
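If reading every byte is acceptable, a variant that hashes whole files avoids prefix false positives entirely. This sketch swaps the awk step for GNU uniq's -w (compare only the first N characters) and --all-repeated options, so it assumes GNU md5sum and uniq; the scratch files are invented.

```shell
#!/bin/bash
# Assumes GNU md5sum and GNU uniq (-w and --all-repeated are GNU options).
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf 'same contents\n'  > "$demo/a"
printf 'same contents\n'  > "$demo/b"
printf 'other contents\n' > "$demo/c"

# md5sum prints "<32-char hash>  <path>"; after sorting, identical hashes
# are adjacent, and uniq -w32 compares only the hash part of each line.
dups=$(find "$demo" -type f -exec md5sum '{}' + | sort | uniq -w32 --all-repeated=separate)
printf '%s\n' "$dups"
```

Only a and b are listed. With more than one group of duplicates, --all-repeated=separate puts a blank line between groups.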

Changing UID/GID of files

To change the group or owner of many files at once, combine find with -exec. This command finds every file under /home/xed owned by group ID 1004 and changes its group to "newgroup".

$ sudo find /home/xed -gid 1004 -exec chgrp newgroup '{}' \;

The same works for the owner; this finds files owned by UID 1001 and changes their owner to UID 20001:

$ sudo find /home/brian -uid 1001 -exec chown 20001 '{}' \;