Fancy ways to use the Unix find command

-exec

The -exec directive is very powerful. It is a fast and efficient way to perform some action on a very specific subset of files. The basic format is:

find . -name "*temp" -exec rm -v '{}' \;

This command removes all files in the current directory and its subdirectories whose names end with "temp". The {} is where each found file name gets substituted into the -exec clause’s command, and the \; marks the end of that command.

In especially gruesome cases, you may have 631351 directories in /tmp. Getting rid of them can be tricky because a shell glob like rm -r /tmp/* expands to more names than the system’s argument limit allows. The scary answer is pretty much the exact command above, but with rm taking the options -rv. Think very hard about what you’re doing if you need to do that!
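
That might look something like the following sketch. The depth options are my addition, there on the assumption that you only want to remove the top-level directories in /tmp and want to keep find from matching (and rm from deleting) /tmp itself:

find /tmp -mindepth 1 -maxdepth 1 -type d -exec rm -rv '{}' \;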

Here’s something Mac people should have on a cron job every minute.

find / -iname ".DS_Store" -exec rm -v '{}' \;

Avoiding -exec

Here’s an example of a way to send each result off to a Bash loop:

find . -type f | while read X; do wc -c "$X"; done | sort -n

This pipes the output of find (a bunch of found file names) to a Bash while loop, which reads each one into a variable X to do with as you want; in this case it counts the bytes in each file (wc -c) and then sorts the results numerically. This is one way to find large files.

Here’s another way which uses xargs. This example comes from some Gentoo updating script.

find . -type d | xargs chmod 0755; find . -type f | xargs chmod 0644;
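
Note that plain xargs mangles file names containing spaces or quotes. If that’s a possibility, the null-delimited variants (GNU and BSD find and xargs both support these flags) are safer:

find . -type d -print0 | xargs -0 chmod 0755
find . -type f -print0 | xargs -0 chmod 0644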

Dates

This post is a very good explanation of how to use find with dates.

Summary
  • +n - Match files from more than n 24-hour periods ago (i.e. older).

  • -n - Match files from within the last n 24-hour periods (i.e. newer).

  • n - Match files from the 24-hour period exactly n periods ago.

Here’s a little script to rotate files (i.e. delete old ones):

#!/bin/bash
# Delete surveillance images whose ctime is more than
# DAYS_TO_KEEP 24-hour periods in the past.
DAYS_TO_KEEP=7
find /data/surveillance -name '*.jpg' -and \
                        -ctime +${DAYS_TO_KEEP} \
                        -exec rm '{}' \;

Put that in a cron job every day to keep unwanted files from building up.
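
For example, assuming the script is saved as /usr/local/bin/rotate_surveillance (a path I’m making up here), the crontab entry might look like this, running daily at 4:30am:

30 4 * * * /usr/local/bin/rotate_surveillance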

Just to be clear, here are some more examples.

$ date
Mon Mar 28 22:16:03 UTC 2016
$ find . -mtime +7 -and -type d -exec ls -d '{}' \; | sort | tail -n5
./20160315
./20160316
./20160317
./20160318
./20160319
$ find . -mtime 7 -and -type d -exec ls -d '{}' \;
./20160320
$ find . -mtime -7 -and -type d -exec ls -d '{}' \; | sort | head -n6
.
./20160321
./20160322
./20160323
./20160324
./20160325

If you want everything from the day exactly a week ago and everything before it, just adjust the number down to +6. To get everything from the day exactly a week ago and everything since, adjust the number up to -8.

If you need to match files based on their age relative to something else in the file system, consider -newer, -anewer, and -cnewer.

And for short-term work, -[acm]min.
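
For example, -newer compares against a reference file’s mtime and -mmin counts in minutes (the paths and reference file here are just illustrations):

find /var/www -type f -newer /var/www/index.html
find /var/log -type f -mmin -30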

To match specific dates, i.e. not time ranges anchored around the current time, see if the find man page mentions -newerXY.
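
With GNU find, for example, -newermt compares against a literal date string; this sketch matches files modified during March 2016:

find . -type f -newermt '2016-03-01' \! -newermt '2016-04-01'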

Grab Today’s Photos From The Camera

This happens a lot: I take a bunch of pictures today for some project and I want all those photos in one directory. This grabs everything with an mtime of less than 1 and puts it somewhere else.

find . -mtime -1 -type f -exec sh -c 'mv -v "$(basename "$0")" "/newpat/$(basename "$0")"' {} \;

A bit baroque but works fine.

Find The Most Stale Or Most Recent Directories/Files

Often I’m working on a complex project with millions of files and versions, and when I come back to it after a month I can’t remember which files were most recently worked on. Another useful situation: you have some user accounts and you want to know which ones have not been touched for a long time.

N=500
for D in $(find . -maxdepth 1 -type d -mtime +$N); do
    # Print $D only if it contains no file modified in the last N days.
    test -n "$(find "$D" -type f -mtime -$N -print -quit)" || echo "$D"
done

Here are the newest files in the directory tree.

find . -type f -printf '%T+ %p\n' | sort -r | head

Intricate Find Logic

Often you need some weird conjunction of properties which must be true. The find command has a full set of logical operators. Here’s an example.

find ../chemicals -iname "*sdf" \
-and \( -iname "*bb*" -or -iname "*block*" \)

The same style can be used with -not or \! to invert expressions. If you want to be more cryptic, the -and operators are always superfluous and can be omitted; two adjacent terms are implicitly joined by a logical AND. However, in scripts and for clarity, why skimp on the options?
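
For example, the earlier command behaves identically with -and omitted, and swapping in -not excludes the second pattern instead of requiring it:

find ../chemicals -iname "*sdf" \( -iname "*bb*" -or -iname "*block*" \)
find ../chemicals -iname "*sdf" -not -iname "*block*"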

Finding Large Or Small Files

If you’re trying to find out where all your disk space went, the find command can help you find the biggest files.

$ find . -size +2000000c -exec du -b '{}' \; | sort -nr | head -n10

In this example, the 2000000c means 2 million bytes, i.e. roughly 2MB. This command looks for all files bigger than that and executes du -b on each one so that a number representing its size in bytes is returned. That output is piped to sort for a reverse numerical sort, and then to head so you only see the top 10 files. If you don’t have 10 files bigger than 2MB, you’ll see them all. The search starts in the current directory because of the . immediately after the find command. If you want to start your search at a different parent point, just put that path there instead.
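
If your find is GNU find, it also understands unit suffixes like k, M, and G, so something like this is roughly (not exactly, since M counts in 1048576-byte units) equivalent:

find . -size +2M -exec du -b '{}' \; | sort -nr | head -n10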

To find small files, for example to weed out misfires that didn’t actually log what you wanted to record, you must be careful not to also collect directories, which tend to have small file sizes themselves.

This will find everything smaller than 5MB, but it will also find the current directory, ., which is also small. If you pipe this to something, you might get very wrong results.

find . -size -5000000c

Use the -type specifier to make sure you’re getting what you need.

find . -type f -size -5000000c | while read N; do ls -lh "${N}"; done

Finding Files With Duplicated Contents

Using the technique of piping to a while loop, the following will find duplicated files on a file system.

find /path/to/start/from -type f \
    | while IFS= read -r X; do echo "$(head -c 100000 "$X" | md5sum | cut -b-32) $X"; done \
    | sort -k1,1 \
    | awk 's==$1{print p; p=$0; print p} {s=$1; p=$0}' | uniq

This pipeline goes through a file system and prints a list of all the files which have duplicates. It actually only looks at the first 100kB of each file, which saves some time for big collections of mp3s or videos. Note, however, that I found many (different) mp3s in my collection were identical for the first 10kB or so. Strange, but true. (Ah, Andrey finds the reason!)

Changing UID/GID of files

How to use find/-exec to change the group and owner of a lot of files: look for all files under /home/xed owned by group ID 1004 and change their group to "newgroup".

$ sudo find /home/xed -gid 1004 -exec chgrp newgroup '{}' \;

Same with the owner (i.e. find files owned by UID 1001 and change their owner to UID 20001):

$ sudo find /home/brian -uid 1001 -exec chown 20001 '{}' \;

Bringing something in from a 1980s DOS filesystem (i.e. what all storage devices come with by default)? You can change the horrendous 777 permissions with commands like these.

$ find mytopdir -type d -exec chmod -v 755 {} \;
$ find mytopdir -type f -exec chmod -v 644 {} \;
$ find mytopdir -exec chown -v xed:xed {} \;
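
As an aside, find can also terminate -exec with + instead of \;. That packs as many file names as possible into each command invocation, xargs style, which is much faster over thousands of files:

$ find mytopdir -type d -exec chmod -v 755 {} +
$ find mytopdir -type f -exec chmod -v 644 {} +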