Fancy ways to use the Unix find command
-exec
The -exec directive is very powerful. It is a fast and efficient way to perform some action on a very specific subset of files. The basic format is:
find . -name "*temp" -exec rm -v '{}' \;
This command removes all files in the current directory or its subdirectories that end with "temp". The {} is where each of the found files gets substituted into the exec clause’s command. The \; is where the exec clause command ends.
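A related trick: if you terminate the clause with + instead of \;, find packs as many file names as possible into each invocation of the command rather than running it once per file, which is much faster for big jobs:
find . -name "*temp" -exec rm -v '{}' +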
In especially gruesome cases, you may have 631351 directories in /tmp. Getting rid of them can be hard because of the shell’s globbing limit: expanding something like /tmp/* produces too many file names for the shell to pass to rm. The scary answer is to use pretty much the first command above, but with the rm taking the options -rv. Think very hard about what you’re doing if you need to do that!
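Spelled out, that is something like the following (a sketch, assuming the junk directories match "*temp" as before; the -depth makes find process a directory’s contents before the directory itself, so it doesn’t try to descend into directories rm has already removed):
find /tmp -depth -name "*temp" -exec rm -rv '{}' \;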
Here’s something Mac people should have on a cron job every minute.
find / -iname ".DS_Store" -exec rm -v '{}' \;
Avoiding -exec
Here’s an example of a way to send each result off to a Bash loop:
find . -type f | while read -r X; do wc -c "$X"; done | sort -n
This pipes the output of find (a bunch of file names that are found) to a Bash while loop which reads each into a variable X to do with what you want; in this case it runs wc -c on each (a character/byte count) and then sorts them numerically. This is one way to find large files.
Here’s another way which uses xargs. This example comes from some Gentoo updating script.
find . -type d | xargs chmod 0755; find . -type f | xargs chmod 0644;
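Note that plain xargs splits its input on whitespace, so file names with spaces will break the above. With GNU find and xargs, the null-delimited version is safer:
find . -type d -print0 | xargs -0 chmod 0755
find . -type f -print0 | xargs -0 chmod 0644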
Dates
This post is a very good explanation of how to use find with dates.
- +n - Match files older than n 24hr periods ago.
- -n - Match files newer than n 24hr periods ago.
- n - Match files from the 24 hour period exactly n 24hr periods ago.
Here’s a little script to rotate files (i.e. delete old ones):
#!/bin/bash
DAYS_TO_KEEP=7
find /data/surveillance -name '*.jpg' -and \
    -ctime +${DAYS_TO_KEEP} \
    -exec rm '{}' \;
Put that in a cron job every day to keep unwanted files from building up.
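The crontab entry might look like this, running it daily at 03:00 (a sketch; the script path here is hypothetical):
0 3 * * * /usr/local/bin/rotate_surveillance.sh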
Just to be clear, here are some more examples.
$ date
Mon Mar 28 22:16:03 UTC 2016
$ find . -mtime +7 -and -type d -exec ls -d '{}' \; | sort | tail -n5
./20160315
./20160316
./20160317
./20160318
./20160319
$ find . -mtime 7 -and -type d -exec ls -d '{}' \;
./20160320
$ find . -mtime -7 -and -type d -exec ls -d '{}' \; | sort | head -n6
.
./20160321
./20160322
./20160323
./20160324
./20160325
If you want everything on the day exactly a week ago and everything before, just adjust the number down to +6. To get everything on the day exactly a week ago and all the stuff since, adjust the number up to -8.
If you need to match files based on their age relative to something else in the file system, consider -newer, -anewer, and -cnewer. And for short term work, -[acm]min.
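For example, to find files modified more recently than some reference point, you can touch a marker file with the date you care about (the marker path here is just for illustration):
touch -t 201603210000 /tmp/marker
find . -type f -newer /tmp/marker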
To match specific dates, i.e. not time ranges anchored around the current time, see if the find man page mentions -newerXY.
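With GNU find it does, and that lets you match an absolute date range. For example, files modified during the first week of March 2016:
find . -type f -newermt 2016-03-01 -not -newermt 2016-03-08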
Grab Today’s Photos From The Camera
This happens a lot: I take a bunch of pictures today for some project and I want all those photos in one directory. This grabs everything with an mtime of less than 1 and puts it somewhere else.
find . -mtime -1 -exec sh -c 'mv -v "$(basename "$0")" "/newpat/$(basename "$0")"' {} \;
A bit baroque but works fine.
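If you have GNU mv, its -t option (destination directory named first) lets find batch the moves and skips the basename gymnastics. A sketch, keeping the /newpat destination from above and adding -type f so directories are left alone:
find . -mtime -1 -type f -exec mv -v -t /newpat '{}' +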
Find Most Stale/Recent Old/New Directories/Files
Often I’m working on a complex project with millions of files and versions, and I come back to it after a month and can’t remember which files were most recently worked on. Another useful situation: you have some user accounts and you want to know who has not touched their account for a long time.
N=500
for D in $(find . -maxdepth 1 -type d -mtime +$N); do
    test -n "$(find "$D" -type f -mtime -$N -print -quit)" || echo "$D"
done
Here are the newest files in the directory tree.
find . -type f -printf '%T+ %p\n' | sort -r | head
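To see the stalest files instead, drop the -r from sort, or equivalently keep it and pipe to tail instead of head.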
Intricate Find Logic
Often you need some weird conjunction of properties which must be true. The find command has a full set of logical operators. Here’s an example.
find ../chemicals -iname "*sdf" \
-and \( -iname "*bb*" -or -iname "*block*" \)
The same style can be used with -not or \! to invert expressions.
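For example, to get the SDF files that are not building blocks, invert the parenthesized clause:
find ../chemicals -iname "*sdf" \
-and -not \( -iname "*bb*" -or -iname "*block*" \)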
If you want to be more cryptic, the -and options are always superfluous and can be omitted; two terms together imply that the first and the second must both be satisfied. However, in scripts and for clarity, why skimp on the options?
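For the record, here is the earlier example with the -and omitted; it behaves identically:
find ../chemicals -iname "*sdf" \( -iname "*bb*" -or -iname "*block*" \)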
Finding Large Or Small Files
If you’re trying to find out where all your disk space went, the find command can help you find the biggest files.
$ find . -size +2000000c -exec du -b '{}' \; | sort -nr | head -n10
In this example, the 2000000c means 2 million bytes, or about 2MB. This command looks for all files that are bigger than 2MB and then executes the du command on them so that a number representing their size is returned. This is piped to sort, which sorts numerically in reverse. Then that list is piped to head so you only see the top 10 files. If you don’t have 10 files bigger than 2MB, you’ll see them all.
This search starts in the current directory you’re in because of the . immediately after the find command. If you want to start your search at a different parent point, just put that path instead.
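If your du and sort are the GNU versions, a human-readable variant of the same pipeline is handy:
$ find . -size +2000000c -exec du -h '{}' \; | sort -hr | head -n10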
To find small files, for example to weed out misfires that didn’t actually log what you wanted to record, you must be careful not to also collect directories, which are themselves small in file size.
This will find all files less than 5MB, but it will also find the current directory, ., which is also small. If you pipe this to something you might get very wrong results.
find . -size -5000000c
Use the -type specifier to make sure you’re getting what you need.
find . -type f -size -5000000c | while read -r N; do ls -lh "$N"; done
Finding Files With Duplicated Contents
Using the technique of piping to a while loop, the following will find duplicated files on a file system.
find /path/to/start/from -type f \
| while read -r X; do head -c 100000 "$X" | echo -n `md5sum | cut -b-32`; echo " $X"; done \
| sort -k1 \
| awk 's==$1{print p;p=$0;print p}{s=$1;p=$0}' | uniq
This line goes through a file system and prints out a list of all the files which have duplicates. It actually only looks at the first 100kB, which saves some time for big collections of mp3s or videos. Note however that I found many (different) mp3s in my collection were identical for the first 10kB or so. Strange, but true. (Ah, Andrey finds the reason!)
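Since only the first 100kB is hashed, treat the output as candidates and confirm suspected duplicates with a full comparison (e.g. cmp, or md5sum over the whole file) before deleting anything.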
Changing UID/GID of files
How to use find/exec to change the group and owner of a lot of files:
Look for all files in /home/xed owned by group ID 1004 and change them to group "newgroup".
$ sudo find /home/xed -gid 1004 -exec chgrp newgroup '{}' \;
Same with the owner (i.e. find files owned by 1001 and change to 20001):
$ sudo find /home/brian -uid 1001 -exec chown 20001 '{}' \;
Bringing something in from a 1980s DOS filesystem (i.e. what all storage devices come with by default)? You can change the horrendous 777 permissions with commands like these.
$ find mytopdir -type d -exec chmod -v 755 {} \;
$ find mytopdir -type f -exec chmod -v 644 {} \;
$ find mytopdir -exec chown -v xed:xed {} \;