Finding human-readable files on Unix

Question

I'd like to find human-readable files on my Linux machine, regardless of file extension. By human-readable I mean files a person can read directly, such as plain text, configuration files, HTML, and source code. Is there a way to filter for and locate them?

The `file` utility is pretty good at determining the type of content in a file. Perhaps you could use this and filter files based on its output. — cdhowie, Jan 24 '13 at 15:46
AFAIK only Windows trusts file extension. UNIX-like OSs use `file`. Anyway, you have to define "human readable". — m0skit0, Jan 24 '13 at 15:51
How precise does this need to be? And are you looking for EVERY file in the system, or just in a selected part of the system? What if the system has umpteen terabytes of disks attached, is it acceptable to wait for several hours (because that's how long it takes to actually read all the files)? — Mats Petersson, Jan 24 '13 at 16:01
Also, would for example a PDF be considered human readable, or not? What about "postscript"? What about contents in a mail-folder? What about .zip, .tar, .gz, .bz, or .xz files? If those are just containers for text files, does that count? — Mats Petersson, Jan 24 '13 at 16:02
I will be searching in a directory of, let us say, 5 GB. To define human-readable by example: PDF, tar.gz, a Thunderbird mail file, OpenOffice files, etc. are not readable. The files should be readable with the more utility or vi. — Yiğit, Jan 24 '13 at 16:08

score 25 · Answer 1 · edited Aug 05 '21 at 00:49

Use:

find /dir/to/search -type f | xargs file | grep text

find prints the path of every regular file under /dir/to/search.

xargs file runs the file command on the paths it reads from the pipe.

grep text keeps only the output lines that contain the word text.
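
The pipeline above prints the whole line that file emits, not just the path. Below is a minimal sketch of a variant that prints only the path. It assumes a file implementation that supports --mime-type (the one commonly shipped on Linux does) and file names without embedded newlines. The function name list_text_files is made up for illustration.

```shell
# Hypothetical helper: print the paths under "$1" whose MIME type starts with text/.
# -exec ... {} + hands many paths to each file invocation, much like xargs,
# but is safe for names containing spaces or quotes.
# sed keeps only lines ending in ": text/<subtype>" and strips that suffix,
# leaving just the path; file may pad after the colon, hence " *".
list_text_files() {
    find "$1" -type f -exec file --mime-type {} + |
        sed -n 's|: *text/[^ :]*$||p'
}
```

It would be called as list_text_files /dir/to/search. Some readable formats may be reported with a non-text MIME type, so the result is an approximation rather than an exact list.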

answered Feb 03 '16 at 15:47 by Ben Lamm
edited Aug 05 '21 at 00:49 by Peter Mortensen

Works perfect! Nice solution. – fuuman Apr 19 '16 at 14:59
And for files with *funny* names: `find /dir/to/search -type f -print0 | xargs -0 file | grep text` ... **funny**? Embedded spaces, parentheses, brackets, braces, ... – tink Aug 05 '21 at 01:20

score 8 · Accepted Answer · edited Aug 05 '21 at 00:49

find and file are your friends here:

find /dir/to/search -type f -exec sh -c 'file -b {} | grep text &>/dev/null' \; -print

This finds every regular file in /dir/to/search (NOTE: it will not find symlinks, directories, sockets, etc.) and runs sh -c 'file -b {} | grep text &>/dev/null' on each one, which checks whether the type description printed by file contains the word text. If that command succeeds (i.e., text is in the description), find prints the filename.

NOTE: the -b flag makes file omit the filename from its output, so the filename cannot cause false matches in grep. E.g., without -b the binary file gettext would erroneously be detected as a text file.

For example,

root@osdevel-pete# find /bin -exec sh -c 'file -b {} |  grep text &>/dev/null' \; -print
/bin/gunzip
/bin/svnshell.sh
/bin/unicode_stop
/bin/unicode_start
/bin/zcat
/bin/redhat_lsb_init
root@osdevel-pete# find /bin -type f -name *text*
/bin/gettext

If you want to look inside compressed files, use the --uncompress flag to file. For more information and other flags, see man file.
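
Two details of the find command above are worth guarding against. &> is a bash extension: where sh is a different shell (dash, for instance), it may be parsed as a background & followed by a plain redirection, so the exit status that find sees no longer reflects grep. Substituting {} directly into the sh -c string also breaks on file names containing spaces or quotes. Below is a minimal sketch that avoids both by using grep -q for silence and passing the path as a positional parameter. The function name find_text_files is made up for illustration.

```shell
# Hypothetical wrapper around the same find + file + grep idea.
# The word "sh" after the quoted script fills $0; find substitutes {} as $1.
# grep -q prints nothing and exits 0 only if "text" appears in the
# description printed by file -b.
find_text_files() {
    find "$1" -type f -exec sh -c 'file -b "$1" | grep -q text' sh {} \; -print
}
```

It would be called as find_text_files /dir/to/search. This still starts one shell per file, so it trades speed for robustness.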

I am new to the unix-like ecosystem. Why are you using "&" at the end of your `grep`? My understanding is that this will make grep run asynchronously. Will this still give the exit status to `find`? Why would one do that? Thank you for taking the time to answer. — Jesse Emond, May 27 '14 at 04:14
@JesseEmond: The command doesn't actually contain a `&` token which would put the job in the background, it contains a `&>` token which causes redirection of both stdout and stderr. — Ben Voigt, Apr 05 '21 at 16:32

score 0 · Answer 3 · edited Aug 05 '21 at 01:21

This should work fine, too:

# Read the description printed by file; for a readable file it should
# contain the words "ASCII" or "Unicode".
file_info=$(file "$file_name")

if grep -q -i -e "ASCII" -e "Unicode" <<< "$file_info"; then
    echo "file is readable"
fi
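
The check above handles a single file and relies on the bash-only <<< here-string. Below is a minimal sketch that wraps the same ASCII/Unicode test in a function using an ordinary pipe, so it should also run under a plain POSIX sh. The names is_readable_text and report_readable are made up for illustration, and file -b is used here (an addition to the original snippet) so that the file name itself cannot match.

```shell
# Hypothetical helper: succeed if file describes "$1" as ASCII or Unicode text.
is_readable_text() {
    file -b "$1" | grep -q -i -e 'ASCII' -e 'Unicode'
}

# Hypothetical helper: print a line for each argument that passes the check.
report_readable() {
    for file_name in "$@"; do
        if is_readable_text "$file_name"; then
            echo "$file_name is readable"
        fi
    done
}
```

For instance, report_readable ./* would check every entry in the current directory.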

answered Mar 07 '20 at 09:37 by because_im_batman
edited Aug 05 '21 at 01:21 by tink
