I'd like to find human-readable files on my Linux machine, regardless of file extension. By human-readable I mean files such as text, configuration, HTML, and source-code files. Is there a way to filter for and locate them?
-
The `file` utility is pretty good at determining the type of content in a file. Perhaps you could use this and filter files based on its output. – cdhowie Jan 24 '13 at 15:46
-
AFAIK only Windows trusts file extensions. UNIX-like OSs use `file`. Anyway, you have to define "human readable". – m0skit0 Jan 24 '13 at 15:51
-
How precise does this need to be? And are you looking for EVERY file in the system, or just in a selected part of the system? What if the system has umpteen terabytes of disks attached? Is it acceptable to wait for several hours (because that's how long it takes to actually read all the files)? – Mats Petersson Jan 24 '13 at 16:01
-
Also, would for example a PDF be considered human readable, or not? What about "postscript"? What about contents in a mail-folder? What about .zip, .tar, .gz, .bz, or .xz files? If those are just containers for text files, does that count? – Mats Petersson Jan 24 '13 at 16:02
-
I will be searching in a directory of, let's say, 5 GB. To define human-readable by example: PDF, tar.gz, a Thunderbird mail file, OpenOffice files, etc. are not readable. We should be able to read the files with the `more` utility or `vi`. – Yiğit Jan 24 '13 at 16:08
3 Answers
Use:
find /dir/to/search -type f | xargs file | grep text
`find` will give you a list of files.
`xargs file` will run the `file` command on each of the lines from the piped input.
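If you want only the file names, and want names containing spaces handled safely, something along these lines may work. It is a sketch, not the method from the answer above: it assumes your `file` supports `--mime-type`, the function name `list_text_files` is made up for illustration, and it treats "human-readable" as "MIME type starts with `text/`", which may miss formats that `file` reports under `application/`.

```shell
# list_text_files DIR (hypothetical helper): print every regular file under DIR
# whose MIME type, as reported by `file`, starts with "text/".
list_text_files() {
    # find passes the file names as arguments to the inner shell,
    # so spaces and other unusual characters in names are preserved.
    find "$1" -type f -exec sh -c '
        for path in "$@"; do
            case $(file -b --mime-type -- "$path") in
                text/*) printf "%s\n" "$path" ;;
            esac
        done
    ' sh {} +
}
```

For example, `list_text_files /dir/to/search` prints one matching path per line.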

-
And for files with *funny* names: `find /dir/to/search -type f -print0 | xargs -0 file | grep text` ... **funny**? Embedded spaces, parentheses, brackets, braces, ... – tink Aug 05 '21 at 01:20
`find` and `file` are your friends here:
find /dir/to/search -type f -exec bash -c 'file -b "$1" | grep text &>/dev/null' bash {} \; -print
This finds every regular file in /dir/to/search (NOTE: `-type f` excludes symlinks, directories, sockets, etc.) and runs `bash -c 'file -b "$1" | grep text &>/dev/null'` on each one, which asks `file` for the type description and looks for text in it. If this returns true (i.e., text is in the description), `find` prints the filename. The command uses bash rather than sh because `&>` is not POSIX sh syntax, and it passes the filename as `"$1"` instead of embedding `{}` in the script so that names with spaces or quotes stay intact.
NOTE: using the `-b` flag to `file` means that the filename is not printed and therefore cannot create any issues with the grep. E.g., without the `-b` flag the binary file gettext would erroneously be detected as a text file.
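To make the gettext pitfall concrete, here is a small sketch that applies the same `grep text` test to two strings. Both strings are hand-written stand-ins for what `file` might print with and without `-b`, not captured output, and the helper name `mentions_text` is made up for illustration.

```shell
# mentions_text LINE (hypothetical helper): succeed if LINE contains "text".
mentions_text() {
    printf '%s\n' "$1" | grep -q text
}

# Hand-written example lines, for illustration only (not real `file` output).
with_name='/bin/gettext: ELF 64-bit LSB executable, x86-64, dynamically linked'
brief_only='ELF 64-bit LSB executable, x86-64, dynamically linked'

# The file name itself contains "text", so the first test matches by accident.
if mentions_text "$with_name"; then
    echo "without -b: matched (false positive)"
fi
if ! mentions_text "$brief_only"; then
    echo "with -b: no match"
fi
```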
For example,
root@osdevel-pete# find /bin -exec bash -c 'file -b "$1" | grep text &>/dev/null' bash {} \; -print
/bin/gunzip
/bin/svnshell.sh
/bin/unicode_stop
/bin/unicode_start
/bin/zcat
/bin/redhat_lsb_init
root@osdevel-pete# find /bin -type f -name '*text*'
/bin/gettext
If you want to look in compressed files, use the `--uncompress` flag to `file`. For more information and flags to `file`, see `man file`.

-
I am new to the unix-like ecosystem. Why are you using "&" at the end of your `grep`? My understanding is that this will make grep run asynchronously. Will this still give the exit status to `find`? Why would one do that? Thank you for taking the time to answer. – Jesse Emond May 27 '14 at 04:14
-
@JesseEmond: The command doesn't actually contain a `&` token which would put the job in the background, it contains a `&>` token which causes redirection of both stdout and stderr. – Ben Voigt Apr 05 '21 at 16:32
This should work fine, too:
file_info=$(file -b "$file_name") # Read the file's type description, which should contain "ASCII" or "Unicode" if it's a readable file; -b keeps the file name out of the string
if grep -q -i -e "ASCII" -e "Unicode" <<< "$file_info"; then
    echo "file is readable"
fi
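The same check could be wrapped in small functions and applied to several paths at once. This is only a sketch under the answer's own assumption that a readable file's description mentions "ASCII" or "Unicode"; the names `is_text_description` and `is_readable_file` are made up for illustration, and the here-string is swapped for a pipe so the script does not depend on bash.

```shell
# is_text_description DESC (hypothetical helper): succeed if a `file`
# description contains "ASCII" or "Unicode", ignoring case.
is_text_description() {
    printf '%s\n' "$1" | grep -q -i -e "ASCII" -e "Unicode"
}

# is_readable_file PATH (hypothetical helper): succeed if `file` describes
# PATH as ASCII or Unicode text. -b keeps the file name out of the description.
is_readable_file() {
    is_text_description "$(file -b -- "$1")"
}

# Check every path given on the command line and report the readable ones.
for file_name in "$@"; do
    if is_readable_file "$file_name"; then
        echo "$file_name: readable"
    fi
done
```

Saved as a script, it could be called with any list of paths as arguments; with no arguments it prints nothing.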
