
I want to know how I would go about finding, say, text files in a given directory. I want to iterate through all of the files in my directory and convert every text file to a PDF file. The problem is that I do not know how to check whether a file is a text file in the condition of an if statement in the bash shell.

I set ListOfFiles=`ls -l` and iterate through it with a for loop; I just need to know how to check for file types in an if statement.

Thank you in advance.

mklement0
user201535
    See http://mywiki.wooledge.org/ParsingLs -- you shouldn't use `ls` in scripts; globbing is both more efficient (no programs external to the shell required, whereas `/bin/ls` is a separate executable) and more correct (see the link). – Charles Duffy Aug 03 '16 at 04:10
    ...the answer by @mklement0 includes a branch with tradeoffs made for performance over correctness, but at least there it's an intentional choice and you're getting something in return. (Personally, I wouldn't make that choice -- creating a PDF is expensive enough that the cost of invoking `file` individually on candidates almost disappears as a percentage of overall cost unless you're doing something inefficient like running it across all files, as opposed to only new ones, on a regular/cronned basis). – Charles Duffy Aug 03 '16 at 04:14

1 Answer


The following lists all text files in the current directory.

file --mime-type * -F$'\t' | awk -F'\t *' '$2 ~/^text\/plain/ { print $1 }'

Note: This assumes that your filenames have neither embedded tabs nor embedded newlines, which is not typically a problem.

  • file --mime-type * -F$'\t' determines the file type of each file in the current folder (*) and prints a two-column list: the filename, followed by a tab (-F$'\t'), followed by spaces for alignment, followed by the file type expressed as a MIME type.

  • awk -F'\t *' '$2 ~/^text\/plain/ { print $1 }' then parses each line into the filename and MIME type (-F'\t *'), tests whether the MIME type (field 2, $2) starts with (^) the string text/plain and, if so, prints the filename (field 1, $1).
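To see the awk filter in isolation, here is a minimal sketch that feeds it simulated input in the shape described above (filename, tab, padding space, MIME type). The filenames and the sample_output variable are made up for illustration; real file output pads with a varying number of spaces.

```shell
# Simulated output of: file --mime-type * -F$'\t'
# (hypothetical filenames; each line is "name<TAB><space>mime/type")
sample_output=$(printf '%s\t %s\n' \
  'notes.txt'     'text/plain' \
  'photo.png'     'image/png' \
  'my report.txt' 'text/plain')

# Same filter as in the answer: split on a tab plus optional spaces,
# keep only lines whose second field starts with text/plain.
text_files=$(printf '%s\n' "$sample_output" |
  awk -F'\t *' '$2 ~ /^text\/plain/ { print $1 }')

printf '%s\n' "$text_files"
```

Because the field separator requires a tab, a filename containing spaces such as "my report.txt" still ends up whole in field 1.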

To process the resulting files in a loop, use while:

while IFS= read -r textfile; do
  # Work with "$textfile"
done < <(file --mime-type * -F$'\t' | awk -F'\t *' '$2 ~/^text\/plain/ { print $1 }')

Note that while you could call file in a conditional inside a for file in * loop, the above approach is much more efficient, because it invokes file only once for all files rather than once per file.
For the record, here's how you would use the command in a conditional:

if [[ $(file -b --mime-type "$file") == 'text/plain'* ]]; then ...
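For completeness, here is a rough sketch of the per-file variant wrapped in a full loop that uses globbing instead of ls. It assumes the file utility is installed. The names is_text_file, convert_to_pdf and process_dir are hypothetical helpers introduced only for this example, and convert_to_pdf is a placeholder that just prints the name, since the question does not say which converter is used. The check uses a case pattern rather than [[ ... ]]; it performs the same prefix match on text/plain but also works in plain sh.

```shell
# Succeed if file(1) reports a MIME type starting with text/plain for "$1".
# "--" keeps filenames that begin with "-" from being read as options.
is_text_file() {
  case $(file -b --mime-type -- "$1") in
    text/plain*) return 0 ;;
    *)           return 1 ;;
  esac
}

# Hypothetical placeholder: swap the body for your actual PDF converter.
convert_to_pdf() {
  printf 'would convert: %s\n' "$1"
}

# Walk the entries of directory "$1" with a glob (no ls parsing).
process_dir() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue        # skip directories and unmatched globs
    if is_text_file "$f"; then
      convert_to_pdf "$f"
    fi
  done
}
```

Calling process_dir . would then handle the current directory. This runs file once per candidate, which is the slower but simpler option discussed in the comments.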
mklement0