
I have a list of newline-separated strings. I need to iterate through each line, and use the argument surrounded with wildcards. The end result will append the found files to another text file. Here's some of what I've tried so far:

cat < ${INPUT} | while read -r line; do find ${SEARCH_DIR} -name $(eval *"$line"*); done >> ${OUTPUT}

I've tried many variations of eval/$() etc, but I haven't found a way to get both of the asterisks to remain. Mostly, I get things that resemble *$itemFromList, but it's missing the second asterisk, resulting in the file not being found. I think this may have something to do with bash expansion, but I haven't had any luck with the resources I've found so far.

Basically, I need to supply the -name parameter with something that looks like *$itemFromList*, because the file has words both before and after the value I'm searching for.

Any ideas?

John Halbert
  • `cat <"$INPUT" | ...` serves no purpose whatsoever. Remove both the `cat` and the `|`, and make it `<"$INPUT" ...` (or, rather, for a `while` loop, put the redirection after the `done` at the end). – Charles Duffy Nov 11 '17 at 03:23
  • (in some cases, such as `sort`, using `cat` is not just unnecessary but much slower than providing a direct file handle on the input file: A FIFO, unlike a regular handle, isn't seekable; it can only be read once, front-to-back, so you can't let different threads process different parts of the input without copying it from the FIFO to a temporary file first). – Charles Duffy Nov 11 '17 at 03:30
  • Just to be clear -- when you say "the file has words both before and after the value I'm searching for", you mean the file *name* has words before and after, right? `find -name` doesn't search file *contents*, it only searches file *names*. – Charles Duffy Nov 11 '17 at 05:23

1 Answer


Use double quotes to prevent the asterisk from being interpreted as an instruction to the shell rather than find.

-name "*$line*"

Thus:

while read -r line; do
  line=${line%$'\r'}  # strip trailing CRs if input file is in DOS format
  find "$SEARCH_DIR" -name "*$line*"
done <"$INPUT" >>"$OUTPUT"

...or, better:

#!/usr/bin/env bash

## use lower-case variable names
input=$1
output=$2

args=( -false )                 # for our future find command line, start with -false
while read -r line; do
  line=${line%$'\r'}            # strip trailing CR if present
  [[ $line ]] || continue       # skip empty lines
  args+=( -o -name "*$line*" )  # add an OR clause matching if this line's substring exists
done <"$input"

# since our last command is find, use "exec" to let it replace the shell in memory
exec find "$SEARCH_DIR" '(' "${args[@]}" ')' -print >"$output"

Note:

  • The shebang specifying bash ensures that extended syntax, such as arrays, is available.
  • See BashFAQ #50 for a discussion of why an array is the correct structure to use to collect a list of command-line arguments.
  • See the fourth paragraph of http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html for the relevant POSIX specification on environment and shell variable naming conventions: All-caps names are used for variables with meaning to the shell itself, or to POSIX-specified tools; lowercase names are reserved for application use. That script you're writing? For purposes of the spec, it's an application.
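To see how the array grows into a single `find` expression, here is a minimal, self-contained sketch. The temporary directory, the file names, and the contents of `list.txt` are all made up for illustration; only the loop and the shape of the `find` invocation mirror the script above.

```bash
#!/usr/bin/env bash
# Hypothetical sandbox: the directory layout, file names, and list contents
# are invented for this demo and are not taken from the question.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/files"
touch "$demo_dir/files/200001015432112345.json" \
      "$demo_dir/files/report-67890-final.txt" \
      "$demo_dir/files/unrelated.log"

# A DOS-format (CRLF) input list with a blank line in the middle.
printf '12345\r\n\r\n67890\r\n' >"$demo_dir/list.txt"

args=( -false )                 # neutral starting point for a chain of -o clauses
while IFS= read -r line; do
  line=${line%$'\r'}            # strip trailing CR if present
  [[ $line ]] || continue       # skip empty lines
  args+=( -o -name "*$line*" )  # one OR clause per substring
done <"$demo_dir/list.txt"

# Print the exact arguments, shell-quoted, that find will receive.
printf '%q ' find "$demo_dir/files" '(' "${args[@]}" ')' -print; echo

# Run the search once for all substrings.
matches=$(find "$demo_dir/files" '(' "${args[@]}" ')' -print | sort)
printf '%s\n' "$matches"
```

Because each element is added to the array already quoted, the asterisks reach `find` untouched, and the filesystem is walked only once however many lines the list contains.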
Charles Duffy
  • Hey Charles, thanks for all the helpful information! In running your commands, it's only returning essentially an `ls` listing of all the files in the directory - not the actual found files. It's doing this for each of the lines inside the input file, as if the input to the `-name` argument was using an asterisk by itself. Any ideas what I'm doing wrong? – John Halbert Nov 11 '17 at 03:48
  • Additionally, I'm confused regarding the use of -o and -false in the second example you provided. Is it initialized in the array in order to use it only once at the beginning of the command? If so, why wouldn't it be next to the call to `find` lower down in the script? Are these two arguments working together to enable some kind of loop? Does the second script do something different to make the asterisks unneeded? – John Halbert Nov 11 '17 at 04:51
  • We generate just one `find` command that looks like `find . '(' -false -o -name '*foo*' -o -name '*bar*' ')' -print`. That way we just scan the filesystem once and find files under any of your listed names. – Charles Duffy Nov 11 '17 at 04:53
  • As for tracking what's going on at runtime, the results of `bash -x scriptname` -- tracing execution -- would be helpful. – Charles Duffy Nov 11 '17 at 04:54
  • (On rereading, I note that I left out the asterisks in the second sample; oops!) – Charles Duffy Nov 11 '17 at 04:57
  • Yep, the output I was getting while trying to debug with `echo` statements replacing the `exec` was roughly `find . ( -o -name value1 -o -name value2 ) -print`. Originally it seemed a bit foreign, but after searching around, it seems like the -false and -o are working in conjunction for the filesystem scan you're referring to - right? By the `ls` listing, it's listing literally _every_ file in that directory - not just matches. In fact, it's the exact output of just doing `find .` Essentially, I need to mimic `find . -name *12345*` – John Halbert Nov 11 '17 at 05:07
  • Are you still seeing that behavior with the edits? (BTW, `find . -name *12345*` is buggy -- the shell replaces `*12345*` with a list of files *before* `find` is started, so if you run that in a directory that has `12345.txt`, then it becomes `find . -name 12345.txt`; hence the need for quotes). – Charles Duffy Nov 11 '17 at 05:08
  • Yep, same behavior. I just realized I mentioned that words were surrounding the entries, but in reality they are files that follow the format of 200001015432112345.json - would that be causing any issues? – John Halbert Nov 11 '17 at 05:16
  • `set -x` logs, as previously requested, would be very helpful here; even if it's just the single line from the `find` command's invocation itself. (If you want to post more than fits in a comment, please try to find somewhere without ads; https://gist.github.com/ is good, so is http://ix.io/ or http://sprunge.us/). – Charles Duffy Nov 11 '17 at 05:17
  • Is this helpful? https://gist.github.com/xation/c590bec10f28679ad90dc34e6ce87076 Note: I replaced the actual filenames, but the first and second references were the same number. edit: oops, hang on, I attempted to put asterisks around the filenames earlier, and those weren't removed in this version. – John Halbert Nov 11 '17 at 05:34
  • Ahh! Very helpful. What that tells us is that your input file is in DOS, not UNIX form. Use `dos2unix` to convert the CRLF newlines to LF. You can also open it in vim, run `:set fileformat=unix`, and save. – Charles Duffy Nov 11 '17 at 05:39
  • I also added an edit which will allow DOS-format input files to be handled correctly. – Charles Duffy Nov 11 '17 at 05:41
  • (The giveaway is the log being malformed in a way that indicates that the cursor is being sent back to the beginning of the line -- which is what a CR character does when printed). – Charles Duffy Nov 11 '17 at 05:43
  • How could you tell? Oh man, I can't believe that was the issue. Well, at least I learned a ton from it! Thank you so much for your help, Charles! – John Halbert Nov 11 '17 at 05:45
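
The carriage-return problem diagnosed in the comments above can also be checked for directly, without reading trace logs. The following is a rough sketch: the `has_crlf` helper and the two sample files are hypothetical names introduced only for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper and file names, for illustration only.
has_crlf() {
  # Succeeds if at least one line in the file ends with a carriage return.
  grep -q $'\r$' "$1"
}

tmp=$(mktemp -d)
printf '12345\r\n67890\r\n' >"$tmp/dos.txt"    # DOS-format sample
printf '12345\n67890\n'     >"$tmp/unix.txt"   # UNIX-format sample

for f in "$tmp/dos.txt" "$tmp/unix.txt"; do
  if has_crlf "$f"; then
    echo "${f##*/}: CRLF line endings"
  else
    echo "${f##*/}: LF line endings"
  fi
done

# printf %q prints a value in shell-quoted form, which exposes an otherwise
# invisible trailing CR on a line read from the DOS-format file.
IFS= read -r first <"$tmp/dos.txt"
printf '%q\n' "$first"
```

A check like this, run before the loop, would tell you whether the `${line%$'\r'}` stripping step or a `dos2unix` pass is needed for a given input file.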