
Consider me frustrated... I've spent the past 2 hours trying to figure out how to have a command that contains pipes feed its output into a for loop. A quick story on what I'm attempting, followed by my code.

I have been using xbmc for years. However, shortly after I started, I exported my library, which turned out to be more hassle than it's worth (especially now that I'm enforcing a set naming scheme for folders and the files they contain). I want to remove all of the files that xbmc added, so I figured I'd write a script to remove them. However, that's where I ran into a problem.

I am trying to use the locate command (because of its speed), followed by a grep (to filter out .tbn files from the rest of the filesystem) and an egrep (to remove the .actors folder xbmc creates from the results), followed by a sort (the sort isn't necessary; I added it during debugging so the output while testing was nicer). The problem is that only the first file is processed and then nothing. I read a lot online and figured out that bash creates a new subshell for every pipe, and by the time the loop has finished once, the variable is dead. So I did more digging on how to get around that, and everything seemed to show how to work around it for while loops, but nothing for for loops.

While I like to think I'm competent at scripting, things like this always come up that prove I'm still just learning the basics. Any help from people smarter than me would be greatly appreciated.

#!/bin/bash

for i in "$(locate tbn | grep Movies | egrep -v .actors | sort -t/ +4)"
do
  DIR=$(echo $i | awk -F'/' '{print "/" $2 "/" $3 "/" $4 "/" $5 "/"}')
  rm -r "$DIR*.tbn" "$DIR*.nfo" "$DIR*.jpg" "$DIR*.txt" "$DIR.actors"
done

After reading through the responses below, I'm thinking the better route to accomplish what I want is as follows. I'd love any advice on the new script. Rather than just copying and pasting @Charles Duffy's script, I want to find the right/best way to do this as a learning experience, since there is always a better way to code something.

#!/bin/bash

for i in "*.tbn" "*.nfo" "*.jpg" "*.txt" "*.rar" #(any other desired extensions)
do
  find /share/movies -name "$i" -not -path "/share/movies/.actors" -delete
done

I have the -not -path portion in there to exclude the .actors folder that xbmc puts at the root of the source directory (in this case, /share/movies), so no thumbnails (.tbn files) get removed from there. However, I want them removed from any other directory contained within /share/movies (and I would like to remove the thumbnails from within a .actors folder if it is contained inside a specific movie folder). The -delete option is there because a gnu.org page suggested -delete is better than calling /bin/rm, since find doesn't need to fork an rm process, which keeps things more efficient and avoids overhead.

I'm pretty sure I want the items in the for line to be quoted so a literal *.tbn is used within the find command. To give you an idea of the directory structure, it's pretty simple. I want to remove any of the *.tbn, *.jpg, and *.nfo files within those directories.

/share/movies/movie 1/movie 1.mkv  
/share/movies/movie 1/movie 1.tbn  
/share/movies/movie 1/movie 1.jpg  
/share/movies/movie 1/movie 1.nfo  

/share/movies/movie 2/movie 2.mp4  
/share/movies/movie 2/movie 2.srt  
/share/movies/movie 2/movie 2 (subs).rar  

/share/movies/movie 3/movie 3.avi  
/share/movies/movie 3/movie 3.tbn  
/share/movies/movie 3/movie 3.jpg  
/share/movies/movie 3/movie 3.nfo  
/share/movies/movie 3/.actors/actor 1.tbn  
/share/movies/movie 3/.actors/actor 2.tbn  
/share/movies/movie 3/.actors/actor 3.tbn  
bassmadrigal
    See BashPitfalls entry #1: http://mywiki.wooledge.org/BashPitfalls – Charles Duffy Sep 26 '14 at 23:46
    I wouldn't worry about speed; "First comes right, then comes fast", and besides you plan to do this only once. I'd use `find` or maybe `xargs`. – Beta Sep 26 '14 at 23:50
  • BTW, what are you trying to do with the `awk` command? I'm not sure why you couldn't achieve the same effect with bash's built-in parameter expansion. – Charles Duffy Sep 26 '14 at 23:51
  • ...if you want to trim the leading directory element from the name, for instance: `dir=/${i#*/}` – Charles Duffy Sep 26 '14 at 23:52
  • Wow... reading through that BashPitfalls page, I realize I am guilty with a lot of them, but I never had actual programming instruction and just taught myself with the help of google over the years. Thanks! With the awk command, I was trying to remove the actual filename (the last variable which would've been $6) and keep the rest of the directory structure intact so I could then use the directory as the base for the rm command. As for your last comment, what does that do? Without the name of what it does, Google hasn't been a help... I'd like to learn alternatives since awk is power hungry. – bassmadrigal Sep 29 '14 at 03:02
  • @bassmadrigal, okay, if what you want to do is to remove the _last_ directory element, that's `dir=${i%/*}`. See BashFAQ #73: http://mywiki.wooledge.org/BashFAQ/073 – Charles Duffy Sep 29 '14 at 04:07
  • You're right -- if you want to pass the wildcards to find, you do indeed want to quote them. Personally, I'd be more explicit about my grouping operators. – Charles Duffy Sep 29 '14 at 16:47

3 Answers


This is just a quoting problem. "$(locate tbn | ...)" is a single word because the quotes prevent word splitting. If you leave out the quotes, it becomes multiple words, but then spaces in the filepaths will become problems.

Personally, I'd use find with an -exec clause; it might be slower than locate (locate uses a periodically updated database, so it trades off accuracy for speed), but it will avoid this sort of quoting problem.
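To see the single-word behavior concretely, here's a minimal standalone demonstration (printf stands in for the locate pipeline):

```shell
#!/bin/bash
# A quoted command substitution is one word; unquoted, it is split on whitespace.
count=0
for i in "$(printf 'one\ntwo\nthree')"; do
  count=$((count + 1))
done
echo "quoted: $count iteration(s)"     # quoted: 1 iteration(s)

count=0
for i in $(printf 'one\ntwo\nthree'); do
  count=$((count + 1))
done
echo "unquoted: $count iteration(s)"   # unquoted: 3 iteration(s)
```

The quoted loop runs once with the entire three-line output as a single value; the unquoted loop runs three times, but would also split any value containing spaces.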

rici
    It's not just a quoting problem there -- there are also quoted wildcards which the user expects to expand, and an expansion in front of `echo` which should be quoted but isn't. – Charles Duffy Sep 26 '14 at 23:50
    ...and, really, just removing the quotes isn't enough inasmuch as string-splitting filenames with spaces isn't desired behavior. – Charles Duffy Sep 26 '14 at 23:58
    @CharlesDuffy: I said that. Didn't you read the third sentence? – rici Sep 27 '14 at 01:24
    Ahh. Yup, you did. Still leaves glob expansion uncovered. – Charles Duffy Sep 27 '14 at 01:51
  • I'm not seeing the expected issues with my unquoted $i when setting the DIR variable. It actually displays the desired output when I echo back $DIR after the command is run. Is that just a lucky output in my case, but generally trouble-causing? – bassmadrigal Sep 29 '14 at 03:14
  • As for the find -exec command, after reading up more on it, are you suggesting I just remove the script completely, or change it to a for loop with the extensions I want to remove as $i, then have the find and exec used afterwards to remove those files directly within the find command? `find . -name *"$i" -exec /bin/rm {};` or actually delete directly `find . -name *"$i" -delete` ($i would contain the extension I want to remove that would be set in the for loop, and I'd need the wildcard unquoted and the variable quoted, right?) – bassmadrigal Sep 29 '14 at 03:20
    @bassmadrigal: It's not the fact that `$i` is unquoted, although that's a problem: it's the fact that "$(...)" *is* quoted, which means that `i` will only take on a single value, which is the entire list. Since you don't quote `$i`, the list is provided to `awk` as a single line, but it only cares about the first few fields, so it will ignore most of that line. – rici Sep 29 '14 at 03:37
  • @bassmadrigal: with respect to the find command, I'd start with `find Music` (or something like that) so that you're searching in the right directory. Beyond that, I don't know exactly what you're trying to do, but you normally quote `"*"` in a `find -name` option, because the idea is for `find` to interpret the pattern, not the shell. If you want to delete all files with a .nfo extension, you could do it by searching for `-iname "*.nfo"`, but if you want to delete `foo.nfo` from a directory which contains `foo.tbn` you'd need something a bit more sophisticated. – rici Sep 29 '14 at 03:41
  • Oh, I guess I never realized what a for loop actually did. I thought that it would pass off the items in the for line one at a time to the loop itself, so the first time it would pass the first directory, the next time it would pass the second and so on. Apparently, I still have a lot to learn... – bassmadrigal Sep 29 '14 at 03:43
  • As to your second reply, I'm editing my first post to add a new script based on the find command. However, I'm hoping I can use the for loop so I don't have to keep changing the contents of the find command. So I'm trying to utilize what @CharlesDuffy mentioned in his first reply to not have * quoted. Once I edit my first post, see what you suggest from there. From my testing, it seems to be working as I posted it (but that seems to not necessarily mean it is the right way to do things). – bassmadrigal Sep 29 '14 at 03:48
    @bassmadrigal: That's exactly what a for loop does. But what is an item? A quoted expression (unless it has an @ in it) is a single item, because it is quoted. So there is a difference between `for a in $(seq 10); do echo loop; echo $a; done` and `for a in "$(seq 10)"; do echo loop; echo $a; done` which you can easily see if you type those two commands. – rici Sep 29 '14 at 04:00
    @bassmadrigal, btw, with respect to the unquoted `echo $i` not appearing to cause problems -- try a directory name with two spaces next to each other, or one that contains asterisks surrounded by spaces on both sides. It may work in most cases, but it certainly doesn't work in all. – Charles Duffy Sep 29 '14 at 04:05
  • @rici, I see. So if the items in the for loop are quoted, it suddenly becomes one item with newlines at the end of each entry. As long as they remain unquoted, then each is individually passed down. – bassmadrigal Sep 29 '14 at 04:09
  • @CharlesDuffy, that makes sense. I'm pretty strict in my naming standards for the movies, but I can see how extra things like that could cause a problem. I think my revised script in the top post (once posted) will be much better thanks to all the suggestions I received here). – bassmadrigal Sep 29 '14 at 04:10
    @bassmadrigal: If unquoted, the items are separated by *whitespace* (such as spaces) and not just by newlines. So it is probably the case that you don't want either the quoted or the unquoted version. Which was my point from the beginning. – rici Sep 29 '14 at 04:16
  • @rici, ok, that makes sense why I should be steered away from using that. I can see how it can lead into a multitude of problems. I have since updated my initial post with a heavily modified script. I'm thinking that due to issues previously mentioned here, I should go with my second (technically third) for loop, except I'm thinking I should probably put quotes around each item in the for loop `for i in "*.tbn" "*.jpg" #etc` to prevent having the * unquoted in the find command and I'd assume it's just good practice to include the items in the for loop in quotes. Am I horribly off? – bassmadrigal Sep 29 '14 at 06:08
  • @bassmadrigal, `for i in "*.tbn" "*.jpg"` puts the literal string `*.tbn` in `i`, not files matching `*.tbn`. That's probably not what you want. `for i in *.tbn *.jpg`, by contrast, iterates over _files matching_ those globs, not the globs themselves. – Charles Duffy Sep 29 '14 at 12:32
  • @bassmadrigal, ...now, if you iterate over the glob expressions themselves, then use them unquoted later, then they would be expanded into actual filenames _on those unquoted uses_. Which may or may not be what you want, depending on the details of how your script is written. – Charles Duffy Sep 29 '14 at 12:34
  • @CharlesDuffy, ok, the first comment makes sense. Putting them in quotes makes them a literal string to match. Also, it seems that, after reading more of that in-depth bash guide, word splitting is done before using the wildcard, so *.tbn and *.jpg should suffice. However, I'm confused by what you mean on your last comment. Are you saying if I leave it unquoted within the for loop, it would show `some file.jpg` as two files? `some` and `file.jpg`? This is just to make sure I quote any uses within the actual loop, correct? – bassmadrigal Sep 29 '14 at 13:42
  • Actually, looking back at my script, isn't the literal string of *.tbn what I want in the find command? I'm just using the for loop to pass the extensions I want to find using the `find` command within the loop. I'm not wanting actual filenames passed with the for loop (at least with how it is written above), because I want the actual find command to be `find /share/movies -name "*.tbn" -not -path "/share/movies/.actors"` I've re-updated the second half of my question again with a (hopefully) better explanation of what I want to do and I think a proper script to accomplish it. – bassmadrigal Sep 29 '14 at 14:15
    @bassmadrigal, unquoted `*.jpg` yields one string, `some file.jpg`, because glob expansion Does The Right Thing. On the other hand, if you take that name, put it in a variable, and expand that variable unquoted _again_, you get `some` and `file.jpg`. So, you want to avoid quotes on glob characters _at the time when the glob should be expanded_, but only at that time. – Charles Duffy Sep 29 '14 at 15:24
  • Ok, that makes perfect sense. Thanks! – bassmadrigal Sep 29 '14 at 15:33

Reading filenames from locate in a script is bad news in general unless your locate command has an option to NUL-delimit names (every character other than NUL or / is valid in a filename, including newlines, so locate's newline-delimited output is ambiguous). That said:

#!/bin/bash
# ^^ -- not /bin/sh, since we're using bash-only features here!

while read -u 3 -r i; do
  dir=${i%/*}
  rm -r "$dir/"*".tbn" "$dir/"*".nfo" "$dir/"*".jpg" "$dir/"*".txt" "$dir/.actors"
done 3< <(locate tbn | grep Movies | egrep -v .actors)

Notice how the *s cannot be inside of the double-quotes if you want them to be expanded, even though the directory names must be inside of double quotes to work if they have whitespace &c. in their names.
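A quick way to see that rule in action (a sketch using a throwaway directory with a space in its name):

```shell
#!/bin/bash
dir=$(mktemp -d)/'movie 1'      # directory name containing a space
mkdir -p "$dir"
touch "$dir/a.tbn" "$dir/b.tbn"

printf '%s\n' "$dir/"*".tbn"    # glob outside the quotes: expands to both files
printf '%s\n' "$dir/*.tbn"      # glob inside the quotes: one literal string
rm -r "${dir%/*}"
```

The first printf prints both .tbn paths intact (the quoted "$dir/" protects the space); the second prints the unexpanded pattern itself.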


In general, I agree with @rici -- using find is by far the more robust approach, especially used with the GNU extension -execdir to prevent race conditions from being used to cause your command to behave in undesirable ways. (Think of a malicious user replacing a directory with a symlink to somewhere else while your script is running).
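For illustration, a sketch of the -execdir form against a throwaway tree (the layout mimics the question's; the file names here are made up):

```shell
#!/bin/bash
# -execdir runs rm from within each matched file's own directory, so a
# directory component being swapped for a symlink mid-run can't redirect
# the deletion elsewhere.
root=$(mktemp -d)
mkdir -p "$root/movie 1" "$root/.actors"
touch "$root/movie 1/movie 1.mkv" "$root/movie 1/movie 1.tbn" "$root/.actors/actor.tbn"

find "$root" -name .actors -prune -o \
  -name '*.tbn' -type f -execdir rm -- '{}' +

find "$root" -type f    # the .mkv survives; so does the pruned .actors thumbnail
rm -r "$root"
```

Note that GNU find refuses to run -execdir if the current directory is in your $PATH, precisely because running commands from arbitrary directories would otherwise be risky.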

Charles Duffy
  • Again, thanks for the robust information. I am always amazed when I realize how little I know about shell scripting, even though I've been able to pull off impressive (to me) things. Is there a reason to go with a while loop over a for loop? From my early learning days, I had learned that the for loop is a great way to go through a list of things, including files. Is the while loop better just because I can do the command looping to prevent the subshell issue from surfacing? Or is there an even deeper reason to go with a while loop over a for loop? – bassmadrigal Sep 29 '14 at 03:11
  • @bassmadrigal, `while read -r` reads in content from the `locate` stream a line at a time, and is able to start as soon as `locate` begins to emit output (though using `sort` forces things to wait for the end, as `for` would, and so loses this efficiency). Much more important is that `while read` does no word-splitting and no glob expansion, so names containing whitespace or glob characters are handled safely. – Charles Duffy Sep 29 '14 at 04:01
  • @bassmadrigal, ...`for` is safe if you're iterating over contents of an array, or results of a glob expression. It is *not* safe if you're string-splitting command substitution output, absent great care (turning off glob expansion and setting `IFS` to contain only delimiter characters). – Charles Duffy Sep 29 '14 at 04:02
  • @bassmadrigal, ...see also BashFAQ #1, discussing best practices for reading input line-by-line: http://mywiki.wooledge.org/BashFAQ/001 – Charles Duffy Sep 29 '14 at 04:03
  • I definitely need to do more digging through there. I'll have to check it out tomorrow so I can get better with this. – bassmadrigal Sep 29 '14 at 04:11
    Since you've described what you're accomplishing with awk, I've removed it from my answer here in favor of an equivalent parameter expansion. – Charles Duffy Sep 29 '14 at 15:27
    @bassmadrigal, ...also, I took out the `sort`, since it stops the rest of your script from progressing until after `locate` has finished. That's not a huge deal for `locate` since it's fast, but _would_ be a big deal with `find`. Though if you were going to use `find`, using `-execdir` and writing your script to be run within the target directories would make more sense. – Charles Duffy Sep 29 '14 at 15:29
  • Fair enough. Any thoughts on my updated script (the second one) in the original question? Is this the best way to go about doing this? I was trying to steer away from this answer since it seems that locate with pipes to greps to gain my desired list doesn't seem to be the best route to go about this anymore (although, now I do understand what it's accomplishing and how). I'm just looking for the best way to accomplish this, and it seems that is to leave locate out of it. And when I read up on `-delete` within find, it seems to have the same benefit as `-execdir` to prevent race conditions. – bassmadrigal Sep 29 '14 at 15:44
  • The new script in the question works, but it's very redundant -- you could check all the patterns (and much more quickly) with just one find call. – Charles Duffy Sep 29 '14 at 16:48

Your second script, edited into the question, is an improvement. However, there's still room to do better:

#!/bin/bash

exts=( tbn nfo jpg txt rar )

find_args=( )    
for ext in "${exts[@]}"; do
  find_args+=( -name "*.$ext" -o )
done

find /share/movies -name .actors -prune -o \
 '(' "${find_args[@]:0:${#find_args[@]} - 1}" ')' -exec rm -f -- '{}' +

(Note that GNU find's -delete implies -depth, and -depth disables -prune, so -exec rm is used here to keep the .actors exclusion working.)

This will build a command like:

find /share/movies -name .actors -prune -o \
  '('    -name '*.tbn' -o -name '*.nfo' -o -name '*.jpg' \
      -o -name '*.txt' -o -name '*.rar' ')' -exec rm -f -- '{}' +

...and thus process all the extensions in a single pass.
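The array slice in the find invocation is what drops the trailing -o; a minimal illustration of that expansion:

```shell
#!/bin/bash
# ${#args[@]} is the element count; the slice keeps elements 0 .. count-2,
# dropping the final -o so the find expression list stays well-formed.
args=( -name '*.tbn' -o -name '*.nfo' -o )
printf '%s\n' "${args[@]:0:${#args[@]} - 1}"
# -name
# *.tbn
# -o
# -name
# *.nfo
```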

Charles Duffy
  • Ok, I think I understand what is going on here. You set the extensions I want in an array called `exts` (they are all separate elements since the whole thing isn't in quotes, right?). Then you create a blank array of find_args. Now I'm a little fuzzy. I believe the += is appending the `-name "*.$ext" -o` (with the variable expanded) to the end of find_args as it's going through the for loop. Is this correct? And it's adding three separate elements each time (since there's a blank space and no quotes encompassing everything)? – bassmadrigal Sep 29 '14 at 18:26
  • I'm assuming the array portion in the find command gets the first index and then goes and gets all but the last index (which would leave out the `-o` which would cause problems since there's nothing afterward for it to compare against), but I have no idea how it goes about doing that. I did some searching in the bash guide you linked, but I'm not fully grasping it. I'm thinking it gets the first index, with an offset of 0, and then gets the last index and subtracts 1, but I'm thinking the `#find_args` denotes some sort of pattern matching, but I don't know what. – bassmadrigal Sep 29 '14 at 18:30
  • Now, whether or not my last comment about the evaluation of using the array in the find command is correct or not, this begs the question. At what point is it not worth the complexity to do something more efficiently vs easy to read and running the find command multiple times? Maybe if you're scanning a large amount of data that can take a long period of time? The reason I ask is because I was initially stumped on what your code did until I spent about an hour dissecting it to hope I'm making sense of it. However, I don't think I could write something like this with my level of knowledge. – bassmadrigal Sep 29 '14 at 19:01
    @bassmadrigal, it's not a pattern match; `${#array[@]}` evaluates to the number of elements in the array, just as `${#string}` evaluates to the number of characters in a string. – Charles Duffy Sep 29 '14 at 19:32
    @bassmadrigal, ...and exactly -- for a lot of data, running `find` multiple times can take a while (especially if the directories are larger than will fit in your operating system's cache, or if there's memory pressure reducing the size of that cache). – Charles Duffy Sep 29 '14 at 19:32
  • @bassmadrigal, btw, since your edit shows examples of the actual directory structure, I've modified the `find` command to be more efficient (using `-prune`, which stops it from recursing down `.actors` in the first place, rather than using the `-not -path` to exclude paths there after recursing into them). – Charles Duffy Sep 29 '14 at 19:47
    @bassmadrigal: The more you do this sort of thing, the easier it gets. Certainly, for a one-time command, it's only worth working this hard at it if you want to learn more about the tool (unless the command will be deployed against a server farm). But the advantage of having done it is that the next time you have a similar problem, the apparently complicated solution will spring to your fingers. There are common patterns and idioms, and over time these become part of your personal toolbox. – rici Sep 29 '14 at 20:22
  • @CharlesDuffy: Another solution, if you have `gnu find` and no extensions contain whitespace characters, is: `exts="tbn nfo jpg txt rar"; find ... -regex '.*\.'"${exts// /\\|}"`. I don't know if that is simpler or not :) And, of course, consistent with the original: `find ... -name "*.tbn" -execdir rm "*.tbn" "*.nfo" "*.jpg" "*.txt" "*.rar" \;` -- although that's a bit odd in directories with more than one .tbn file but perhaps there is a solution for that. – rici Sep 29 '14 at 21:05
  • @rici, does that last `-execdir` work? I'd think it would need a shell to expand the wildcards, so you'd need something more like `-execdir sh -c 'rm *.tbn *.nfo *.jpg *.txt *.rar'`. – Charles Duffy Sep 29 '14 at 21:24
    @charles: good point; -execdir doesn't interpose a shell. With a shell, one can do better: `-execdir bash -c 'root=${1##*/};root=${root%.tbn}; rm "$root.tbn" "$root.nfo" "$root.jpg" "$root.txt" "$root.rar"' _ {} \;`. Or the way you did it in that comment :) – rici Sep 29 '14 at 21:32