
I have a list of files:

file_name_FOO31101.txt
file_name_FOO31102.txt
file_name_FOO31103.txt
file_name_FOO31104.txt

I want to pass pairs of files as input to a downstream program, like this:

program_call file_name_FOO31101.txt file_name_FOO31102.txt
program_call file_name_FOO31103.txt file_name_FOO31104.txt
...

I do not want overlapping pairs such as:

program_call file_name_FOO31102.txt file_name_FOO31103.txt

I need to do this in a loop. This is my attempt:

#!/bin/bash

FILES=path/to/files

for file in $FILES/*.txt; do
    stem=$( basename "${file}" )                      # stem : file_name_FOO31104_info.txt
    output_base=$( echo $stem | cut -d'_' -f 1,2,3 )  # output_base : file_name_FOO31104
    id=$( echo $stem | cut -d'_' -f 3 )               # get the third field : FOO31104
    number=$( echo -n $id | tail -c 2 )               # get the last two digits : 04

    echo $id $((id+1))
done

But this does not produce what I want.

On each iteration I want to call the program once with two files as input: the last two digits of the first file are always odd (e.g. 01) and those of the second file are always even (e.g. 02).
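A rough sketch of the name-based pairing the loop above seems to be aiming for, assuming every name ends in a two-digit number followed by `.txt` (as in the listing). `pair_partner` is a hypothetical helper name; it only computes the partner's name and does not check that the file exists.

```shell
#!/bin/bash
# Sketch: pair each odd-numbered file with the file whose number is one higher.
# Assumes names like file_name_FOO31101.txt (two-digit suffix right before .txt).
# pair_partner is a hypothetical helper, not part of any tool.
pair_partner() {
  local file=$1
  local stem=${file%.txt}          # file_name_FOO31101
  local number=${stem: -2}         # last two characters: 01
  local n=$(( 10#$number ))        # force decimal so 08/09 are not read as octal
  (( n % 2 == 1 )) || return 1     # only odd-numbered files start a pair
  # replace the last two digits with n+1, zero-padded (assumes no carry past 99)
  printf '%s%02d.txt\n' "${stem%??}" "$(( n + 1 ))"
}

for f in file_name_FOO31101.txt file_name_FOO31103.txt; do
  partner=$(pair_partner "$f") && echo "program_call $f $partner"
done
```

To run it on real files, loop over `"$FILES"/*.txt` instead of the two literal names and add a `[[ -e $partner ]]` check before calling the program. A suffix of 99 would yield a three-digit partner number, which this sketch does not handle.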

fugu
  • As an aside, all-caps variable names are [specified by POSIX](http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html) to be used by variables with meaning to the system or shell, whereas lower-case names are reserved for application use. Consider following this in your own code to prevent any potential for stomping on variables with meaning to the system by mistake. (This spec is explicit for environment variables, but setting a regular shell variable overwrites any like-named environment variable, so they share a namespace). – Charles Duffy Jan 12 '17 at 16:24
  • BTW, `$FILES/*.txt` will break if your `FILES` path contains any spaces, hence `"$FILES"/*.txt`. – Charles Duffy Jan 12 '17 at 16:26
  • @CharlesDuffy - Thanks - this is all great to know – fugu Jan 12 '17 at 16:27
  • BTW, all the mucking around with external commands such as `cut` and `tail` is really slow compared to native string manipulation primitives. I've added an addendum to my answer; see also [BashFAQ #100](http://mywiki.wooledge.org/BashFAQ/100), and [the bash-hackers page on parameter expansion](http://wiki.bash-hackers.org/syntax/pe). – Charles Duffy Jan 12 '17 at 16:40
  • ...to provide a more complete explanation of something touched on in my answer, re: `$((id + 1))` -- assuming you meant that to be `$(( number + 1 ))` (since `$id` is a string starting with non-numeric characters): If the first character of a number is a `0`, then it's interpreted as octal; thus, `echo $(( 010 ))` emits 8, and `echo $(( 09 ))` results in a "value too great for base" error; hence the need for `$((10#$number))` in my answer to force interpretation as decimal. – Charles Duffy Jan 12 '17 at 18:13
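The quoting point from these comments can be checked with a throwaway directory whose name contains a space. This is a minimal sketch: `mktemp -d` stands in for `path/to/files`, and `count_existing` is a made-up helper that counts how many of its arguments name real files.

```shell
#!/bin/bash
# Sketch: why "$FILES"/*.txt needs quotes when the directory name contains a space.
parent=$(mktemp -d)
trap 'rm -rf -- "$parent"' EXIT            # clean up the throwaway tree on exit
cd "$parent" || exit 1
files_dir="$parent/my files"               # directory name with a space
mkdir "$files_dir"
touch "$files_dir/file_name_FOO31101.txt" "$files_dir/file_name_FOO31102.txt"

# Count how many of the received arguments name an existing file.
count_existing() {
  local f n=0
  for f in "$@"; do
    [[ -e $f ]] && n=$(( n + 1 ))
  done
  echo "$n"
}

echo "quoted:   $(count_existing "$files_dir"/*.txt)"   # quoted:   2
# deliberately unquoted: the path is split at the space before globbing
echo "unquoted: $(count_existing $files_dir/*.txt)"     # unquoted: 0
```

In the unquoted case the shell splits the path into `.../my` and `files/*.txt` before globbing, so neither word matches the real files.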

1 Answer


I actually wouldn't use a for loop at all. A while loop that shifts files off the argument list two at a time is a perfectly reasonable way to do this.

#!/bin/bash
FILES=path/to/files                   ## the directory from the question

# here, we're overriding the argument list with the list of files
# ...you can do this in a function if you want to keep the global argument list intact
set -- "$FILES"/*.txt                 ## without these quotes, paths with spaces break

# handle the case where no files matched our glob (bash then passes the pattern through literally)
[[ -e $1 || -L $1 ]] || { echo "No .txt found in $FILES" >&2; exit 1; }

# here, we're doing our own loop over those arguments
while (( "$#" > 1 )); do              ## stay in the loop only while 2 or more remain
  echo "Processing files $1 and $2"   ## ...substitute your own logic here...
  shift 2 || break                    ## break even if the loop test doesn't handle this case
done

# ...and add your own handling for the case where there's an odd number of files.
(( "$#" )) && echo "Left over file $1 still exists"

Note that the `$#`s are quoted inside `(( ))` here for StackOverflow's syntax highlighting, not because they otherwise need to be. :)
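If you'd rather not overwrite the script's own arguments, the comment in the code suggests wrapping the loop in a function. A minimal sketch of that, where `program_call` is only a stand-in that echoes its arguments and `process_pairs` is a hypothetical name:

```shell
#!/bin/bash
# Stand-in for the real downstream program: it just prints what it would run.
program_call() { echo "program_call $1 $2"; }

# Same shift-two-at-a-time loop, but "$@" here is the function's private copy,
# so the script's own positional parameters are left alone.
process_pairs() {
  while (( $# > 1 )); do
    program_call "$1" "$2"
    shift 2
  done
  (( $# )) && echo "Left over file $1" >&2
  return 0
}

process_pairs file_name_FOO31101.txt file_name_FOO31102.txt \
              file_name_FOO31103.txt file_name_FOO31104.txt
```

Called as `process_pairs "$FILES"/*.txt`, the function gets its own copy of the file list, so `"$@"` outside it is unchanged.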


By the way -- consider using bash's native string manipulation.

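stem=${file##*/}                                # strip the directory, like basename
IFS=_ read -r p1 p2 id p_rest <<<"$stem"        # split on underscores
number=${id:$(( ${#id} - 2 ))}                  # last two characters of id
output_base="${p1}_${p2}_${id}"                 # same result as cut -d'_' -f 1,2,3
echo "$id $((10#$number + 1))"                  # 10# ensures interpretation as decimal, not octal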
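Here is the same parsing applied to names shaped like the question's listing, wrapped in a hypothetical `parse_name` helper. Those names have no `_info` part, so this sketch strips `.txt` first; that step is my adjustment, not part of the answer above. The last lines show what happens to `09` without the `10#` prefix.

```shell
#!/bin/bash
# Sketch: parameter-expansion parsing for names like file_name_FOO31109.txt.
parse_name() {
  local file=$1 stem p1 p2 id p_rest number
  stem=${file##*/}                     # drop the directory
  stem=${stem%.txt}                    # drop the extension: file_name_FOO31109
  IFS=_ read -r p1 p2 id p_rest <<<"$stem"
  number=${id: -2}                     # last two characters: 09
  echo "${p1}_${p2}_${id} $(( 10#$number + 1 ))"
}

parse_name path/to/files/file_name_FOO31109.txt   # file_name_FOO31109 10

# Without 10#, bash reads 09 as an (invalid) octal literal and reports an error.
number=09
( echo "$(( number + 1 ))" ) 2>/dev/null || echo "09 rejected without 10#"
```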
Charles Duffy
  • Note, also, this approach works even with obscenely large directories. – bishop Jan 12 '17 at 16:28
  • Yup -- that's a bit nonobvious because one can't pass an argument vector (and environment-variable set) over a certain (OS-dependent) size into a script when starting it, but you can still replace it with a larger list using `set` later. – Charles Duffy Jan 12 '17 at 16:29
  • Note that `shift 2` is equivalent to `shift; shift` - at least in bash. – choroba Jan 12 '17 at 16:36
  • @choroba, not quite equivalent -- if there aren't two arguments to be shifted, `shift 2` will do nothing, meaning one needs to be careful not to use it in a `while (( $# ))` loop, which could be endless in that case if one didn't have a `|| break` on the `shift 2`. In this case -- with a condition that breaks with `(( $# == 1 ))` -- though, yes, either is entirely usable even without the explicit `break`. – Charles Duffy Jan 12 '17 at 16:45
  • @CharlesDuffy This works great, but I'm still not entirely clear what set does - could you elaborate a bit more? – fugu Jan 13 '17 at 08:58
  • @fugu, when used as `set -- arg1 arg2 ...`, it sets `$1` to `arg1`, `$2` to `arg2`, etc; these changes are likewise reflected in syntax such as `$#` (to refer to the number of arguments) and `"$@"` (referring to the entire set). – Charles Duffy Jan 13 '17 at 17:54
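A short illustration of the positional-parameter behavior discussed in these comments: what `set --` does, and how `shift 2` behaves when fewer than two arguments remain (bash assumed; the file names are just the ones from the question).

```shell
#!/bin/bash
# set -- replaces $1, $2, ... and updates $# accordingly.
set -- file_name_FOO31101.txt file_name_FOO31102.txt file_name_FOO31103.txt
echo "$# $1"                     # 3 file_name_FOO31101.txt

shift 2                          # the old $3 becomes $1
echo "$# $1"                     # 1 file_name_FOO31103.txt

# Only one argument is left, so shift 2 fails and removes nothing.
if ! shift 2; then
  echo "shift 2 failed; \$# is still $#"    # shift 2 failed; $# is still 1
fi

# That is why a loop testing only (( $# )) needs the || break after shift 2.
set -- a b c
while (( $# )); do
  echo "pair: $1 ${2-<none>}"
  shift 2 || break
done
```

The final loop prints `pair: a b` and then `pair: c <none>`; without `|| break` it would keep printing the second line forever, because `shift 2` never removes the last argument.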