initiate () {
read -p "Location(s) to look for .bsp files in? " loc
find $loc -name "*.bsp" | while read
do
    if [ -f "$loc.bz2" ]
    then
        continue
    else
        filcount=$[$filcount+1]
        bzip $loc
    fi
    if [ "$scan" == "1" ]; then bzipint $loc
    fi
    echo $filcount    #Correct counting
    echo $zipcount    #Correct counting
    echo $scacount    #Correct counting
    echo $valid       #Equal to 1
done

echo $filcount    #Reset to 0
echo $zipcount    #Reset to 0
echo $scacount    #Reset to 0
echo $valid       #Still equal to 1
}

I'm writing a bash shell script to use bzip2 to zip up all .bsp files inside a directory. In this script I have several variables for counting totals (files, successful zips, successful integrity scans), however I seem to have run into a problem.

When find $loc -name "*.bsp" runs out of files to feed the while read loop and the loop exits, $filcount, $zipcount and $scacount are zeroed out. All of these are incremented inside initiate (), or inside bzip () and bzipint (), both of which are called from initiate ().

To test whether this is caused by variables being changed inside initiate () or by the functions it calls, I used echo $valid. Like $filcount, $zipcount, etc., $valid is defined outside of initiate (), but unlike them it is never changed inside initiate () or any function called from it.

Interestingly enough, $valid does not get reset to 0 like the other variables inside initiate.
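
To make this easier to see, here's a stripped-down sketch (not my actual script, just toy data) that reproduces the same behaviour:

count=0
printf '%s\n' a b c | while read -r line
do
    count=$((count+1))
    echo "inside loop: $count"   # prints 1, 2, 3 as expected
done
echo "after loop: $count"        # prints 0 - the increments are gone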

Can anyone tell me why my variables magically get reset when while read exits?

codeforester

3 Answers


If you use bash, you can avoid the pipe by feeding the loop with process substitution instead, so the while loop runs in the current shell:

while read -r
do
    if [ -f "$REPLY.bz2" ]
    then
        continue
    else
        filcount=$((filcount+1))
        bzip "$REPLY"
    fi
    if [ "$scan" == "1" ]; then bzipint "$REPLY"
    fi
    echo $filcount    #Correct counting
    echo $zipcount    #Correct counting
    echo $scacount    #Correct counting
    echo $valid       #Equal to 1
done < <(find "$loc" -name "*.bsp")
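
A quick demonstration of why this helps (toy data, not the original script): with process substitution feeding the loop instead of a pipe, the loop body runs in the current shell and the counter survives:

count=0
while read -r line; do
    count=$((count+1))
done < <(printf '%s\n' a b c)   # no pipe, so the loop is not in a subshell
echo "$count"                   # prints 3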
ghostdog74
    +1 much cleaner than passing variables as output; this does essentially the same thing as the pipe version, but with the while loop in the main process. – Gordon Davisson Sep 06 '11 at 06:57

I ran into this problem yesterday.

The trouble is that you're doing find $loc -name "*.bsp" | while read. Because this involves a pipe, the while read loop can't actually be running in the same bash process as the rest of your script; bash has to spawn off a subprocess so that it can connect the stdout of find to the stdin of the while loop.

This is all very clever, but it means that any variables set in the loop can't be seen after the loop, which totally defeated the whole purpose of the while loop I was writing.
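
You can actually watch the extra process being created; here's a tiny sketch using bash's $BASHPID (bash 4+):

echo "script shell: $BASHPID"
printf 'x\n' | while read -r _
do
    echo "loop shell:   $BASHPID"   # a different PID - the loop runs in a subshell
done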

You can either try to feed input to the loop without using a pipe, or get output from the loop without using variables. I ended up with a horrifying abomination involving both writing to a temporary file AND wrapping the whole loop in $(...), like so:

var="$(producer | while read line; do
    ...
    echo "${something}"
done)"

That got var set to all the things that had been echoed from the loop. I probably messed up the syntax of that example; I don't have the code I wrote handy at the moment.
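
For what it's worth, here is a runnable sketch of that pattern with made-up data, so you can see the variable really does end up populated:

collected="$(printf '%s\n' one two three | while read -r line; do
    echo "got ${line}"    # only what gets echoed ends up in $collected
done)"

echo "$collected"         # three lines: "got one", "got two", "got three"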

Ben
  • Excellent answer, thanks for the clarification. Yeah, this is definitely going to cause some complications... I might have to end up using a temporary file or two to get around this, like you did. :P –  Sep 05 '11 at 23:32
  • Hello, people reading this in the future, ten years or more after this question was posted! Please use [the process substitution version](https://stackoverflow.com/a/7314435/785213) instead. Once you've grappled with the basic notion that "pipes are subshells" and understand what [process substitutions](https://www.gnu.org/software/bash/manual/html_node/Process-Substitution.html) do, it is _much_ clearer when written that way. Even the ancient version of Bash that comes with macOS (3.2.something) supports this. – TheDudeAbides Oct 21 '22 at 16:27

To summarize options for using read at the end of [the conceptual equivalent of] a pipeline in POSIX-like shells:

To recap: in bash by default (and in strictly POSIX-compliant shells always), all commands in a pipeline run in subshells, so variables they create or modify won't be visible to the current shell (they no longer exist once the pipeline ends).

The following covers bash, ksh, zsh, and sh ([mostly] POSIX-features-only shells such as dash) and shows ways of avoiding the creation of a subshell so as to preserve the variables created / modified by read.

If no minimum version number is given, assume that even "pretty old" versions support it (the features in question have been around for a long time, but I don't know specifically when they were introduced).

Note that as a [POSIX-compliant] alternative to the solutions below you can always capture a command's output in a [temporary] file, and then feed it to read as < file, which also avoids subshells.
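
For example, a minimal sketch of that temporary-file approach (mktemp is not strictly POSIX, but is widely available):

#!/bin/sh

tmp=$(mktemp) || exit 1
printf '%s\n' one two three > "$tmp"

out=
while read -r var; do
  out="$out$var/"
done < "$tmp"    # plain input redirection: no pipeline, hence no subshell

rm -f -- "$tmp"
echo "$out" # -> 'one/two/three/'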


ksh and zsh: NO workaround/configuration change needed at all:

The read builtin by default runs in the current shell when used as the last command in a pipeline.

Seemingly, ksh and zsh by default run any command in the last stage of a pipeline in the current shell.
Observed in ksh 93u+ and zsh 5.0.5.
If you know specifically in what version this feature was introduced, let me know.

#!/usr/bin/env ksh
#!/usr/bin/env zsh

out= # initialize output variable

# Pipe multiple lines to the `while` loop and collect the values in the output variable.
printf '%s\n' one two three | 
 while read -r var; do
   out+="$var/"
 done

echo "$out" # -> 'one/two/three/'

bash 4.2+: use the lastpipe shell option

In bash version 4.2 or higher, turning on the lastpipe shell option causes the last pipeline segment to run in the current shell, allowing read to create variables that remain visible afterwards. (lastpipe only takes effect when job control is off, which is the default in scripts.)

#!/usr/bin/env bash

shopt -s lastpipe # bash 4.2+: make the last pipeline command run in *current* shell

out=
printf '%s\n' one two three | 
 while read -r var; do
   out+="$var/"
 done

echo "$out" # -> 'one/two/three/'

bash, ksh, zsh: use process substitution

Loosely speaking, a process substitution is a way to have a command's output act like a temporary file.

out=
while read -r var; do
  out+="$var/"
done < <(printf '%s\n' one two three) # <(...) is the process substitution

echo "$out" # -> 'one/two/three'

bash, ksh, zsh: use a here-string with a command substitution

out=
while read -r var; do
  out+="$var/"
done <<< "$(printf '%s\n' one two three)" # <<< is the here-string operator

echo "$out" # -> 'one/two/three'

Note the need to double-quote the command substitution to protect its output from shell expansions.


POSIX-compliant solution (sh): use a here-document with a command substitution

#!/bin/sh

out=
while read -r var; do
  out="$out$var/"
done <<EOF # <<EOF ... EOF is the here-doc
$(printf '%s\n' one two three)
EOF

echo "$out" # -> 'one/two/three'

Note that, by default, you need to place the ending delimiter - EOF, in this case - at the very beginning of a line, and that nothing may follow it on that line.
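
Relatedly, if you do want to indent the here-document, the <<- form strips leading tab characters (tabs only, not spaces) from the body and from the closing delimiter; a minimal sketch (the indented lines must start with literal tabs):

#!/bin/sh

out=
while read -r var; do
  out="$out$var/"
done <<-EOF
	$(printf '%s\n' one two three)
	EOF

echo "$out" # -> 'one/two/three/'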

mklement0
  • So, only shell builtins can affect the shell environment, and only shell builtins running in the current shell at that -- and shells have historically only run builtins directly from the first pipe stage. So some hypothetical marvelous shell could decide that if there's a single stage running a builtin, that's the stage that it should run directly, but for now you either run the builtin in the first stage or use bash's `lastpipe` option to have it use the last instead. Cool. Thanks! I kinda knew that before, but only kinda. – jthill Mar 07 '15 at 17:48
  • @jthill: Turns out that `ksh` and `zsh` already have that magic built in for the _last_ pipeline segment - see my update. The _first_ stage of a pipeline is _always_ run in a _subshell_, even in `ksh` and `zsh`. – mklement0 Mar 07 '15 at 18:02
  • @jthill: Note that wanting the _last_ stage to be in the current shell is the typical use case: you want to capture the _results_ of a pipeline in variables visible to the current shell. Can you tell me what you mean by "if there's a single stage running a builtin"? – mklement0 Mar 07 '15 at 18:46
  • To tap into the middle of the pipeline, say `sort data | while read; do whatever here;: $((subtotal+=someresult)); echo $((someresult)); done | work on the munged stuff` – jthill Mar 07 '15 at 20:06
  • @jthill: Got it - that would be handy, though I suspect that the _last_ stage is usually the one where you re-enter the "shell world", conceptually speaking. – mklement0 Mar 07 '15 at 22:08