
Take this piece of code, which reads data separated by |:

DATA1="Andreas|Sweden|27"
DATA2="JohnDoe||30"   # <---- UNKNOWN COUNTRY
while IFS="|" read -r NAME COUNTRY AGE; do 
    echo "NAME:    $NAME";
    echo "COUNTRY: $COUNTRY";
    echo "AGE:     $AGE";
done<<<"$DATA2"

OUTPUT:

NAME:    JohnDoe
COUNTRY:
AGE:     30

It should work identically to this piece of code, which does exactly the same thing but uses \t as the separator instead of |:

DATA1=$'Andreas\tSweden\t27'
DATA2=$'JohnDoe\t\t30'  # <---- THERE ARE TWO TABS HERE
while IFS=$'\t' read -r NAME COUNTRY AGE; do 
    echo "NAME:    $NAME";
    echo "COUNTRY: $COUNTRY";
    echo "AGE:     $AGE";
done<<<"$DATA2"

But it doesn't.

OUTPUT:

NAME:    JohnDoe
COUNTRY: 30
AGE:

Bash, or read, or IFS, or some other part of the code is lumping the consecutive tabs together when it isn't supposed to. Why is this happening, and how can I fix it?

IQAndreas

1 Answer


bash is behaving exactly as it should. From the bash documentation:

The shell treats each character of IFS as a delimiter, and splits the results of the other expansions into words on these characters. If IFS is unset, or its value is exactly <space><tab><newline>, the default, then sequences of <space>, <tab>, and <newline> at the beginning and end of the results of the previous expansions are ignored, and any sequence of IFS characters not at the beginning or end serves to delimit words. If IFS has a value other than the default, then sequences of the whitespace characters space and tab are ignored at the beginning and end of the word, as long as the whitespace character is in the value of IFS (an IFS whitespace character). Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter.
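The difference the quoted passage describes can be seen by counting the fields read produces. This is only a minimal sketch: count_fields is a hypothetical helper introduced here for illustration, not something from the question.

```bash
#!/bin/bash
# count_fields is a hypothetical helper (not from the original post).
# It splits one line with the given IFS and prints how many fields resulted.
count_fields() {
    local ifs=$1 line=$2 fields
    IFS=$ifs read -r -a fields <<<"$line"
    echo "${#fields[@]}"
}

# "|" is not IFS whitespace: each "|" delimits, so the empty field is kept.
count_fields '|' 'JohnDoe||30'        # prints 3

# Tab is IFS whitespace: the two adjacent tabs act as one delimiter.
count_fields $'\t' $'JohnDoe\t\t30'   # prints 2
```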

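Tab is an IFS whitespace character, so the two adjacent tabs in DATA2 form one sequence and act as a single delimiter. | is not IFS whitespace, so each | delimits a field on its own. To overcome this "feature", you could translate the tabs into a non-whitespace delimiter first, something like the following: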

#!/bin/bash

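DATA1=$'Andreas\tSweden\t27'
DATA2=$'JohnDoe\t\t30'  # <---- THERE ARE TWO TABS HERE

# Turn each tab into ';' (assumes the data itself contains no ';').
# ';' is not IFS whitespace, so the empty field survives.
# $'...' hands sed a literal tab, since \t is not portable across seds.
echo "$DATA2" | sed $'s/\t/;/g' |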
while IFS=';' read -r NAME COUNTRY AGE; do
    echo "NAME:    $NAME"
    echo "COUNTRY: $COUNTRY"
    echo "AGE:     $AGE"
done
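A variant of the same idea that avoids the external sed call might look like the sketch below. It swaps the tabs inside bash with parameter expansion and feeds read from a here-string, so the loop body is not run in a pipeline subshell. The parse_record name and the choice of the 0x1F (unit separator) byte are assumptions made for illustration; it only works if that byte never appears in the data.

```bash
#!/bin/bash
# parse_record is a hypothetical helper (not from the original post).
# Assumption: the input never contains the 0x1F byte used as the delimiter.
parse_record() {
    local sep=$'\x1f' line=$1 name country age
    # Replace every tab with a delimiter that is not IFS whitespace.
    line=${line//$'\t'/$sep}
    # Each separator now delimits on its own, so empty fields are kept.
    IFS=$sep read -r name country age <<<"$line"
    printf 'NAME:    %s\nCOUNTRY: %s\nAGE:     %s\n' "$name" "$country" "$age"
}

parse_record $'JohnDoe\t\t30'
```

With the sample record, COUNTRY comes out empty and AGE is 30, matching the | version in the question.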
fpmurphy
  • Exactly. The shell is there for convenience, and it's got a lot of special cases that are (only) sometimes very convenient. For more precise work, use a step up. Awk is almost always enough: `awk '{print "NAME:\t"$1"\nCOUNTRY:\t"$2"\nAGE:\t"$3}' FS=$'\t' <<<"$DATA2"` – jthill Nov 26 '20 at 04:17
  • Aside: All-caps variables are, per POSIX spec, in a namespace used for variables meaningful to the shell and other POSIX-specified tools; whereas names with at least one lower-case character are reserved for application usage. See https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html, keeping in mind that shell and environment variables share a single namespace, so conventions necessarily apply to both (as setting a shell variable updates and overwrites any like-named environment variable). – Charles Duffy Nov 26 '20 at 04:29
  • @CharlesDuffy. You might want to read Section 8.1 again. It is silent about shell variables; it only talks about environmental variables. Also the standard is silent about environmental variables and shell variables sharing a single namespace. – fpmurphy Nov 26 '20 at 04:49
  • The POSIX sh standard requires assignments to any shell variable that shares the name of an environment variable to modify that environment variable. If that doesn't mean they're in the same namespace, what _would_ you consider to constitute sharing a namespace? And once you accept that they're in the same namespace, it follows that 8.1 applies to shell variables. – Charles Duffy Nov 26 '20 at 05:13
  • Anyhow -- if you want to argue that they aren't in the same namespace, please tell me how to modify the PATH shell variable without also modifying the PATH environment variable. If modifying shell variables also modifies environment variables with the same names, how can you be certain when you create an all-caps shell variable that you aren't unknowingly also modifying an environment variable in the namespace POSIX-compliant utilities use? – Charles Duffy Nov 26 '20 at 05:22