
The next script

str=/aaa/bbb/ccc.txt
echo "str: $str"
echo "${str##*/} == $(basename "$str")"
echo "${str%/*} == $(dirname "$str")"

produces:

str: /aaa/bbb/ccc.txt
ccc.txt == ccc.txt
/aaa/bbb == /aaa/bbb

The question is:

  • In bash scripts, when should you use the dirname and basename commands, when should you use variable substitutions instead, and why?

Asking mainly because:

str="/aaa/bbb/ccc.txt"
count=10000

s_cmdbase() {
let i=0
while(( i++ < $count ))
do
    a=$(basename $str)
done
}

s_varbase() {
let i=0
while(( i++ < $count ))
do
    a=${str##*/}
done
}

s_cmddir() {
let i=0
while(( i++ < $count ))
do
    a=$(dirname $str)
done
}

s_vardir() {
let i=0
while(( i++ < $count ))
do
    a=${str%/*}
done
}

time s_cmdbase
echo command basename
echo ===================================
time s_varbase
echo varsub basename
echo ===================================
time s_cmddir
echo command dirname
echo ===================================
time s_vardir
echo varsub dirname

on my system produces:

real    0m33.455s
user    0m10.194s
sys     0m18.106s
command basename
===================================

real    0m0.246s
user    0m0.237s
sys     0m0.007s
varsub basename
===================================

real    0m30.562s
user    0m10.115s
sys     0m17.764s
command dirname
===================================

real    0m0.237s
user    0m0.226s
sys     0m0.007s
varsub dirname

Calling external programs (forking) costs time. The main point of the question is:

  • Are there any pitfalls in using variable substitutions instead of the external commands?
Jonathan Leffler
kobame
  • I would say: `dirname` and `basename` are tools for very precise cases like this. Variable substitutions are for more general cases. So I would use `dirname` whenever I want the dir name, `basename` when I want the file name and variable substitutions whenever I need more general things that do not have a specific tool to get. – fedorqui Mar 14 '14 at 09:39
  • @fedorqui I would argue that `dirname` and `basename` are easier to read, especially for people who don't code shell on a daily basis (so that's a maintenance +1) but the performance difference is a fair point. I'd argue that as soon as you need them inside a loop (and not just on `$0`) you will want to consider using parameter substitution. – Adrian Frühwirth Mar 14 '14 at 10:11
  • **If performance is a consideration** (i.e. lots of time spent doing `dirname`/`basename`), use the parameter expansion. **However, if readability/robustness is more important**, then use the simpler/easier/more-readable `basename`/`dirname`. Usually readability is needed more often... so usually stick with `basename`/`dirname`. – Trevor Boyd Smith Feb 19 '19 at 20:44

3 Answers


The external commands normalize some edge cases that the plain expansions do not (trailing slashes, the root directory, a path with no slash). Check the result of the following script:

doit() {
    str=$1
    echo -e "string   $str"
    # Compare each expansion with the command it is meant to replace.
    cmd=basename
    [[ "${str##*/}" == "$($cmd "$str")" ]] && echo "$cmd same: ${str##*/}" || echo -e "$cmd different \${str##*/}\t>${str##*/}<\tvs command:\t>$($cmd "$str")<"
    cmd=dirname
    [[ "${str%/*}"  == "$($cmd "$str")" ]] && echo "$cmd  same: ${str%/*}" || echo -e "$cmd  different \${str%/*}\t>${str%/*}<\tvs command:\t>$($cmd "$str")<"
    echo
}

doit /aaa/bbb/
doit /
doit /aaa
doit aaa
doit aaa/
doit aaa/xxx

with the result

string   /aaa/bbb/
basename different ${str##*/}   ><          vs command: >bbb<
dirname  different ${str%/*}    >/aaa/bbb<  vs command: >/aaa<

string   /
basename different ${str##*/}   ><  vs command: >/<
dirname  different ${str%/*}    ><  vs command: >/<

string   /aaa
basename same: aaa
dirname  different ${str%/*}    ><  vs command: >/<

string   aaa
basename same: aaa
dirname  different ${str%/*}    >aaa<   vs command: >.<

string   aaa/
basename different ${str##*/}   ><  vs command: >aaa<
dirname  different ${str%/*}    >aaa<   vs command: >.<

string   aaa/xxx
basename same: xxx
dirname  same: aaa

One of the most interesting results is $(dirname "aaa"). The external command dirname correctly returns . (the current directory), but the variable expansion ${str%/*} finds no slash to remove and returns the string unchanged, aaa, which is not the directory part.
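
The differences listed above can be closed in pure bash, at the cost of a few extra lines. The following is only a sketch, not a drop-in replacement: sh_dirname and sh_basename are hypothetical names, the results are stored in REPLY so that the caller does not need a $(...) subshell, and only the inputs shown in this answer (plus the empty string, handled here the way GNU coreutils handles it) were considered.

```shell
#!/usr/bin/env bash
# Hypothetical pure-bash stand-ins for dirname and basename.
# Results go into REPLY rather than stdout, so callers need no $(...) fork.

sh_dirname() {
    local p=$1
    if [[ -z $p ]]; then REPLY=.; return; fi            # empty input
    if [[ $p != *[!/]* ]]; then REPLY=/; return; fi     # only slashes: root
    while [[ $p == */ ]]; do p=${p%/}; done             # drop trailing slashes
    if [[ $p != */* ]]; then REPLY=.; return; fi        # no directory part
    p=${p%/*}                                           # drop last component
    while [[ $p == */ ]]; do p=${p%/}; done             # "a//b" leaves "a/"
    REPLY=${p:-/}                                       # "/aaa" leaves "": root
}

sh_basename() {
    local p=$1
    if [[ -z $p ]]; then REPLY=; return; fi             # empty input
    if [[ $p != *[!/]* ]]; then REPLY=/; return; fi     # only slashes: root
    while [[ $p == */ ]]; do p=${p%/}; done             # drop trailing slashes
    REPLY=${p##*/}                                      # keep last component
}

# Same sample paths as in the script above.
for path in /aaa/bbb/ / /aaa aaa aaa/ aaa/xxx aaa//; do
    sh_dirname "$path";  dir=$REPLY
    sh_basename "$path"; base=$REPLY
    printf '%-12s dirname=%-8s basename=%s\n' "$path" "$dir" "$base"
done
```

Calling these helpers as $(sh_dirname "$path") would bring back a fork per call, which is why the sketch returns through a variable instead.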

Alternative presentation

Script:

doit() {
    strings=( "[[$1]]"
    "[[$(basename "$1")]]"
    "[[${1##*/}]]"
    "[[$(dirname "$1")]]"
    "[[${1%/*}]]" )
    printf "%-15s %-15s %-15s %-15s %-15s\n" "${strings[@]}"
}


printf "%-15s %-15s %-15s %-15s %-15s\n" \
    'file' 'basename $file' '${file##*/}' 'dirname $file' '${file%/*}'

doit /aaa/bbb/
doit /
doit /aaa
doit aaa
doit aaa/
doit aaa/xxx
doit aaa//

Output:

file            basename $file  ${file##*/}     dirname $file   ${file%/*}     
[[/aaa/bbb/]]   [[bbb]]         [[]]            [[/aaa]]        [[/aaa/bbb]]   
[[/]]           [[/]]           [[]]            [[/]]           [[]]           
[[/aaa]]        [[aaa]]         [[aaa]]         [[/]]           [[]]           
[[aaa]]         [[aaa]]         [[aaa]]         [[.]]           [[aaa]]        
[[aaa/]]        [[aaa]]         [[]]            [[.]]           [[aaa]]        
[[aaa/xxx]]     [[xxx]]         [[xxx]]         [[aaa]]         [[aaa]]        
[[aaa//]]       [[aaa]]         [[]]            [[.]]           [[aaa/]]       
Jonathan Leffler
clt60
  • Also note that `$(dirname /)` yields `/` but `var=/; echo "${var%/*}"` yields an empty line. Likewise: `$(dirname abc/)` yields `.` but `var=abc/; echo "${var%/*}"` yields `abc`, and `$(dirname abc//)` yields `.` but `var=abc//; echo "${var%/*}"` yields `abc/`. – Jonathan Leffler Oct 13 '14 at 15:00
  • @JonathanLeffler :) your first two examples (`/` and `aaa/`) are already covered; the double slash `abc//` (last example) is a nice addition. By the way, thanks for the edit. – clt60 Oct 13 '14 at 15:14
  • I've added a tabular presentation (code and data). If you don't like it, please remove it (roll back the edit). Tweak to suit yourself. I find it more legible than your output (give or take the choice of `[[]]` to surround strings), but it doesn't have the markers to indicate that the result is the same or different. – Jonathan Leffler Oct 13 '14 at 15:29
  • I'm noticing that parameter expansion seems perfectly fine if the filepath points to a file rather than directory. So I'm using `[ -f "${file}" ] && path="${file##*/}" || path="${file}"`. Even including the extra condition, it's ~350x faster! For my use-case (inside `.lessfilter`), I only care about files (and urls, covered by the `||` condition) so I'll very happily take the faster option – Shaun Mitchell Nov 18 '22 at 15:09
  1. dirname outputs . if its argument doesn't contain a slash /, whereas ${var%/*} returns such an argument unchanged, so emulating dirname with parameter substitution does not yield the same results for every input.

  2. basename takes a suffix as an optional second argument and removes it from the filename as well. You can emulate this with parameter substitution too, but since one expansion cannot strip both the directory and the suffix, it is not as brief as using basename.

  3. Using either dirname or basename requires forking and executing an external process (usually inside a command substitution), since they are not shell builtins, so parameter substitution will be faster, especially when called in a loop (as you have shown).

  4. I have seen basename in different locations on different systems (/usr/bin, /bin), so if you have to use absolute paths in your script for some reason, it might break because it cannot find the executable.
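
Point 2 can be sketched as follows. The path is the sample one from the question; the variable names with_cmd, name and with_var are only illustrative.

```shell
#!/usr/bin/env bash
file=/aaa/bbb/ccc.txt   # sample path from the question

# One external call: strip the directory and the ".txt" suffix together.
with_cmd=$(basename "$file" .txt)

# Two expansions, no fork: strip the directory first, then the suffix.
name=${file##*/}
with_var=${name%.txt}

echo "$with_cmd $with_var"   # prints: ccc ccc
```

One more difference worth keeping in mind: basename leaves the name alone when the suffix is the whole name (basename .txt .txt prints .txt), while ${name%.txt} would leave an empty string in that case.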

So, yes, there are some things to consider and depending on situation and input I use both methods.

EDIT: Both dirname and basename are actually available as bash loadable builtins under examples/loadables in the source tree and can be enabled (once compiled) using

enable -f /path/to/dirname dirname
enable -f /path/to/basename basename
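
Since the location of the compiled loadables differs between systems, a script can try to load them and quietly fall back to the external commands. This is a sketch under assumptions: it relies on bash 4.4 or later searching BASH_LOADABLES_PATH when the file name given to enable -f contains no slash, and it assumes the loadables are installed under the names dirname and basename.

```shell
#!/usr/bin/env bash
# Assumption: the loadables were built and sit in a directory listed in
# BASH_LOADABLES_PATH; if not, enable -f fails and nothing changes.
for cmd in dirname basename; do
    enable -f "$cmd" "$cmd" 2>/dev/null || true
done

# "builtin" if the loadable was found, "file" if the external command is used.
kind=$(type -t basename)
echo "basename resolves to: $kind"

# Either way the call site stays the same.
base=$(basename /aaa/bbb/ccc.txt)
echo "$base"
```

The rest of the script does not need to know which variant is active, so the loadables act as an optional speed-up rather than a dependency.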
Adrian Frühwirth

The main pitfall in using variable substitutions is that they can be difficult to read and support.

That is, of course, subjective! Personally I use variable substitutions all over the place. I use read, IFS, and set instead of awk. I use bash regular expressions, and bash extended globbing instead of sed. But that is because:

a) I want performance

b) I am the only person who will ever see these scripts

It is sad to say that many people who have to maintain shell scripts know frighteningly little about the language. You have to strike a balance: which is more important, performance or maintainability? On most occasions you will find that maintainability wins.

You have to admit that basename $0 is fairly obvious, whereas ${0##*/} is fairly obscure.
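
If you do go for the expansion, some of the obscurity can be bought back with a descriptive variable name and a one-line comment. A small sketch; script_name and usage are hypothetical names chosen for illustration:

```shell
#!/usr/bin/env bash
# Name of this script without its directory; same result as: basename "$0"
# (assuming $0 does not end in a slash, which holds for a script path).
script_name=${0##*/}

usage() {
    printf 'Usage: %s FILE...\n' "$script_name"
}

usage
```

A maintainer who has never seen ${0##*/} can still tell from the comment what the line is for.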

cdarke
  • +1 You are expanding on the point I made in my comment above so I agree, although I still prefer parameter substitution myself for the obvious performance impact. Readability (and the edge case of `dirname foo-no-slash`, if you need the behaviour) is the only argument *for* using the externals in my opinion so I'd still vote against using them. – Adrian Frühwirth Mar 14 '14 at 10:18