2

I would like to build a little helper function that can deal with fastq.gz and fastq.bz2 files.

I want to merge zcat and bzcat into one transparent function which can be used on both sorts of files:

zbzcat example.fastq.gz
zbzcat example.fastq.bz2


zbzcat() {
  file=`echo $1 | `
## Not working
  ext=${file##*/};
  
  if [ ext == "fastq.gz" ]; then
    exec gzip -cd "$@"  
  else
    exec bzip -cd "$@"  
  fi
}

The extension extraction is not working correctly. Are you aware of other solutions?

Biffen
Peter Pisher

3 Answers

5

There are quite a few problems:

  • file=`echo $1 | ` gives a syntax error because there is no command after |. But you don't need the command substitution anyway. Just use file=$1.
  • ext=${file##*/} is not extracting the extension, but the filename. To extract the extension use ext=${file##*.}.
  • In your check you didn't use the variable $ext but the literal string ext.
  • Usually, only the string after the last dot in a filename is considered to be the extension. If you have file.fastq.gz, then the extension is gz, so the check should be $ext = gz. That the uncompressed files are fastq files is irrelevant to the function anyway.
  • exec replaces the shell process with the given command. So after executing your function, the shell would exit. Just execute the command.
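
As a minimal sketch of the points above, the extension check could look like this once the fixes are applied. The helper name pick_decompressor is hypothetical and only prints which tool would be used, so the expansion logic can be tried without any compressed files at hand:

```shell
# pick_decompressor is a hypothetical helper name used only for this sketch.
# It prints the name of the tool matching the file's extension.
pick_decompressor() {
  file=$1
  # ${file##*.} removes the longest prefix ending in ".", leaving the extension.
  # (${file##*/} would instead remove the directory part and leave the filename.)
  ext=${file##*.}
  if [ "$ext" = "gz" ]; then       # compare the variable $ext, not the literal word ext
    echo gzip
  elif [ "$ext" = "bz2" ]; then
    echo bzip2
  else
    echo "Unknown file format: $file" >&2
    return 1
  fi
}
```

Calling pick_decompressor example.fastq.gz prints gzip, and a name such as example.txt yields an error message on stderr and a non-zero exit status.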

By the way: you don't have to extract the extension at all when using pattern matching:

zbzcat() {
  file="$1"
  case "$file" in
    *.gz) gzip -cd "$@";;
    *.bz2) bzip2 -cd "$@";;
    *) echo "Unknown file format" >&2;;
  esac
}

Alternatively, use 7z x which supports a lot of formats. Most distributions name the package p7zip.

Socowi
  • I first tried it with $1 alone, but this does not work well with the suffix replacement. I updated my code. Thanks for your suggestions. – Peter Pisher Aug 20 '21 at 10:40
  • 1
    This would work equally well as a one-liner wrapper utility `zbzcat(){ { zcat "$1"||bzcat "$1";}2>/dev/null;}` – Léa Gris Aug 20 '21 at 11:44
1
ext=${1##*.}

Why are you throwing in an echo and trying to strip a /?

Also, the string ext (3 characters) will never be equal to the string fastq.gz (8 characters). If you want to check that the extension equals gz, just do a

if [[ $ext == gz ]]

Having said this, relying on the extension to guess the content of a file is a bit brave. A more reliable way would be to use the file command to determine the most likely file type. Probably the safest approach is to just try a bzip2 extraction first and, if it fails, fall back to the gzip extraction.
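
A rough sketch of that try-and-fall-back idea follows. It is a variant rather than a literal "extract and retry": it uses the -t (test) option of bzip2 and gzip to probe the file before decompressing, so no partial output is written by the wrong tool. This assumes a regular file (it is read twice, so it does not suit pipes) and that both tools are installed:

```shell
# Sketch only: probe the content instead of trusting the extension.
zbzcat() {
  # bzip2 -t and gzip -t check integrity and exit non-zero on a format mismatch;
  # their error messages are discarded.
  if bzip2 -t "$1" 2>/dev/null; then
    bzip2 -cd "$1"
  elif gzip -t "$1" 2>/dev/null; then
    gzip -cd "$1"
  else
    echo "Unknown file format: $1" >&2
    return 1
  fi
}
```

The price of this approach is that every file is decompressed twice, once for the test and once for the output, which may matter for large fastq files.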

user1934428
  • Thanks for your response. The echo function I used because ${$1} did not work, but the double dollar is apparently not necessary. How would you test if a bzip extraction fails ? – Peter Pisher Aug 20 '21 at 10:42
  • Redirecting stderr to the bitbucket and checking the exit code. The bzip2 man-page says: _Return values: 0 for a normal exit, 1 for environmental problems (file not found, invalid flags, I/O errors, &c), 2 to indicate a corrupt compressed file, 3 for an internal consistency error_ – user1934428 Aug 20 '21 at 10:54
  • @PeterPisher : `${${something}}` does not make sense in bash, AFAIK. In zsh, there are certain circumstances where this kind of nesting works, but then you usually have flags involved, for instance `${(Q)${something}}`. Not sure what you wanted to achieve with this. – user1934428 Aug 20 '21 at 11:00
0

I think it would be better to use the MIME type.

File extensions are not always correct.

decomp() {
  case $(file -b --mime-type "$1") in
    "application/gzip"|"application/x-gzip")  # older versions of file report x-gzip
         gzip -cd "$@"
         ;;
    "application/x-bzip2")
         bzcat  "$@"
         ;;
    "application/x-xz")
        xzcat "$@"
        ;;
    *) 
      echo "Unknown file format" >&2
    ;;
  esac
}
Gyula Kokas