I have files that are named like this:

MG-AB-110_S101_R2_001.fastq.gz, MG-AB-109_S100_R1_001.fastq.gz...

I am trying to extract everything before the first underscore so that I get: MG-AB-110, MG-AB-109...

I tried to do this:

name="MG-AB-110_S101_R2_001.fastq.gz"
base_name=${name%%.*}
echo $base_name
MG-AB-110_S101_R2_001

and this:

base_name=${name%%(.*?)_.* }
echo $base_name
MG-AB-110_S101_R2_001.fastq.gz

I need these base names to match the base names of files in another folder, so the expansion above would be part of this loop:

#!/bin/bash

for name in test1/*.gz; do
    base_name=${name%%.*}

    if [ -f "test2/$base_name" ]; then
        cat "$name" "test2/$base_name" >"all_combined/$base_name"
    else
        printf 'No file in test2 corresponds to "%s"\n' "$name" >&2
    fi
done
newbash
  • With a regex: `[[ $name =~ ([^_]*) ]] && echo "${BASH_REMATCH[1]}"` – Cyrus Jun 15 '21 at 20:12
  • See: [bash, extract string before a colon](https://stackoverflow.com/q/20348097/3776858) – Cyrus Jun 15 '21 at 20:16
  • I removed that UPDATE and I posted a new question here: https://stackoverflow.com/questions/67994464/how-to-match-files-in-different-folders-by-partial-file-name-and-concatenate-the – newbash Jun 16 '21 at 12:24

1 Answer

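With bash parameter expansion: `${name%%_*}` removes the longest suffix matching `_*`, that is, everything from the first underscore to the end of the string.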

name="MG-AB-110_S101_R2_001.fastq.gz"
echo "${name%%_*}"

Output:

MG-AB-110
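
The loop in the question iterates over `test1/*.gz`, so `$name` also carries the `test1/` directory prefix. Below is a minimal sketch of how both parts could be stripped, assuming the directory should be dropped before the underscore match. `sample_prefix` is a hypothetical helper name, not something from the question, and the two paths are the example file names with `test1/` prepended.

```bash
#!/bin/bash

# Hypothetical helper: print the part of a file name before the first underscore.
sample_prefix() {
    local file=${1##*/}          # drop any leading directory, e.g. "test1/"
    printf '%s\n' "${file%%_*}"  # drop the first underscore and everything after it
}

# Example paths, assumed to look like what the glob test1/*.gz would yield.
for name in "test1/MG-AB-110_S101_R2_001.fastq.gz" "test1/MG-AB-109_S100_R1_001.fastq.gz"; do
    sample_prefix "$name"
done
```

Inside the original loop, the result could be captured with `base_name=$(sample_prefix "$name")`. How the matching file in `test2` is then located depends on how those files are named, which the question does not show.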
Cyrus