
I have hundreds of files, named as follows:

RG1-t.txt

RG1-n.txt

RG2-t.txt

RG2-n.txt

etc...

I would like to use GNU parallel to run a script on them, but I am struggling to get the base names of the files (i.e. RG1, RG2, etc.) so that I can run:

ls RG*.txt | parallel "command.sh {basename}-t.txt {basename}-n.txt > {basename}.out"

resulting in the files RG1.out, RG2.out etc. Any ideas?
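To make the goal concrete, here is a minimal plain-bash sketch (no parallel yet) of what the {basename} placeholder should expand to. It assumes the -t.txt/-n.txt naming shown above, and command.sh is replaced by a hypothetical stub function so the example runs on its own:

```shell
#!/usr/bin/env bash
# Work in a scratch directory with a few dummy input pairs.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch RG1-t.txt RG1-n.txt RG2-t.txt RG2-n.txt

# Hypothetical stand-in for command.sh: it only records its two arguments.
command_sh() { printf '%s %s\n' "$1" "$2"; }

# Loop over the -t files only, so each pair is processed exactly once.
for t in RG*-t.txt; do
  base=${t%-t.txt}    # strip the "-t.txt" suffix: RG1-t.txt -> RG1
  command_sh "$base-t.txt" "$base-n.txt" > "$base.out"
done
```

Iterating over only one member of each pair matters: globbing RG*.txt matches both the -t and the -n file, so every pair would otherwise be processed twice.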

ATpoint

3 Answers


Use the built-in stripping options:

  1. Dirname ({//}), basename ({/}) and removal of a custom suffix ({%suffix}, which needs --plus)

    $ echo dir/file_1.txt.gz | parallel --plus echo {//} {/} {%_1.txt.gz}

  2. Remove the last ({.}) or every ({:}) extension, optionally from the basename as well ({/.}, {/:})

    $ echo dir.d/file.txt.gz | parallel 'echo {.} {:} {/.} {/:}'
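For intuition only, most of the replacement strings above behave much like these bash parameter expansions on a path that has a directory part. This is an approximation, not how parallel implements them: for example, ${f%.*} on dir.d/file would strip ".d/file", whereas {.} only removes an extension found after the last slash.

```shell
#!/usr/bin/env bash
f=dir.d/file.txt.gz

dir=${f%/*}              # like {//}: dir.d
base=${f##*/}            # like {/}:  file.txt.gz
noext=${f%.*}            # like {.}:  dir.d/file.txt
base_noext=${base%.*}    # like {/.}: file.txt
base_noall=${base%%.*}   # every extension stripped from the basename: file
nosuffix=${f%.txt.gz}    # like {%.txt.gz} with --plus: dir.d/file

printf '%s\n' "$dir" "$base" "$noext" "$base_noext" "$base_noall" "$nosuffix"
```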

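This should do what you need. Feed only the -t files, and strip the whole suffix with --plus ({.} alone would turn RG1-t.txt into RG1-t, not RG1):

ls RG*-t.txt | parallel --plus "command.sh {%-t.txt}-t.txt {%-t.txt}-n.txt > {%-t.txt}.out"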
jaygooby
  • In parallel 20190422, `{/}` seems to be the equivalent of `basename` and `{//}` the equivalent of `dirname`. – Florian Castellane Jul 04 '19 at 05:14
  • Given the input string 'foo/bar.baz', these are the replaced strings: `{}` => 'foo/bar.baz', `{.}` => 'foo/bar', `{/}` => 'bar.baz', `{//}` => 'foo', `{/.}` => 'bar', as of parallel 20161222 – forresthopkinsa Apr 10 '20 at 08:49
  • In 20161222, is there any reason why removing the suffix doesn't work? I.e. `parallel --plus echo '{%.bar.gz}' ::: foo.ext.bar.gz` should give me `foo.ext`, but it gives me `{%.bar.gz} foo.ext.bar.gz` – Brian Wiley Jan 12 '21 at 21:36
  • OK, it seems this only works in versions later than 20161222. I upgraded to 20201222 ('Vaccine'). Gotta love parallel for its unique humor :0 – Brian Wiley Jan 12 '21 at 22:00

Use --rpl:

printf '%s\0' RG*-n.txt |
  parallel -0 --rpl '{basename} s/-..txt$//' "command.sh {basename}-t.txt {basename}-n.txt > {basename}.out"

Or dynamic replacement strings with --plus:

printf '%s\0' RG*-n.txt |
  parallel -0 --plus "command.sh {%-n.txt}-t.txt {} > {%-n.txt}.out"

Using printf (a bash builtin) instead of ls avoids:

bash: /bin/ls: Argument list too long
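To check what the two expressions above strip, here is a rough emulation without parallel. The NUL-delimited read loop mirrors `printf '%s\0' ... | parallel -0`; sed 's/-..txt$//' stands in for the Perl substitution given to --rpl (the regex behaves the same on these names, and the unescaped dots match any character), and ${f%-n.txt} stands in for {%-n.txt}. The file names are just sample strings:

```shell
#!/usr/bin/env bash
rpl_bases=()
plus_bases=()
# Read NUL-separated names, as parallel -0 would.
while IFS= read -r -d '' f; do
  rpl_bases+=("$(printf '%s\n' "$f" | sed 's/-..txt$//')")  # like --rpl '{basename} s/-..txt$//'
  plus_bases+=("${f%-n.txt}")                               # like --plus {%-n.txt}
done < <(printf '%s\0' RG1-n.txt RG2-n.txt)

printf '%s\n' "${rpl_bases[@]}"
```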
Ole Tange

Try feeding parallel like this:

ls RG*t.txt | cut -d'-' -f1 | parallel 'command.sh {}-t.txt {}-n.txt > {}.out'

Or, if you prefer awk:

ls RG*t.txt | awk -F'-' '{print $1}' | parallel ...

Or, if you prefer sed:

ls RG*t.txt | sed 's/-.*//' | parallel ...

Or, if you prefer GNU grep:

ls RG* | grep -Po '.*(?=-t.txt)' | parallel ...
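A quick way to confirm that the cut, awk and sed variants above produce the same list of base names, using printf with sample names in place of ls (the names are only examples; grep -P is left out because it needs GNU grep):

```shell
#!/usr/bin/env bash
names=$(printf '%s\n' RG1-t.txt RG2-t.txt RG10-t.txt)

via_cut=$(printf '%s\n' "$names" | cut -d'-' -f1)
via_awk=$(printf '%s\n' "$names" | awk -F'-' '{print $1}')
via_sed=$(printf '%s\n' "$names" | sed 's/-.*//')

printf '%s\n' "$via_cut"
```

All three keep everything before the first "-", so they assume the base name itself contains no hyphen.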
Mark Setchell
  • @forresthopkinsa You should not make such strong statements. There are multiple possible solutions. The one you criticize is perfectly fine and the most generic (which is why I accepted it back in the day), as it allows complete external control over what is piped into parallel. The other solutions below are perfectly fine as well. – ATpoint May 30 '20 at 12:31
  • @ATpoint There seems to be a general consensus in the votes that the below solution is the better one. I only commented on this one because it's the accepted answer and I want to ensure that people keep scrolling. – forresthopkinsa May 31 '20 at 19:12