
I would like to pass a variable in place of a file in the command below. Is this possible?

I would like to generate some FASTA files using bedtools getfasta. Typically I order the intervals in the input file first, using the awk and bedtools sort pipeline below, and redirect the result to an output file. That file is then passed to the -bed flag of bedtools getfasta to create the FASTA files I need, e.g.

awk '{if ($2>$3)print $1,$3,$2,".",".","-";else print $1,$2,$3,".",".","+";}' OFS='\t' Infile.bed | 
awk '{a=$2-1;print $1,a,$3,$4,$5,$6;}' OFS='\t' | 
bedtools sort > OrderedFile.bed


bedtools getfasta -s -fi Infile.fasta -bed OrderedFile.bed -fo Outfile.fasta

However, I have a lot of files I would like to run bedtools getfasta on. I was hoping to avoid creating the additional OrderedFile.bed files by storing the output of the awk and bedtools sort pipeline in a variable (see below):

swapped=$(awk '{if ($2>$3)print $1,$3,$2,".",".","-";else print $1,$2,$3,".",".","+";}' OFS='\t' Infile.bed | awk '{a=$2-1;print $1,a,$3,$4,$5,$6;}' OFS='\t' | bedtools sort) 

This works quite nicely:

echo "${swapped}"
HEADING_1   4   12  .   .   +
HEADING_2   4   12  .   .   -

When I use the variable in the bedtools getfasta command, no output is generated. Is there a way for a variable to be read like a file? I have tried the following, but it is still not working:

  1. bedtools getfasta -s -fi Infile.fasta -bed "${swapped}" -fo Outfile.fasta
  2. bedtools getfasta -s -fi Infile.fasta -bed <(echo "${swapped}") -fo Outfile.fasta
  3. bedtools getfasta -s -fi Infile.fasta -bed <(<<< "${swapped}") -fo Outfile.fasta

Basically, can I use a variable in place of a file as an argument for a command?

I hope that this makes sense

Thanks,

Jamie

Charles Duffy
Jpike
  • How about just <<< "$swapped" ? – Raman Sailopal Dec 17 '20 at 16:02
  • What happens when you try #2 and #3? – Barmar Dec 17 '20 at 16:06
  • The second one #2 should have worked fine. – KamilCuk Dec 17 '20 at 16:09
  • The second one should have worked fine **if** the program could read from a FIFO. Not all software can; sometimes things need a seekable FD (for example, if they implement a 2-pass algorithm and need to read the input file twice). – Charles Duffy Dec 17 '20 at 16:09
  • Anyhow, it would be helpful if you provided the exact error message from the `bedtools getfasta -s -fi Infile.fasta -bed <(echo "${swapped}") -fo Outfile.fasta` attempt, since that's the one that _should_ have worked. (well, `printf '%s\n' "$swapped"` is a [bit more consistent in behavior than `echo`](https://unix.stackexchange.com/a/65819/3113), but it'd be very surprising if that mattered here). – Charles Duffy Dec 17 '20 at 16:12
  • BTW, another reason #2 could fail besides the seekability requirement is that in modern versions of Python, `subprocess` doesn't pass file descriptors other than stdin, stdout and stderr through to children -- so if some of the processing is done by a separate command that `bedtools` itself starts, that command wouldn't necessarily be able to read the `/dev/fd` link. This would also explain why the `-bed stdin` used in the test suite _does_ work, as stdin is one of the three FDs that _are_ passed through by default. – Charles Duffy Dec 17 '20 at 16:21
  • There was no error message, just an empty file. I assumed that the variable wasn't working, as it worked when I used OutputFile.bed. However, I have tried #2 and #3 again and #2 worked!! There must have been a small error when I ran it before that I hadn't noticed. I will go back through and see if I can work out what exactly, but you have answered my question. A silly mistake - I should have checked more carefully. Thank you. Also, would you recommend I use `printf '%s\n' "$swapped"` instead? Thank you – Jpike Dec 17 '20 at 16:22
  • Yes, I definitely recommend `printf` instead of `echo`. The link above goes into extended details, but even [the POSIX standard for `echo`](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html) makes that recommendation (see the APPLICATION USAGE and RATIONALE sections). – Charles Duffy Dec 17 '20 at 16:23
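
To illustrate the `printf` versus `echo` point from the comments, here is a minimal sketch. The value `-n` is a made-up example (BED data is unlikely to look like this); it just shows the kind of input that bash's builtin `echo` treats as an option rather than as text to print.

```shell
#!/usr/bin/env bash
# Hypothetical value that happens to look like an echo option.
value='-n'

# bash's builtin echo interprets -n as "no trailing newline" and prints nothing else.
out_echo=$(echo "$value")

# printf treats the value purely as data for the %s placeholder.
out_printf=$(printf '%s\n' "$value")

printf 'echo gave:   [%s]\n' "$out_echo"
printf 'printf gave: [%s]\n' "$out_printf"
```

With `printf '%s\n'`, the contents of the variable never influence how the output is formatted, which is why it is the safer habit.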

1 Answer


If you look in the test suite for the bedtools getfasta command, you'll see that it passes the word stdin as a filename when it wants the BED input to be read from stdin. For example:

LINES=$(echo $'chr1\t1\t10' | $BT getfasta -fi t.fa -bed stdin -fo - | awk 'END{ print NR }')

So, we just need to do that same thing in your script:

bedtools getfasta -s -fi Infile.fasta -bed stdin -fo Outfile.fasta <<<"$swapped"
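
Since the goal was to process many files without intermediate `OrderedFile.bed` files, the variable can probably be skipped as well by piping straight into `-bed stdin`. The following is only a sketch: `order_bed` and `make_fastas` are hypothetical helper names, `order_bed` folds the question's two awk steps into one, and the naming scheme (each `X.bed` sits next to an `X.fasta` and produces `X.out.fasta`) is an assumption that will need adjusting.

```shell
#!/usr/bin/env bash
# order_bed (hypothetical helper): same logic as the two awk commands in the
# question. Swap start/end when start > end, record the strand, then
# subtract 1 from the start coordinate.
order_bed() {
  awk 'BEGIN { OFS = "\t" }
       {
         if ($2 > $3) { start = $3; end = $2; strand = "-" }
         else         { start = $2; end = $3; strand = "+" }
         print $1, start - 1, end, ".", ".", strand
       }' "$1"
}

# make_fastas (hypothetical helper): run the whole pipeline for each BED file
# given as an argument. Assumes X.bed is paired with X.fasta (an assumption,
# not something stated in the question).
make_fastas() {
  for bed in "$@"; do
    base=${bed%.bed}
    order_bed "$bed" |
      bedtools sort |
      bedtools getfasta -s -fi "$base.fasta" -bed stdin -fo "$base.out.fasta"
  done
}
```

It would be called as `make_fastas *.bed`. Nothing is written to disk between the steps, and because the data arrives on stdin it avoids the `/dev/fd` caveats of process substitution.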

By the way -- in most cases, your second attempt would have worked:

bedtools getfasta -s -fi Infile.fasta -bed <(echo "${swapped}") -fo Outfile.fasta

...because the <(...) expression is replaced by a filename from which the command's output can be read. There are some caveats, though. The filename is typically a /dev/fd link, so any program that closes file descriptors other than stdin, stdout and stderr won't be able to read content passed that way. And because that filename refers to one end of a FIFO, anything that needs to seek around in its input, read it more than once, or check its size before reading won't work.
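
To make the FIFO caveat concrete, here is a small sketch that checks what kind of file a command actually receives. `describe_input` is a hypothetical helper written for this illustration, and the sample BED line is made up. If a tool turns out to need a seekable input, a temporary file (second half of the sketch) is the usual fallback.

```shell
#!/usr/bin/env bash
# describe_input (hypothetical helper): report what kind of file a path refers to.
describe_input() {
  if [ -p "$1" ]; then
    echo "fifo"
  elif [ -f "$1" ]; then
    echo "regular file"
  else
    echo "other"
  fi
}

# Made-up sample data standing in for the question's $swapped variable.
swapped=$(printf 'HEADING_1\t4\t12\t.\t.\t+')

# Process substitution: the command is handed a path that refers to a pipe.
kind_procsub=$(describe_input <(printf '%s\n' "$swapped"))

# Temporary file: seekable and re-readable, at the cost of having to clean up.
tmp=$(mktemp)
printf '%s\n' "$swapped" > "$tmp"
kind_tmp=$(describe_input "$tmp")
rm -f "$tmp"

echo "process substitution: $kind_procsub"
echo "temporary file:       $kind_tmp"
```

The same check can be pointed at any path a script is about to hand to another program, which helps when a tool silently produces empty output.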

Charles Duffy