As you noticed, samtools
has the option --reverse-complement
(or -i
) to output the sequence from the reverse strand.
As far as I know, samtools
does not support a region notation which permits specifying the strand.
A quick solution would be to separate your region file into forward and reverse locations and run samtools
twice.
The steps below are rather verbose, just so the steps are clear. It's fairly straight-forward to clean this up with process substitution in bash, for example.
# Separate the strand regions.
# Use grep and sed twice, or awk (below).
grep -F '(Forward)' regions.txt | sed 's/ (Forward)//' > forward-regions.txt
grep -F '(Reverse)' regions.txt | sed 's/ (Reverse)//' > reverse-regions.txt
# Above as an awk one-liner.
awk '{ strand=($2 == "(Forward)") ? "forward" : "reverse"; print $1 > strand"-regions.txt" }' regions.txt
# Run samtools, marking the strand as +/- in the FASTA output.
samtools faidx ref.fa -r forward-regions.txt --mark-strand sign -o forward-sequences.fa
samtools faidx ref.fa -r reverse-regions.txt --mark-strand sign -o reverse-sequences.fa --reverse-complement
# Combine the FASTA output to a single file.
cat forward-sequences.fa reverse-sequences.fa > sequences.fa
rm forward-sequences.fa reverse-sequences.fa