1

I want to use the fastacmd to extract specific regions of fasta sequences. To do that I need to put the name of the fasta file -d, the name of the sequence -s and the position of the sequence to extract -L. For example:

fastacmd -d OAP11402.1.fa -s OAP11402.1 -L 50,100

But the problem is that I have hundreds of files (each file has one sequence with the same name of the file) and the info of position of each sequence to extract is in a protein database (info_sequences.txt). So, I want to make a loop to paste the name of the file, sequence and the positions to extract from the protein database info_sequences.txt in the fastacmd.

The look of info_sequences.txt is like this:

    File          seq_id      position_start    position_end
    OAP11402.1.fa OAP11402.1              50             100 
    OAP15774.1.fa OAP15774.1              75             200 
    OAP10214.1.fa OAP10214.1              33             310

I think that awk could help but i'm struggling with the way to paste the info in the fastcmd

2 Answers2

1
source <(
    awk 'NR > 1 {
        printf "echo fastacmd -d %s -s %s -L %d,%d\n", $1, $2, $3, $4
    }' info_sequences.txt 
)

The awk command spits out all the commands.
Then the source <( ... ) evaluates the commands in your current shell.

Same advice as Cyrus, if it looks OK remove the echo


Or, do it all in awk:

awk 'NR > 1 {
    cmd = "echo fastacmd -d " $1 " -s " $2 " -L " $3 "," $4
    system(cmd)
}' info_sequences.txt 
glenn jackman
  • 238,783
  • 38
  • 220
  • 352
0
awk 'NR>1 {print "-d",$1,"-s",$2,"-L",$3","$4}' info_sequences.txt | xargs -I {} echo fastacmd {}

Output:

fastacmd -d OAP11402.1.fa -s OAP11402.1 -L 50,100
fastacmd -d OAP15774.1.fa -s OAP15774.1 -L 75,200
fastacmd -d OAP10214.1.fa -s OAP10214.1 -L 33,310

If everything looks okay, remove echo.

Cyrus
  • 84,225
  • 14
  • 89
  • 153