How to produce multiple readlength.tsv files at once from multiple FASTQ files?

Question

I have 16 FASTQ files in different directories, and I want to produce a separate readlength.tsv for each of them. This is the script I use to produce readlength.tsv:

zcat ~/proje/project/name/fıle_fastq | paste - - - - | cut -f1,2 | while read readID sequ;
do
    len=`echo $sequ | wc -m`
    echo -e "$readID\t$len"
done > ~/project/name/fıle1_readlength.tsv
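Two things worth checking in this script before scaling it up. First, `echo $sequ | wc -m` also counts the newline that `echo` appends, so every length comes out one too large (`echo ACGT | wc -m` counts 5 characters). Second, because the default `read` splits on spaces as well as tabs, a header such as `@r1 extra` would push `extra` into `sequ`. A minimal sketch of the same per-file step that avoids both, assuming gzip-compressed, 4-line FASTQ input; the function name `readlength` is hypothetical, and with `IFS` set to a tab the first column holds the whole header line:

```bash
#!/bin/bash

# readlength IN OUT (hypothetical helper name)
# Writes "<header>\t<sequence length>" for every record of the
# gzip-compressed, 4-line FASTQ file IN into OUT.
readlength() {
    zcat "$1" | paste - - - - | cut -f1,2 |
        while IFS=$'\t' read -r readID sequ; do
            # ${#sequ} counts only the sequence characters, no trailing newline
            printf '%s\t%s\n' "$readID" "${#sequ}"
        done > "$2"
}
```

For example, `readlength ~/proje/project/name/file1_fastq file1_readlength.tsv` processes one file.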

I can produce the readlength files one by one, but that takes a long time. I want to produce them all at once, so I created a list containing these FASTQ files, but I couldn't write a loop that produces a readlength.tsv for each of the 16 FASTQ files from that list.
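For reference, such a list can be generated with `find` instead of being typed by hand. A sketch assuming all the files live somewhere under one top directory and that their names end in `fastq`, `fastq.gz` or `fq.gz` (these patterns are guesses; adjust them to the real names). The helper name `make_fastq_list` is hypothetical:

```bash
#!/bin/bash

# make_fastq_list TOPDIR LISTFILE (hypothetical helper)
# Writes every matching FASTQ path under TOPDIR, one per line and sorted,
# into LISTFILE. The name patterns are assumptions.
make_fastq_list() {
    find "$1" -type f \( -name '*fastq' -o -name '*fastq.gz' -o -name '*fq.gz' \) |
        sort > "$2"
}
```

Calling it as `make_fastq_list ~/proje/project list.txt` lets the shell expand the unquoted `~` before `find` runs, so the list contains full paths rather than a literal `~`.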

I would appreciate it if you could help me.

Your question is not quite clear, but you can try `xargs` to process multiple files from a list: `ls ~/proje/project/name/*fastq | xargs zcat | paste - - - - | cut -f1,2 | while read readID sequ; do len=$(echo $sequ | wc -m); echo -e "$readID\t$len"; done` — WeDBA, Aug 23 '22 at 22:26
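One caveat about this pipeline: `xargs zcat` decompresses all listed files into one continuous stream, so the loop writes a single combined table rather than one per file. To keep the outputs separate and still use `xargs`, it can start one small job per path, and `-P` (supported by GNU and BSD `xargs`) runs several of them at once, which is where the CPU and disk limits raised in the comments matter. A sketch under those assumptions; the helper names and the `_readlength.tsv` output suffix are hypothetical, and `awk` stands in for the per-read loop:

```bash
#!/bin/bash

# readlength_one FILE (hypothetical): writes FILE minus a trailing ".gz",
# plus "_readlength.tsv", containing "<first header word>\t<sequence length>"
readlength_one() {
    local f=${1/#\~/$HOME}                 # expand a leading "~" left literal in the list
    local out=${f%.gz}_readlength.tsv
    zcat "$f" |
        awk 'NR % 4 == 1 { id = $1 } NR % 4 == 2 { print id "\t" length($0) }' > "$out"
}
export -f readlength_one                   # make the function visible to the child bash processes

# parallel_readlengths LIST [JOBS] (hypothetical): one job per path in LIST,
# at most JOBS (default 4) running at the same time
parallel_readlengths() {
    xargs -I {} -P "${2:-4}" bash -c 'readlength_one "$1"' _ {} < "$1"
}
```

For example, `parallel_readlengths list.txt 4`; since each job writes its own file, the outputs of concurrent jobs do not interleave.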
Hello, I'm sorry for not being clear. I have multiple FASTQ files in different directories, and I was trying to produce a separate readlength file from each of them; that's why I created a list. I can produce the readlength files one by one, but it takes a long time, so I want to produce them all at once. — pierogi, Aug 23 '22 at 22:30
It may help if you update the question to include 5-10 lines of output from each of `zcat`, `zcat|paste` and `zcat|paste|cut`, so that we have a better understanding of what the data looks like at each step. There may be several possible answers, some of which could replace, say, the `paste` and `cut` calls as well as the loop, but we'll need to see some actual data. Also keep in mind that any degree of parallelism may be limited by the number of available CPUs as well as the I/O bandwidth of your disk(s). — markp-fuso, Aug 23 '22 at 23:03
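Along the lines suggested in this comment, a single `awk` process can stand in for `paste`, `cut` and the per-read `while` loop: for each 4-line record it keeps the first word of the header (line 1), which is what `read readID sequ` puts in `readID`, and prints it with the length of the sequence line (line 2). A sketch assuming well-formed 4-line records without wrapped sequence lines; the function name `fastq_lengths` is hypothetical:

```bash
#!/bin/bash

# fastq_lengths (hypothetical): reads uncompressed FASTQ on stdin and
# prints "<id>\t<sequence length>" for every 4-line record
fastq_lengths() {
    # NR % 4 == 1: header line, keep its first word as the id
    # NR % 4 == 2: sequence line, print the id and the line's length
    awk 'NR % 4 == 1 { id = $1 } NR % 4 == 2 { print id "\t" length($0) }'
}
```

Usage: `zcat ~/proje/project/name/file1_fastq | fastq_lengths > file1_readlength.tsv`. Running one such pipeline per file avoids starting `wc` once per read, which is typically where the original loop spends its time.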

score 1 · Accepted Answer · answered Aug 23 '22 at 23:27

Assuming a file list.txt contains the 16 file paths, one per line, such as:

~/proje/project/name/file1_fastq
~/proje/project/name/file2_fastq
..
~/path/to/the/fastq_file16

Then would you please try:

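#!/bin/bash

while IFS= read -r f; do                        # "f" is assigned to each fastq filename in "list.txt"
    f=${f/#\~/$HOME}                            # "read" leaves a leading "~" unexpanded; expand it here
    mapfile -t ary < <(zcat "$f")               # assign "ary" to the array of lines
    for ((i = 0; i + 1 < ${#ary[@]}; i += 4)); do       # each fastq record spans 4 lines
        printf '%s\t%s\n' "${ary[i]}" "${#ary[i+1]}"    # ${ary[i]} is the id and ${#ary[i+1]} is the length of sequence
    done
done < list.txt > readlength.tsv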

Since each FASTQ record occupies four lines, with the id on the 1st line and the sequence on the 2nd, the bash built-in mapfile makes them easy to handle: after `mapfile -t`, every line is an array element that can be accessed by its index.
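To make the indexing concrete: after `mapfile -t`, line n of the input (counting from 0) is `ary[n]`, so record k of a FASTQ file has its id at index 4k and its sequence at index 4k+1. A small sketch on a made-up two-record sample; the function name `print_lengths` is hypothetical:

```bash
#!/bin/bash

# Made-up two-record sample; with -t each line becomes one element, newline removed
mapfile -t ary < <(printf '@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n')

# print_lengths (hypothetical): "<id>\t<length>" for every record held in "ary";
# record k starts at index 4*k (id), with its sequence at 4*k+1
print_lengths() {
    local i
    for ((i = 0; i + 1 < ${#ary[@]}; i += 4)); do
        printf '%s\t%s\n' "${ary[i]}" "${#ary[i+1]}"
    done
}
```

Keep in mind that `mapfile` holds the entire decompressed file in memory, which can be a lot for large FASTQ files; a streaming tool such as `awk` avoids that.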

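As a side note, the letter ı in your code looks like a non-ASCII character (the Turkish dotless ı, U+0131), so a path written with it will not match a file whose name uses the ASCII letter i.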
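The script in this answer gathers the lengths from all 16 files into a single readlength.tsv, while the question asks for one table per FASTQ file. A minimal sketch of that variant, which also swaps the per-read bash work for one `awk` call per file (usually much faster on large files). The function name `readlengths_from_list` and the `<name>_readlength.tsv` output naming are hypothetical; each output is written next to its input:

```bash
#!/bin/bash

# readlengths_from_list LIST (hypothetical)
# For each FASTQ path in LIST, writes <path without .gz/.fastq/.fq>_readlength.tsv
readlengths_from_list() {
    local list=$1 f out
    while IFS= read -r f; do
        [ -n "$f" ] || continue                  # skip blank lines
        f=${f/#\~/$HOME}                         # "read" keeps "~" literal; expand it
        out=${f%.gz}                             # strip an optional .gz
        out=${out%.fastq}; out=${out%.fq}        # strip a FASTQ extension if present
        out=${out}_readlength.tsv
        # per 4-line record: id = first word of line 1, length of line 2
        zcat "$f" |
            awk 'NR % 4 == 1 { id = $1 } NR % 4 == 2 { print id "\t" length($0) }' > "$out"
    done < "$list"
}
```

With the list from this answer, `readlengths_from_list list.txt` would, for example, write `~/proje/project/name/file1_fastq_readlength.tsv` beside `file1_fastq`.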

Thank you for the feedback; good to know it works. If you feel my answer solves your problem, I'd appreciate it if you could accept it by clicking the check mark beside the answer. BR. — tshiono, Aug 29 '22 at 07:52

