
I am trying to count the length of each read in a FASTQ file from Illumina sequencing and write the lengths to a TSV (or any other kind of file), so that I can later inspect them and also count the number of reads per file. So I need to step through the file, extract each line that has a read on it (every 4th line), get its length, and store that as output.

num=2
for file in *.fastq
do
    echo "counting $file"
    function file_length(){
    wc -l $file | awk '{print$FNR}'
    }
    for line in $file_length
    do
        awk 'NR==$num' $file | chrlen > ${file}read_length.tsv
        num=$((num + 4))
    done
done

Currently all I get is the "counting $file" message and no other output, but also no errors.

Rob

1 Answer


Your script contains several errors in both syntax and algorithm. Please try shellcheck to see what the problems are. The main issue is the $file_length part: you probably meant to call the function file_length() here, but $file_length is just an undefined variable, which expands to an empty string, so the inner for loop never runs.
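To illustrate the difference between expanding a variable and calling a function, here is a minimal sketch. The temporary demo file and the variable names unset_value and n_lines are assumptions made for the example, and file_length is rewritten here to take the file as an argument, which is not how the original script defines it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of lines in the file given as $1.
file_length() {
    wc -l < "$1"
}

# Demo input (an assumption for this sketch): one 4-line FASTQ record.
demo=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n' > "$demo"

# $file_length expands a variable that was never set, so it yields nothing.
unset_value="${file_length:-}"

# $(file_length ...) runs the function and captures what it prints.
n_lines=$(file_length "$demo")
n_lines=$((n_lines))   # arithmetic expansion drops padding some wc versions add

echo "variable: '${unset_value}'  function: ${n_lines}"
rm -f "$demo"
```

Defining the function once, outside the loop, and passing the file name as an argument also avoids redefining it for every file.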

If you just want to count the length of the 4th line of *.fastq files, please try something like:

for file in *.fastq; do
    awk 'NR==4 {print length}' "$file" > "${file}_length.tsv"
done

Or if you want to put the results together in a single tsv file, try:

tsvfile="read_length.tsv"
for file in *.fastq; do
    # Write the file name and a tab, then let awk append the length on the same row.
    printf '%s\t' "$file" >> "$tsvfile"
    awk 'NR==4 {print length}' "$file" >> "$tsvfile"
done
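If the goal is the length of every read rather than only line 4, a rough sketch follows. It assumes standard 4-line FASTQ records with unwrapped sequences, so the read itself is line 2 of each record. The function name fastq_read_lengths and the output file read_lengths.tsv are made up for the example.

```shell
#!/usr/bin/env bash
# Print one TSV row per read: file name, read number, read length.
# Assumes 4-line FASTQ records, where the sequence is line 2 of each record,
# so NR % 4 == 2 selects exactly the sequence lines.
fastq_read_lengths() {
    awk -v OFS='\t' 'NR % 4 == 2 { print FILENAME, ++n, length($0) }' "$1"
}

# Collect the lengths for every FASTQ file in the current directory.
for file in *.fastq; do
    [ -e "$file" ] || continue   # skip the literal pattern when nothing matches
    fastq_read_lengths "$file"
done > read_lengths.tsv
```

Because each read becomes one row, the number of reads per file can then be obtained by counting the rows for that file name, for example with cut -f1 read_lengths.tsv | sort | uniq -c.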

Hope this helps.

tshiono