I have a question. I have a file like this:

@HWI-ST273:296:C0EFRACXX:2:2101:17125:145325/1
TTAATACACCCAACCAGAAGTTAGCTCCTTCACTTTCAGCTAAATAAAAG
+
8?8A;DDDD;@?++8A?;C;F92+2A@19:1*1?DDDECDE?B4:BDEEI
@BBBB-ST273:296:C0EFRACXX:2:1303:5281:183410/1
TAGCTCCTTCGCTTTCAGCTAAATAAAAGCCCAGTACTTCTTTTTTACCA
+
CCBFFFFFFHHHHJJJJJJJJJIIJJJJJJJJJJJJJJJJJJJIJJJJJI
@HWI-ST273:296:C0EFRACXX:2:1103:16617:140195/1
AAGTTAGCTCCTTCGCTTTCAGCTAAATAAAAGCCCAGTACTTCTTTTTT
+
@C@FF?EDGFDHH@HGHIIGEGIIIIIEDIIGIIIGHHHIIIIIIIIIII
@HWI-ST273:296:C0EFRACXX:2:1207:14316:145263/1
AATACACCCAACCAGAAGTTAGCTCCTTCGCTTTCAGCTAAATAAAAGCC
+
CCCFFFFFHHHHHJJJJJJJIJJJJJJJJJJJJJJJJJJJJJJJJJJJIJ

I'm only interested in the lines that start with '@HWI', but I want to count the header lines that do not start with '@HWI'. In the example shown, the result is 1 because there's one line that starts with '@BBB'.

To be clearer: the file is made of records of 4 lines each, and I just want to know the number of records whose first line does not start with '@HWI'. I hope that's clear enough. Please tell me if you need more clarification.

Reda
  • Please, use the code formatting for the data, not blockquote, so we can see where the lines start and end. – choroba Apr 23 '20 at 20:06
  • Yes, I made a mistake; it's fine now – Reda Apr 23 '20 at 20:11
  • There is no line that starts with `@BBB`. Do you mean to say "Of all the lines that start with `@`, I want to count how many do not start with `@HWI`?" That's what I think you mean, but if so please update your question with that. And kindly show your attempts. – Quasímodo Apr 23 '20 at 20:20
  • Very sorry, I forgot to update the example – Reda Apr 23 '20 at 20:33

1 Answer

With GNU sed, you can use its extended address to print every fourth line, then use grep to count the ones that don't start with @HWI:

sed -n '1~4p' file.fastq | grep -cv '^@HWI'

Otherwise, you can use e.g. Perl:

perl -ne 'print if 1 == $. % 4' -- file.fastq | grep -cv '^@HWI'

$. contains the current line number, % is the modulo operator.

But once we're running Perl, we don't need grep anymore:

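perl -lne '++$c if 1 == $. % 4 && !/^\@HWI/; END { print $c + 0 }' -- file.fastq

-l removes newlines from input and adds them to output. The @ is escaped so that Perl doesn't try to interpolate an array named @HWI into the regex, and $c + 0 prints 0 instead of an empty line when every header starts with @HWI.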
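
If neither GNU sed nor Perl is at hand, the same idea can probably be written in awk. This is only a sketch, assuming well-formed 4-line records; the function name `count_non_hwi` is invented for the example:

```shell
# Hypothetical helper: count FASTQ records whose header line
# (lines 1, 5, 9, ...) does not start with @HWI.
count_non_hwi() {
    # NR is awk's current line number; c + 0 prints 0 when nothing matched.
    awk 'NR % 4 == 1 && !/^@HWI/ { c++ } END { print c + 0 }' "$1"
}
```

Call it as `count_non_hwi file.fastq`. Because only every fourth line is tested, quality lines that happen to start with @ are not counted by mistake.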

choroba