I have a FASTA file with headers like this:

>GL13245678
ABCDEDERFSE

>GL123456789
ABDFDRAGDTGEGAGFDAS

>GL1254367890
AFGHSRSGFGSHSFG

I want to change the header to contain only GL and 6 digits and remove the empty line above each header, like this:

>GL132456
ABCDEDERFSE
>GL123456
ABDFDRAGDTGEGAGFDAS
>GL125436
AFGHSRSGFGSHSFG

Can anyone share a Perl script for this? Thanks.

RobEarl

1 Answer

Remove anything from headers (lines starting with >) after GL and 6 digits:

s/^>GL\d{6}\K.+//

Print only lines that contain at least one non-whitespace character, which skips empty and whitespace-only lines:

print if /\S/

Putting it all together:

perl -ne 's/^>GL\d{6}\K.+//; print if /\S/' file
RobEarl