
Please give me some advice on removing the newline characters before lines that start with a letter, while ignoring the lines starting with >. For example:

>gi|16802049|ref|NP_463534.1| chromosomal replication initiation protein [Listeria monocytogenes EGD-e]
MQSIEDIWQETLQIVKKNMSKPSYDTWMKSTTAHSLEGNTFIISAPNNFVRDWLEKSYTQFIANILQEIT
GRLFDVRFIDGEQEENFEYTVIKPNPALDEDGIEIGKHMLNPRYVFDTFVIGSGNRFAHAASLAVAEAPA
KAYNPLFIYGGVGLGKTHLMHAVGHYVQQHKDNAKVMYLSSEKFTNEFISSIRDNKTEEFRTKYRNVDVL
LIDDIQFLAGKEGTQEEFFHTFNTLYDEQKQIIISSDRPPKEIPTLEDRLRSRFEWGLITDITPPDLETR
IAILRKKAKADGLDIPNEVMLYIANQIDSNIRELEGALIRVVAYSSLVNKDITAGLAAEALKDIIPSSKS
QVITISGIQEAVGEYFHVRLEDFKAKKRTKSIAFPRQIAMYLSRELTDASLPKIGDEFGGRDHTTVIHAH
EKISQLLKTDQVLKNDLAEIEKNLRKAQNMF

>gi|16802050|ref|NP_463535.1| DNA polymerase III subunit beta [Listeria monocytogenes EGD-e]
MKFVIERDRLVQAVNEVTRAISARTTIPILTGIKIVVNDEGVTLTGSDSDISIEAFIPLIENDEVIVEVE
SFGGIVLQSKYFGDIVRRLPEENVEIEVTSNYQTNISSGQASFTLNGLDPMEYPKLPEVTDGKTIKIPIN
VLKNIVRQTVFAVSAIEVRPVLTGVNWIIKENKLSAVATDSHRLALREIPLETDIDEEYNIVIPGKSLSE
LNKLLDDASESIEMTLANNQILFKLKDLLFYSRLLEGSYPDTSRLIPTDTKSELVINSKAFLQAIDRASL
LARENRNNVIKLMTLENGQVEVSSNSPEVGNVSENVFSQSFTGEEIKISFNGKYMMDALRAFEGDDIQIS
FSGTMRPFVLRPKDAANPNEILQLITPVRTY

Each sequence should end up on a single line, while the newline before lines starting with '>' should not be removed. I tried

\n^[a-z]

but it also removes the first letter of each line. Is it possible to do the same without removing the first letter of each line, while ignoring lines starting with '>'? Thanks in advance. I am looking for a regular expression for TextPad.

stema
The Last Word

2 Answers


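You can use this regex, where the lookahead `(?=...)` checks for a letter without making it part of the match:

[\r\n]+(?=[a-zA-Z])

and replace it with an empty string.

Alternatively, capture the letter and put it back in the replacement:

[\r\n]+([a-zA-Z])

and replace it with \1 or $1, whichever your editor accepts.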
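
To see what the two patterns above actually do, here is a minimal sketch that applies them with Python's `re` module instead of TextPad. The sample text and the function names `join_with_lookahead` and `join_with_group` are made up for illustration. One thing it makes visible: both patterns also remove the newline after a '>' line whenever the sequence starts with a letter, so each header ends up on the same line as its sequence.

```python
import re

# Made-up two-record sample in the same layout as the question.
SAMPLE = ">seq1 desc\nMQSI\nGRLF\n\n>seq2 desc\nMKFV\nSFGG\n"


def join_with_lookahead(text):
    """Drop line breaks that are followed by a letter (letter is not consumed)."""
    return re.sub(r"[\r\n]+(?=[a-zA-Z])", "", text)


def join_with_group(text):
    """Same idea without a lookahead: capture the letter and put it back."""
    return re.sub(r"[\r\n]+([a-zA-Z])", r"\1", text)


if __name__ == "__main__":
    # Line breaks before '>' are kept because '>' is not a letter.
    print(join_with_lookahead(SAMPLE))
    print(join_with_group(SAMPLE))
```

Both calls give the same text: the line breaks inside each sequence are gone, and the blank line before the second '>' is left alone.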

Anirudha
  • @potterbond007 have you selected the regular expression checkbox? – Anirudha Jun 27 '13 at 05:44
  • Yes, I have. I am now trying the same with Perl. Do you have any idea for a regular expression that would do the same in Perl? – The Last Word Jun 27 '13 at 07:58
  • These expressions by Anirudha work; however, it appears that the latest version of TextPad for Windows doesn't handle regular expressions correctly. I did not try previous versions. I installed the application and tried searching for something as basic as `[A-Z]`, which was straight from the manual, and this failed too. I recommend dumping TextPad in favor of something like Notepad++, which has better support for regular expressions. – Ro Yo Mi Jun 27 '13 at 12:38
  • Thanks for the advice. I will definitely have a look. – The Last Word Jun 28 '13 at 08:07

I have solved this using regular expressions in Perl. For anyone who needs something like this in the future:

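use strict;
use warnings;

print "Please enter the name of the file\n";
chomp(my $n = <STDIN>);    # drop the newline typed after the file name

print "Please enter the name of the output file\n";
chomp(my $n1 = <STDIN>);

# Read every line of the input file into memory.
open(my $info, '<', $n) or die "cannot open $n: $!";
my @a = <$info>;
close($info);

foreach (@a)
    {
        s/[\r\n]+//g;    # strip the line ending so that all lines join up
        s/^>/\n>/;       # put a newline back in front of each '>' line
    }

# Write the result: each '>' line and its sequence end up on one line.
open(my $myfile, '>', $n1) or die "cannot open $n1: $!";
print $myfile @a;
print $myfile "\n";      # finish the last line
close($myfile);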

It's extremely simple.
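
For comparison, the same line-by-line idea can be sketched in Python. This variant keeps each '>' line on its own line and puts the joined sequence on the next one; that is an assumption about the desired output, since the script above puts header and sequence on the same line. The function name `unwrap_fasta` is hypothetical.

```python
def unwrap_fasta(text):
    """Join the wrapped sequence lines of each record into a single line.

    Assumes every record starts with a line beginning with '>'.
    Blank lines, and any sequence lines before the first '>', are dropped.
    """
    records = []      # list of (header, sequence) pairs
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank separator lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []     # start a new record
        elif header is not None:
            chunks.append(line)           # collect one piece of the sequence
    if header is not None:
        records.append((header, "".join(chunks)))
    return "".join(f"{h}\n{s}\n" for h, s in records)


if __name__ == "__main__":
    print(unwrap_fasta(">a\nAB\nCD\n\n>b\nEF\n"), end="")
```

Reading the file with `open(path).read()` and writing the returned string back out would give the same workflow as the script above.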

The Last Word