
I have the following .txt:

TITLE Genetic variation in the complete MgPa operon and its repetitive chromosomal elements in clinical strains of Mycoplasma genitalium

JOURNAL PLoS ONE 5 (12), E15660 (2010)

PUBMED 21187921

REMARK Publication Status: Online-Only

REFERENCE 3 (bases 1 to 1480)

PUBMED 21997874

REFERENCE 4 (bases 1 to 1480)

REFERENCE 5 (bases 1 to 1480)

AUTHORS Ma,L., Jensen,J.S., Jia,Q., Mancuso,M.A., Myers,L.J. and Martin,D.H.

TITLE Direct Submission

ORIGIN
1 agtaagaatg ttactgctta cacccccttc gccaccccca tcaccgattc taaaagtgat 61 ctggttagtt tggcacaact tgattcttct tatcaaatcg ctgaccaaac catccataac 121 accaacttgt ttgtgttgtt caagtccaag gatgtgaagc ttacatatag ttcaagtggc 181 tcaaataacc agattagttt tgattcaact agtcaaggtg aaaaaccatc ctatgtggtc 241 gagtttacta actctaccaa cattggcatc aagtgaagcg tggtgaaaaa gtatcagtta 301 gatctaccaa atgttaccaa tgagatgaac caagtgttgc aagaattgat cctagaacaa 361 ccccttacca agtatacctt aaacagtagt ttggctaaac aaaagggcaa aagccagata 421 gaggtacatc ttggttcaaa ttcaaatcag tgacaatcga tgcgtaatca acatgaccta 481 aacaacaatc ccagccccaa tgcttcaact gggtttaaac tcactaccgg caacgcatat 541 agaaaattaa atgagtcctg accaatttat caaccaattg atgggaccaa gcagggcaaa 601 gggaaggata gtagtgggtg gagttcaaca gaagcaacaa cggcaaaaaa tgatgcgccc 661 agtgtttctg gaagtggaac atcagacacc gcttcaaaat tcaaaagtta cctcaacacc 721 aagcaagcgt tagagagcat cggcatcttg tttgatgggg atggaatgag gaatgtggtt 781 acccagctct attatgcttc tactagcaag ctagcagtca ccaacaacca cattgtcgtg 841 atgggtaaca gctttctacc cagcatgtgg tactgggtgg tggagcggag tgcaacaact 901 gattcatcat caaaacccac ctggtttgct aataccaatt taaactgagg ggaagataaa 961 caaaaacaat ttgttgagaa ccagttgggg tataaggaaa ctaccagtac caattcccac 1021 aacttccatt ccaaatcttt cacccaacct gcatatctga tcagtggcat tgacagtgtc 1081 aatgatcaaa tcatcttcag tggctttaaa gcggggagtg tggggtatga tagtagtagt 1141 agtagtagta gtagtagtag tagtagtacc aaagaccaag cacttgcttg atcaacaaca 1201 actagcttag atagtaaaac ggggtatagg gatctagtga ccaacgacac ggggctaaat 1261 ggtccgatca atgggagttt ttcaatccaa gacaccttca gctttgttgt tccttattcg 1321 gggaatcata caaattcaag tggttcatca ggacccatta aaactgctta tccagtgaaa 1381 aaagatcaaa aatcaactgt caagatcaat tctttgatta acgctacgcc cttgaatagt 1441 tatggggatg aggggattgg ggtgtttgat gcgttaggtt //

I want to create a new file whose content looks like this:

agtaagaatg ttactgctta cacccccttc gccaccccca tcaccgattc taaaagtgat ctggttagtt tggcacaact tgattcttct tatcaaatcg ctgaccaaac catccataac accaacttgt ttgtgttgtt caagtccaag gatgtgaagc ttacatatag ttcaagtggc tcaaataacc agattagttt tgattcaact agtcaaggtg aaaaaccatc ctatgtggtc gagtttacta actctaccaa cattggcatc aagtgaagcg tggtgaaaaa gtatcagtta gatctaccaa atgttaccaa tgagatgaac caagtgttgc aagaattgat cctagaacaa ccccttacca agtatacctt aaacagtagt ttggctaaac aaaagggcaa aagccagata gaggtacatc ttggttcaaa ttcaaatcag tgacaatcga tgcgtaatca acatgaccta aacaacaatc ccagccccaa tgcttcaact gggtttaaac tcactaccgg caacgcatat agaaaattaa atgagtcctg accaatttat caaccaattg atgggaccaa gcagggcaaa gggaaggata gtagtgggtg gagttcaaca gaagcaacaa cggcaaaaaa tgatgcgccc agtgtttctg gaagtggaac atcagacacc gcttcaaaat tcaaaagtta cctcaacacc aagcaagcgt tagagagcat cggcatcttg tttgatgggg atggaatgag gaatgtggtt acccagctct attatgcttc tactagcaag ctagcagtca ccaacaacca cattgtcgtg atgggtaaca gctttctacc cagcatgtgg tactgggtgg tggagcggag tgcaacaact gattcatcat caaaacccac ctggtttgct aataccaatt taaactgagg ggaagataaa caaaaacaat ttgttgagaa ccagttgggg tataaggaaa ctaccagtac caattcccac aacttccatt ccaaatcttt cacccaacct gcatatctga tcagtggcat tgacagtgtc aatgatcaaa tcatcttcag tggctttaaa gcggggagtg tggggtatga tagtagtagt agtagtagta gtagtagtag tagtagtacc aaagaccaag cacttgcttg atcaacaaca actagcttag atagtaaaac ggggtatagg gatctagtga ccaacgacac ggggctaaat ggtccgatca atgggagttt ttcaatccaa gacaccttca gctttgttgt tccttattcg gggaatcata caaattcaag tggttcatca ggacccatta aaactgctta tccagtgaaa aaagatcaaa aatcaactgt caagatcaat tctttgatta acgctacgcc cttgaatagt tatggggatg aggggattgg ggtgtttgat gcgttaggtt

How can I tell the console to create a new file that contains only the part from ORIGIN to the end?

  • 2
    Have you tried anything yourself? What problems did you run into? Please ask specific questions, and provide some example of code to show what you're able to do, focus on a specific problem - don't expect SO to write your code for you. Have a look at [How do I ask a good question?](https://stackoverflow.com/help/how-to-ask) – Grismar Feb 23 '22 at 12:01

2 Answers

1

If what you want is to create a .txt from a certain line, this could work for you:

flag = False  # becomes True once the ORIGIN line has been seen
with open("src.txt", "r") as src:
    with open("dst.txt", "w") as dst:
        for line in src:
            # Copy only the lines that come after the ORIGIN line
            if flag:
                dst.write(line)
            if "ORIGIN" in line:
                flag = True

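This iterates through the lines of the source file and, once it has seen a line containing 'ORIGIN', writes every subsequent line of the src file to the dst file. The ORIGIN line itself is not copied, because the flag is only set after the write check.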
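The same flag technique can be packaged as a small generator, which makes it easy to check on in-memory data before touching real files. This is only a sketch: the name `lines_after_marker` and the sample lines are made up for illustration and are not part of the answer above.

```python
def lines_after_marker(lines, marker="ORIGIN"):
    """Yield every line that comes after the first line containing `marker`."""
    found = False
    for line in lines:
        if found:
            yield line
        elif marker in line:
            # The marker line itself is skipped; copying starts on the next line.
            found = True


# Hypothetical miniature record, standing in for the real file.
sample = ["TITLE Direct Submission\n", "ORIGIN\n", "1 agtaagaatg\n", "//\n"]
print(list(lines_after_marker(sample)))
# → ['1 agtaagaatg\n', '//\n']
```

With real files, the generator could be passed straight to `dst.writelines(lines_after_marker(src))`, since an open text file is itself an iterable of lines.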

1

It seems that you also want to remove any numbers and slashes. Therefore you could do this:

import re
import sys
with open('infile.txt', encoding='utf-8') as infile:
    try:
        while not next(infile).startswith('ORIGIN'):
            pass
        with open('outfile.txt', 'w', encoding='utf-8') as outfile:
            for line in infile:
                # Remove position counters (digits) and the '//' terminator
                outfile.write(re.sub(r'[\d/]', '', line).lstrip())
    except StopIteration:
        print('ORIGIN not found', file=sys.stderr)
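If the goal is the exact layout shown in the question (10-base blocks separated by single spaces), a token-based variant may be simpler than a regular expression. The sketch below is an alternative, not the method used above: `extract_sequence` is a hypothetical helper, and it assumes the first occurrence of the text `ORIGIN` in the file is the section header and that everything after it is either a position counter, a sequence block, or the `//` terminator.

```python
def extract_sequence(text, marker="ORIGIN"):
    """Return the sequence blocks that follow `marker`, joined by single spaces."""
    # Discard everything up to and including the first occurrence of the marker.
    _, found, tail = text.partition(marker)
    if not found:
        raise ValueError(f"{marker} not found")
    # Drop position counters (pure digits) and the '//' record terminator.
    blocks = [tok for tok in tail.split() if not tok.isdigit() and tok != "//"]
    return " ".join(blocks)


# Hypothetical miniature record, standing in for the real file.
sample = (
    "TITLE Direct Submission\n"
    "ORIGIN\n"
    "1 agtaagaatg ttactgctta\n"
    "61 ctggttagtt tggcacaact //\n"
)
print(extract_sequence(sample))
# → agtaagaatg ttactgctta ctggttagtt tggcacaact
```

Because it splits on any whitespace, this works whether the sequence is spread over several lines or sits on a single line. For a real file, the text could be obtained with `open('infile.txt', encoding='utf-8').read()`.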
DarkKnight