
I have a couple of text files that contain text in the following format:

Technical :

localization lengths is observed at particular energies for an increasing binary backbone disorder. We comment on the possible biological relevance of sequence-dependent charge transfer in DNA

Work : 

We find that random and λ-DNA have localization lengths allowing for electron motion among a few dozen basepairs only.

Technical : 

We study the electronic properties of DNA by way of a tight-binding model applied to four particular DNA sequences. The charge transfer properties are presented in terms of localization lengths (crudely speaking, the length over which electrons travel.

Education :

Electronic, DNA sequence   

Now I want to extract every paragraph under the heading "Technical". With my code I can extract one particular paragraph between two headings, but I cannot extract all the paragraphs that share the same heading.

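with open("aks.txt") as infile, open("fffm.txt", "w") as outfile:
    copy = False
    for line in infile:
        # The headings look like "Technical :", so drop the colon before comparing
        heading = line.strip().rstrip(":").strip()
        if heading == "Technical":
            copy = True
        elif heading == "Work":
            copy = False
        elif copy:
            outfile.write(line)

# Read the result back only after the with-block has closed (and flushed) the file
with open("fffm.txt") as fh:
    contents = fh.read()
print(len(contents))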
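A minimal sketch of one way to generalize this loop, assuming a heading is always a line containing a single word followed by a colon (as in `Technical :` or `Work : `): let *any* heading close the current section, and start collecting again only when the heading is the one you want. The function name `extract_sections` and the shortened sample text are illustrative, not from the original.

```python
import re

# Assumption: a heading is a line holding one word and a colon, e.g. "Technical :"
# (spaces around the colon and trailing whitespace are allowed).
HEADING = re.compile(r"^\s*(\w+)\s*:\s*$")


def extract_sections(lines, wanted="Technical"):
    """Return the text under every heading named `wanted`, one string per section."""
    sections = []
    current = None  # lines of the section being collected; None while skipping
    for line in lines:
        match = HEADING.match(line)
        if match:
            # Any heading ends the previous section, not only "Work"
            if current is not None:
                sections.append("\n".join(current).strip())
            current = [] if match.group(1) == wanted else None
        elif current is not None:
            current.append(line.rstrip("\n"))
    if current is not None:  # the text may end inside a wanted section
        sections.append("\n".join(current).strip())
    return sections


# Shortened, hypothetical stand-in for aks.txt
sample = """Technical :

first technical paragraph

Work :

work paragraph

Technical :

second technical paragraph,
continued on a second line

Education :

Electronic, DNA sequence
"""

for section in extract_sections(sample.splitlines(keepends=True)):
    print(section)
    print()
```

To process the real file, the open file object can be passed directly, since iterating over it yields lines: `with open("aks.txt") as infile: sections = extract_sections(infile)`, after which `"\n\n".join(sections)` can be written to the output file.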
Has QUIT--Anony-Mousse
Surya Nani
Possible duplicate of [How can I get only heading names.from the text file](http://stackoverflow.com/questions/34004631/how-can-i-get-only-heading-names-from-the-text-file) – Martin Evans Nov 30 '15 at 21:16

1 Answer


Use regular expressions with the re module. See: https://docs.python.org/2/library/re.html

This code does what you want:

import re

the_text = """Technical :

localization lengths is observed at particular energies for an increasing binary backbone disorder. We comment on the possible biological relevance of sequence-dependent charge transfer in DNA

Work :

We find that random and λ-DNA have localization lengths allowing for electron motion among a few dozen basepairs only.

Technical :

We study the electronic properties of DNA by way of a tight-binding model applied to four particular DNA sequences. The charge transfer properties are presented in terms of localization lengths (crudely speaking, the length over which electrons travel.

Education :

Electronic, DNA sequence"""

# Each match is a (heading, first line of text after the heading) pair
for title, content in re.findall(r'(\w+) +?:\s+?(.+)', the_text):
    if title.lower() == "technical":
        print("Title: {}".format(title))
        print("Content: {}\n".format(content))

Output:

Title: Technical
Content: localization lengths is observed at particular energies for an increasing binary backbone disorder. We comment on the possible biological relevance of sequence-dependent charge transfer in DNA

Title: Technical
Content: We study the electronic properties of DNA by way of a tight-binding model applied to four particular DNA sequences. The charge transfer properties are presented in terms of localization lengths (crudely speaking, the length over which electrons travel.
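Note that `(.+)` stops at the first newline, so the pattern above returns only the first line of each section. If a section may span several lines, one alternative is sketched below, assuming again that a heading is a whole line holding a single word and a colon: split the text on heading lines with `re.split` and group the bodies by heading name. `group_by_heading` and the sample text are hypothetical names for illustration.

```python
import re
from collections import defaultdict

# Assumption: a heading is a whole line made of one word and a colon, e.g. "Technical :".
# [ \t] is used instead of \s so the pattern cannot run across line breaks.
HEADING_LINE = re.compile(r"^[ \t]*(\w+)[ \t]*:[ \t]*$", re.MULTILINE)


def group_by_heading(text):
    """Map each heading name to the list of section bodies found under it."""
    # With one capture group, re.split returns
    # [text before the first heading, name1, body1, name2, body2, ...]
    parts = HEADING_LINE.split(text)
    groups = defaultdict(list)
    for name, body in zip(parts[1::2], parts[2::2]):
        groups[name].append(body.strip())
    return groups


sample = """Technical :

line one
line two

Work :

work text

Technical :

second

Education :

Electronic, DNA sequence
"""

for paragraph in group_by_heading(sample)["Technical"]:
    print(paragraph)
    print()
```

Because the result is keyed by heading, the same call also gives every "Work" or "Education" section without another pass over the text.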
Andrés Pérez-Albela H.