How to join lines in a specific interval using a Python script?

I am working on text analysis for a competition and have run into a problem. In the HTML file, each question starts with a number followed by '.', in the form <p>1. *</p>. The lines of the HTML file are given below.

<p>1. Arrange the following words as per the order in dictionary.</p>
<p>1. Inappropriate</p>
<p>2. Inappeasable</p>
<p>3. Inaptitude</p>
<p>4. Inapplicable</p>
<p>5. Inapprehensible</p>
<p>(1) 25431</p>
<p>(2) 13425</p>
<p>(3) 24513</p>
<p>(4) 52341</p>
<p>Answer key: 3</p><p>This is the correct form as per the order in dictionary.</p><p>Inappeasable&gt; Inapplicable&gt; Inapprehensible&gt; Inappropriate&gt; Inaptitude</p>
<p>1. Inappropriate</p>
<p>2. Inappeasable</p>
<p>3. Inaptitude</p>
<p>4. Inapplicable</p>
<p>2. Arrange the following words as per the order in dictionary.</p>
<p>1. Venomous</p>
<p>2. Ventrose</p>
<p>3. Veneration</p>
<p>4. Vengeance</p>
<p>5. Ventilation</p>
<p>(1) 43521</p>
<p>(2) 31425</p>
<p>(3) 43251</p>
<p>(4) 34152</p>
<p>Answer key: 4</p><p>This is the correct form as per the order in dictionary.</p><p>Veneration&gt; Vengeance&gt; Venomous&gt; Ventilation&gt; Ventrose</p>

The output file must look like this.

Any lines that start with a number followed by '.', as in the input above, and that belong to a question (or to the explanation after an answer key) should be joined to that line, as shown below.

<p>1. Arrange the following words as per the order in dictionary.</p><p>1. Inappropriate</p><p>2. Inappeasable</p><p>3. Inaptitude</p><p>4. Inapplicable</p><p>5. Inapprehensible</p>
<p>(1) 25431</p>
<p>(2) 13425</p>
<p>(3) 24513</p>
<p>(4) 52341</p>
<p>Answer key: 3</p><p>This is the correct form as per the order in dictionary.</p><p>Inappeasable&gt; Inapplicable&gt; Inapprehensible&gt; Inappropriate&gt; Inaptitude</p><p>1. Inappropriate</p><p>2. Inappeasable</p><p>3. Inaptitude</p><p>4. Inapplicable</p>
<p>2. Arrange the following words as per the order in dictionary.</p><p>1. Venomous</p><p>2. Ventrose</p><p>3. Veneration</p><p>4. Vengeance</p><p>5. Ventilation</p>
<p>(1) 43521</p>
<p>(2) 31425</p>
<p>(3) 43251</p>
<p>(4) 34152</p>
<p>Answer key: 4</p><p>This is the correct form as per the order in dictionary.</p><p>Veneration&gt; Vengeance&gt; Venomous&gt; Ventilation&gt; Ventrose</p>

Please help me.

rofelia09

1 Answer

I was finally able to write code for the question I asked. The code is given below.

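import re
import sys

# An answer option such as <p>(1) 25431</p>
answerstart = re.compile(r'<p>\([12345]*\) ')
# A numbered line such as <p>1. Inappropriate</p>
questionstart = re.compile(r'^<p>[0-9]*\. ')

with open(sys.argv[1]) as f:
    data = f.readlines()

allq = []
flag = False  # True once the first sub-item ("1.") of a list has been seen

# Walk the file backwards so that a "1." sub-item is seen before its question.
for line in reversed(data):
    if answerstart.match(line):
        if '<p>(1) ' in line:
            # The first option ends the joined question line above it.
            allq.append('\n' + line)
        else:
            allq.append(line)
    elif line.startswith('<p>Answer '):
        flag = False
        allq.append('\n' + line.rstrip())
    elif questionstart.match(line):
        if '<p>1. ' in line:
            allq.append(line.rstrip())
            flag = True
        elif flag:
            # This is the question itself, so start it on a new line.
            allq.append('\n' + line.rstrip())
            # Reset here so the numbered lines after the previous answer key
            # stay joined to that answer key.
            flag = False
        else:
            allq.append(line.rstrip())
    else:
        allq.append(line)

# allq was built back to front, so reverse it before printing.
print("".join(reversed(allq)))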

The problem with this code is that it only works for the exact format I posted in the question above.
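
A forward pass with a one-line lookahead might be a little easier to adapt to other layouts. The following is only a sketch and not a tested replacement: the name join_question_lines and the three patterns are hypothetical, and it assumes that every question line is immediately followed either by a "1." sub-item or by a "(1)" option, which holds for the sample above but may not hold elsewhere.

```python
import re

OPTION = re.compile(r'<p>\(\d+\) ')              # e.g. <p>(1) 25431</p>
NUMBERED = re.compile(r'<p>\d+\. ')              # e.g. <p>1. Inappropriate</p>
FIRST_CHILD = re.compile(r'<p>(?:1\. |\(1\) )')  # first sub-item or first option


def join_question_lines(lines):
    """Return the lines with numbered sub-items joined to their parent line."""
    # Drop blank lines and trailing newlines so that joining is plain concatenation.
    lines = [ln.rstrip('\n') for ln in lines if ln.strip()]
    out = []
    in_question = False  # True between a question line and its first option
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ''
        if OPTION.match(line) or line.startswith('<p>Answer '):
            # Options and answer keys always start a new output line.
            in_question = False
            out.append(line)
        elif NUMBERED.match(line) and not in_question and FIRST_CHILD.match(nxt):
            # Assumption: a numbered line that opens a new sub-list or a new
            # option block is a question.
            in_question = True
            out.append(line)
        elif NUMBERED.match(line) and out:
            # Any other numbered line continues the previous output line.
            out[-1] += line
        else:
            out.append(line)
    return out
```

It could be used as print("\n".join(join_question_lines(open(sys.argv[1])))) after importing sys. Because the decision depends only on the current state and the next line, supporting another layout should mostly mean changing the three patterns.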

rofelia09