I want to split an HTML file into two parts (outfile1.html and outfile2.html). The first part should run up to the first line containing <td>II</td> that is immediately followed by a line containing <td>LARGE (XL)</td>; the variable 'linea' should hold the number of that <td>II</td> line. After modifying outfile2.html, I need to join the two output files again.
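The split rule described above could be sketched roughly as follows. This is only a minimal sketch, assuming each tag sits on its own line and surrounding whitespace can be ignored; the helper name `find_split_line` and the short `sample` string are made up for illustration.

```python
def find_split_line(text, first='<td>II</td>', second='<td>LARGE (XL)</td>'):
    """Return the 1-based number of the first line equal to `first` that is
    immediately followed by a line equal to `second`, or None if there is none."""
    lines = [line.strip() for line in text.splitlines()]
    # Pair every line with the line that follows it.
    for number, (current, following) in enumerate(zip(lines, lines[1:]), start=1):
        if current == first and following == second:
            return number
    return None


# Hypothetical shortened sample, not the real input file.
sample = (
    "<tr>\n<td>III</td>\n<td>LARGE (L)</td>\n</tr>\n"
    "<tr>\n<td>II</td>\n<td>LARGE (XL)</td>\n</tr>"
)
linea = find_split_line(sample)
print(linea)
```

Returning None when the pair is missing makes it possible to tell "no split point" apart from a real line number.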
The HTML file:
infilehtml ='''
</thead>
<tbody>
<tr>
<td>III</td>
<td>LARGE (L)</td>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
</tr>
<tr>
<td>II</td>
<td>LARGE (XL)</td>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
</tr>'''
My code is:
from os import system

linea = 0        # ends up as the line number of the '<td>II</td>' line that closes outfile1.html
file1 = 'start'  # 'start' while writing outfile1.html, 'end' once the split point is passed
string1 = '<td>II</td>'
string2 = '<td>LARGE (XL)</td>'

# infilehtml holds the HTML text itself (not a file name), so split it into lines directly
lines = [line.strip() for line in infilehtml.splitlines()]

with open('outfile1.html', 'w') as html_outfile1, open('outfile2.html', 'w') as html_outfile2:
    for i, line in enumerate(lines):
        if file1 == 'start':
            linea += 1
            html_outfile1.write(line + '\n')
            # split only when '<td>II</td>' is immediately followed by '<td>LARGE (XL)</td>'
            if line == string1 and lines[i + 1:i + 2] == [string2]:
                file1 = 'end'
        else:
            html_outfile2.write(line + '\n')

# the with block has already closed both files; after modifying outfile2.html, join the two parts
system("cat outfile1.html outfile2.html > outfile.html")
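The final join relies on the shell's cat, which is not available everywhere. A portable variant might look like the sketch below, which splits a file at a known line number and joins the parts again using only the standard library. The function names `split_file` and `join_files` are hypothetical, and the sketch assumes the input is an actual file on disk and that `linea` has already been determined.

```python
import shutil
from pathlib import Path


def split_file(source, part1, part2, linea):
    """Write the first `linea` lines of `source` to `part1` and the rest to `part2`."""
    lines = Path(source).read_text().splitlines(keepends=True)
    Path(part1).write_text(''.join(lines[:linea]))
    Path(part2).write_text(''.join(lines[linea:]))


def join_files(parts, target):
    """Concatenate the files listed in `parts`, in order, into `target`."""
    with open(target, 'w') as out:
        for part in parts:
            with open(part) as src:
                shutil.copyfileobj(src, out)
```

For example, `split_file('infile.html', 'outfile1.html', 'outfile2.html', linea)` followed later by `join_files(['outfile1.html', 'outfile2.html'], 'outfile.html')`, where 'infile.html' is a placeholder name for the input file.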
Thank you very much for your help.