
I want to split an HTML file into two parts (outfile1.html and outfile2.html). The first part should run up to the first line containing <td>II</td> that is followed by a line containing <td>LARGE (XL)</td>; the variable 'linea' should hold the number of that line. After modifying outfile2, I need to join the two output files again.

HTML file:

infilehtml ='''
</thead>
<tbody>
<tr>
<td>III</td>
<td>LARGE (L)</td>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
</tr>
<tr>
<td>II</td>
<td>LARGE (XL)</td>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
</tr>'''

My code is:

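linea = 0  # 1-based number of the line containing string1 (0 = not found)
string1 = '<td>II</td>'
string2 = '<td>LARGE (XL)</td>'

# infilehtml is the HTML string defined above, so split it into lines
# instead of passing it to open().
lines = infilehtml.splitlines()

# Find the first line containing string1 that is followed by a line containing string2.
for idx in range(len(lines) - 1):
    if string1 in lines[idx] and string2 in lines[idx + 1]:
        linea = idx + 1
        break

# Lines up to and including line 'linea' go to outfile1, the rest to outfile2.
# If no match was found, every line goes to outfile1.
split_at = linea if linea else len(lines)
with open('outfile1.html', 'w') as html_outfile1, open('outfile2.html', 'w') as html_outfile2:
    for number, line in enumerate(lines, start=1):
        target = html_outfile1 if number <= split_at else html_outfile2
        target.write(line.strip() + '\n')

# Join the two parts again (after outfile2 has been modified).
with open('outfile.html', 'w') as html_outfile:
    for name in ('outfile1.html', 'outfile2.html'):
        with open(name) as part:
            html_outfile.write(part.read())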

Thank you very much for your help

  • Not quite sure what you want as your output. It's not really clear what you are trying to do. But have you considered using pandas to manipulate the tables, then output them from a dataframe to html? – chitown88 Oct 10 '22 at 10:29
  • Please explain what you are trying to achieve and also what issues you have faced with what you've done so far – Driftr95 Oct 10 '22 at 19:02
  • Thank you for your comments. I create the HTML file with text and add tables using Pandas. However, I want to highlight various rows and cells by adding code that changes the color or style of the text in specific cells. I use df.to_html to create generic tables, and then I look for a specific position and change the color or font style. I do not know how to add code that changes, for example, the style in a specific part of the table, so I modify specific cells or complete rows in the finished tables. – Juan Escos Oct 11 '22 at 20:37
  • Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. – Community Oct 19 '22 at 07:32

1 Answer


I have replaced infilehtml with the actual HTML table file:

<table border='1' align="left">
<tr><th>Cat</th><th>size</th><th>Value1</th><th>Value2</th><th>Value3</th><th>Value4</th></tr>
<tr><td>III</td><td>LARGE (L)</td><td>0</td><td>1</td><td>2</td><td>3</td></tr>
<tr><td>II</td><td>LARGE (XL)</td><td>0</td><td>1</td><td>2</td><td>3</td></tr>
</table>

I wanted to split this HTML file into two parts (outfile1.html and outfile2.html). The first part should run up to the first line containing II followed by LARGE (XL), and the variable 'linea' should hold the number of that line. After a modification of outfile2, the two output files must be joined again. In this file, <td>II</td> and <td>LARGE (XL)</td> are on the same line.
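The split described here can also be tried out on a plain list of lines, without touching any files. This is only a minimal sketch: the helper name split_at_marker and the shortened sample rows are assumptions made for illustration, and linea is counted from zero, as enumerate does.

```python
def split_at_marker(lines, marker):
    """Split lines before the first line that contains marker.

    Returns (linea, head, tail). linea is the zero-based index of the
    matching line, or None if no line contains marker; in that case
    head holds every line and tail is empty.
    """
    for linea, line in enumerate(lines):
        if marker in line:
            return linea, lines[:linea], lines[linea:]
    return None, list(lines), []


# Shortened sample rows (hypothetical, modeled on the table above).
rows = [
    "<tr><td>III</td><td>LARGE (L)</td><td>0</td></tr>",
    "<tr><td>II</td><td>LARGE (XL)</td><td>0</td></tr>",
    "</table>",
]
linea, head, tail = split_at_marker(rows, "<td>II</td><td>LARGE (XL)</td>")
print(linea, len(head), len(tail))
```

Here head plays the role of outfile1 and tail the role of outfile2. The III row is not matched, because <td>III</td> does not contain the text <td>II</td>.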

My code:

import shutil

# Variables information
pathData = './Data/'
pathResults = './Results/'
linea = 0
found = 0
file1 ='start'
string1='<td>II</td><td>LARGE (XL)</td>'

# Open the input file for reading and the two output files for writing.
# The files are handled as plain text.
with open(pathData + 'infilehtml.html', 'r') as infileHTMLtext, \
        open(pathResults + 'outfile1.html', 'w') as f1, \
        open(pathResults + 'outfile2.html', 'w') as f2:
    # idx is the zero-based number of the current line
    for idx, row in enumerate(infileHTMLtext):
        print('Line: ' + str(idx) + '  Content: ' + row)
        if file1 == 'start' and string1 in row:
            # string1 reached: this row and all following rows go to the second file
            found += 1
            linea = idx
            print("%3d %4d  string1=%s" % (found, linea, row))
            file1 = 'end'
        if file1 == 'start':
            f1.write(row)
        else:
            f2.write(row)
# The with statement closes all three files. The matching row is written
# only to outfile2, so outfile1 needs no clean-up afterwards.

# The modification for the final file: rows to insert between outfile1 and outfile2
with open(pathResults + 'outfilenew.html', 'w') as file:
    file.write('\n<tr style="text-align: right; color: #fff; background-color: #f00;"></tr>\n')
    file.write('<tr>')
    for i in range(6):
        file.write('<td></td>')
    file.write('</tr>\n')

# joining files to final file output_file
with open(pathResults + 'output_file.html','wb') as wfd:
    for f in [pathResults + 'outfile1.html', pathResults + 'outfilenew.html', pathResults + 'outfile2.html']:
        with open(f,'rb') as fd:
            shutil.copyfileobj(fd, wfd)
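
If the intermediate files are not needed for anything else, the same split, insert, and join can probably be done in memory. The following is a rough sketch under that assumption, not a replacement for the script above: the function name insert_before_marker, the two-column sample table, and the simplified separator row are all hypothetical.

```python
def insert_before_marker(html_text, marker, new_rows):
    """Insert new_rows before the first line of html_text containing marker.

    The text is returned unchanged if no line contains marker.
    """
    lines = html_text.splitlines(keepends=True)
    for idx, line in enumerate(lines):
        if marker in line:
            return "".join(lines[:idx]) + new_rows + "".join(lines[idx:])
    return html_text


# Shortened sample table (hypothetical, two columns instead of six).
table = (
    "<table border='1' align=\"left\">\n"
    "<tr><th>Cat</th><th>size</th></tr>\n"
    "<tr><td>III</td><td>LARGE (L)</td></tr>\n"
    "<tr><td>II</td><td>LARGE (XL)</td></tr>\n"
    "</table>\n"
)
# Simplified separator row; the style is an assumption for illustration.
separator = '<tr style="background-color: #f00;">' + "<td></td>" * 2 + "</tr>\n"
result = insert_before_marker(table, "<td>II</td><td>LARGE (XL)</td>", separator)
print(result)
```

The result can then be written to output_file.html with a single open(..., 'w') call, which avoids the three temporary files.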