Gather your text data into a flat array of individual rows and cells. Transpose this, so everything per column is gathered into a row. Create an array containing the length of the longest item per (originally) column. Use this data to space out each cell, while printing rows. In code:
from bs4 import BeautifulSoup
content = '''
<table class="gridtable">
<tbody>
<tr>
<th>Store #</th><th>City Name</th><th>Orders</th></tr>
<tr><td>1</td><td style="text-align:left">Phoenix</td><td>70</td></tr>
<tr><td>2</td><td style="text-align:left">Columbus</td><td>74</td></tr>
<tr><td>3</td><td style="text-align:left">New York</td><td>112</td></tr>
<tr><td></td><td>TOTAL</td><td>256</td></tr></tbody>
</table>
'''
def print_table_nice(table):
cells = [[cell.text for cell in row.find_all(['td','th'])] for row in table.find_all('tr')]
transposed = list(map(list, zip(*cells)))
widths = [str(max([len(str(item)) for item in items])) for items in transposed]
for row in cells:
print (' '.join(("{:"+width+"s}").format(item) for width,item in zip(widths,row)))
soup = BeautifulSoup(content, 'html.parser')
tables = soup.find_all('table')
table = tables[0]
print_table_nice(table)
Result:
Store # City Name Orders
1 Phoenix 70
2 Columbus 74
3 New York 112
TOTAL 256
which seems about as elegant as you can do on a console. (To add vertical lines, just join the rows with a |
instead of a space.)
I inlined the table data because I don't have access to your Page.html
, but getting access to the table data does not seem to be the problem here.
Oh let's add lines all around. Just because I can:
def print_table_nice(table):
header = [cell.text for cell in table.select('tr th')]
cells = [[cell.text for cell in row.select('td')] for row in table.select('tr') if row.select('td')]
table = [header]+cells
transposed = list(map(list, zip(*table)))
widths = [str(max([len(str(item)) for item in items])) for items in transposed]
print ('+'+('-+-'.join('-'*int(width) for width in widths))+'+')
print ('|'+(' | '.join(("{:"+width+"s}").format(item) for width,item in zip(widths,header)))+'|')
print ('+'+('-+-'.join('-'*int(width) for width in widths))+'+')
for row in cells:
print ('|'+(' | '.join(("{:"+width+"s}").format(item) for width,item in zip(widths,row)))+'|')
print ('+'+('-+-'.join('-'*int(width) for width in widths))+'+')
It turned out to be an interesting complication because this requires the th
to be separated from the td
rows. Won't work as-is for multi-line rows, though. Result, then, is:
+--------+-----------+-------+
|Store # | City Name | Orders|
+--------+-----------+-------+
|1 | Phoenix | 70 |
|2 | Columbus | 74 |
|3 | New York | 112 |
| | TOTAL | 256 |
+--------+-----------+-------+