
Below is the code. I've also attached a photo of the table here.

The issue I am having is that tr[1] holds the column headers (th tags), while tr[1:40] holds the data rows: each row starts with a row header in a th tag, followed by td and th tags that contain the numbers in the table. The td cells are plain numbers; the th cells differ only because they carry a color scheme.

I want to iterate through tr[1] to set the column headers, then use the first th tag inside each subsequent tr tag as that row's header. Any information would be much appreciated!

import ssl
from urllib.request import Request, urlopen  # Python 3; on Python 2 this was urllib2
from bs4 import BeautifulSoup


ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

req = Request('https://www.mrci.com/special/corr030.php', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req, context=ctx).read()

soup = BeautifulSoup(webpage, 'lxml')  # Parse the downloaded HTML
table = soup.find_all('table')[2]  # Grab the correlation table
for row in table.find_all('tr')[1:]:
    print(row)
Martin Gergov
  • Please provide a minimum working example. No need for SSL and Request, they are not part of your problem. Just paste a multi line string of HTML code with a very small table similar to yours, like here: https://stackoverflow.com/questions/45026551/python-how-to-insert-a-line-in-a-multi-line-string – Joe Feb 04 '18 at 17:36
  • YMH18   99 98 94 69 -71 -80 -87 80 88 – Josh Pilson Feb 04 '18 at 18:21
  • I didn't have enough space to include an entire slice of the table. However, I've figured out how to create a pandas DataFrame with the correct headers and number of rows. My issue now is writing the code to find both 'th' and 'td' tags at each row/column intersection to fill the DataFrame. So far I have: for row in table.find_all('tr')[1:]: corr_tags = row.find_all('th'). I'm not sure how to add 'td' to this as well, so that it pulls 'td' and 'th' tags consecutively as it iterates from position 1 to 40. – Josh Pilson Feb 04 '18 at 18:26
  • It is better to try this on a simple example and get it working there first. Go from simple to complex. Take the code you pasted above and create a table with, say, three rows and four columns, maybe with some attributes. Then get your code working with this table. Examine the output, and try to figure out how the elements are parsed and where they end up. – Joe Feb 04 '18 at 19:15
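
Following the suggestion to start from a small example, here is a minimal sketch of one way the mixed 'th'/'td' cells could be collected. It assumes the first row contains only th cells, beginning with an empty corner cell, and that every later row begins with a th row header; the real page may be laid out differently. The sample HTML, its values, the "hot" class, and the parse_table helper are all hypothetical. The only Beautiful Soup feature it relies on is that find_all accepts a list of tag names and returns the matches in document order.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the correlation table (values are made up).
# First row: column headers. Later rows: a <th> row header followed by a
# mix of <td> (plain) and <th> (colour-coded) value cells.
SAMPLE_HTML = """
<table>
  <tr><th></th><th>YMH18</th><th>ESH18</th><th>NQH18</th></tr>
  <tr><th>YMH18</th><td>100</td><th class="hot">99</th><td>-71</td></tr>
  <tr><th>ESH18</th><th class="hot">99</th><td>100</td><td>69</td></tr>
  <tr><th>NQH18</th><td>-71</td><td>69</td><td>100</td></tr>
</table>
"""


def parse_table(html):
    """Return (column headers, {row header: {column header: value}})."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")

    # Column headers: every <th> of the first row except the assumed
    # empty corner cell.
    header_cells = rows[0].find_all("th")
    col_headers = [cell.get_text(strip=True) for cell in header_cells][1:]

    data = {}
    for row in rows[1:]:
        # A list of names matches both <th> and <td>, in document order.
        cells = row.find_all(["th", "td"])
        if not cells:
            continue  # skip empty spacer rows
        row_header = cells[0].get_text(strip=True)
        values = [int(cell.get_text(strip=True)) for cell in cells[1:]]
        data[row_header] = dict(zip(col_headers, values))
    return col_headers, data


col_headers, data = parse_table(SAMPLE_HTML)
print(col_headers)
print(data["ESH18"])
```

If pandas is available, a nested dict like this can be turned into a DataFrame with pandas.DataFrame.from_dict(data, orient='index'), which uses the outer keys as row labels. Against the real page, the int conversion would likely need a guard for blank or non-numeric cells.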

0 Answers