How do I parse an HTML correlation matrix with Python and Pandas?

Question

I've gotten as far as accessing the table and bringing the information into Python, but I'm unable to iterate through the column names and row names to populate the table with the correct correlation values. How do I iterate through the first row to pull the th column header values?

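import ssl
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Skip certificate verification (insecure; only acceptable for a quick test)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

req = Request('https://www.mrci.com/special/corr030.php', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req, context=ctx).read()

soup = BeautifulSoup(webpage, 'lxml')  # Parse the downloaded HTML
table = soup.find_all('table')[2]  # Grab the third table on the page (index 2)
for row in table.find_all('tr')[1:]:  # Skip the first (header) row
    print(row)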

Does anyone have any insight on how I can pull the entire table into a pandas dataframe?

Possible duplicate of [Parsing HTML Tables with Python](https://stackoverflow.com/questions/48610652/parsing-html-tables-with-python) — Stop harming Monica, Feb 04 '18 at 17:20

score 0 · Answer 1 · answered Feb 04 '18 at 02:21

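You don't show your desired result, so this is a guess, but maybe this code will help. It reuses `table` from the question and prints every descendant node (tags and text) of each row after the header row.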

for row in table.find_all('tr')[1:]:
    for i in row.descendants:
        print(i)
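
To get from printed nodes to a DataFrame, here is a minimal sketch. It assumes the table is laid out like a typical correlation matrix: the first row holds the column labels, with an empty corner cell, and each later row starts with a row label followed by the values. The `SAMPLE_HTML` string and the `table_to_frame` helper are hypothetical stand-ins for illustration; the real page was not checked against this layout.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for the real page: a header row of <th> cells,
# then one <th> row label plus <td> values per row.
SAMPLE_HTML = """
<table>
  <tr><th></th><th>AD</th><th>BP</th><th>CD</th></tr>
  <tr><th>AD</th><td>100</td><td>45</td><td>-12</td></tr>
  <tr><th>BP</th><td>45</td><td>100</td><td>30</td></tr>
  <tr><th>CD</th><td>-12</td><td>30</td><td>100</td></tr>
</table>
"""


def table_to_frame(table):
    """Build a DataFrame from a table whose first row and first column are labels."""
    rows = table.find_all('tr')
    # Column names: every cell of the first row except the corner cell.
    header_cells = rows[0].find_all(['th', 'td'])
    columns = [cell.get_text(strip=True) for cell in header_cells][1:]

    index, data = [], []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all(['th', 'td'])]
        if len(cells) != len(columns) + 1:
            continue  # skip spacer or malformed rows
        index.append(cells[0])   # first cell is the row label
        data.append(cells[1:])   # remaining cells are the values

    frame = pd.DataFrame(data, index=index, columns=columns)
    # Cell text is a string; convert to numbers, non-numeric text becomes NaN.
    return frame.apply(pd.to_numeric, errors='coerce')


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
df = table_to_frame(soup.find('table'))
print(df)
```

On the real page you would pass the `table` object from the question instead of the sample. If the rows there have a different shape (extra spacer cells, nested tables), the length check will skip them and the helper would need adjusting. `pandas.read_html` is another option worth trying, since it parses HTML tables directly and returns a list of DataFrames.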

Yang MingHui
