
I am trying to scrape the registered-voter page on the New Hampshire Secretary of State's website. So far I have been able to get the text of the page with Beautiful Soup, using the following code:

import pandas as pd
from selenium import webdriver 
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from openpyxl import Workbook 
import getpass
from urllib.request import urlopen 
from bs4 import BeautifulSoup

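# Fetch the page and parse it with Python's built-in HTML parser
url = urlopen('http://sos.nh.gov/NamesHistory.aspx')
html = BeautifulSoup(url, 'html.parser')
# get_text is a method, so it has to be called to return the table's text
html.find('table', attrs={'class':'table-border2-black'}).get_text()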

However, how can I get the text from this table into a usable data frame that matches the table shown on the website (http://sos.nh.gov/NamesHistory.aspx)? I don't think this is a duplicate, because this page is structured differently from the pages covered in earlier questions.

Joe
  • Possible duplicate of [python BeautifulSoup parsing table](https://stackoverflow.com/questions/23377533/python-beautifulsoup-parsing-table) – eggcelent Jul 20 '18 at 01:19
  • I can see how it is similar, but I would like help in getting this into a usable dataframe and this website is different from that other website. –  Jul 20 '18 at 01:27

1 Answer


You need to write the scraped data to a CSV file first, using the following commands:

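import csv

# course_list stands for the list of scraped rows (each row is a list of cell values)
with open('filename.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    for row in course_list:
        writer.writerow(row)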

The same approach is shown in "Writing and saving CSV file from scraping data using Python and BeautifulSoup4".
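
For the step of building the rows themselves, here is a minimal sketch. It assumes the table can be found by the class name from the question and that each record is a plain `tr` of `th`/`td` cells. `SAMPLE_HTML`, its column names, `table_to_rows`, `write_rows` and the output file name are made up for illustration; the real page may be laid out differently (nested tables, for instance, would need extra handling).

```python
import csv
import os
import tempfile

from bs4 import BeautifulSoup

# Made-up stand-in for the page source; the real table's columns may differ.
SAMPLE_HTML = """
<table class="table-border2-black">
  <tr><th>County</th><th>Registered</th></tr>
  <tr><td>Alpha</td><td>100</td></tr>
  <tr><td>Beta</td><td>250</td></tr>
</table>
"""


def table_to_rows(html, table_class="table-border2-black"):
    """Return the table's cells as a list of rows (lists of stripped strings)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": table_class})
    if table is None:
        raise ValueError(f"no table with class {table_class!r} found")
    rows = []
    for tr in table.find_all("tr"):
        # Header and data cells are treated alike; only the text is kept.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


def write_rows(rows, path):
    """Write the rows to a CSV file (text mode, newline='' as csv expects)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


rows = table_to_rows(SAMPLE_HTML)
csv_path = os.path.join(tempfile.gettempdir(), "voters_sample.csv")
write_rows(rows, csv_path)
```

To try it on the live page, the HTML returned by `urlopen(...)` in the question could be passed to `table_to_rows` in place of `SAMPLE_HTML`.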

After that, read the CSV file back and convert the data into a DataFrame for further processing. If you don't know how to do that, the pandas documentation is the place to start: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_csv.html. Note that DataFrame.from_csv has since been deprecated in favor of pandas.read_csv.
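
A rough sketch of the loading step, using `pandas.read_csv` rather than the older `DataFrame.from_csv`. The CSV text and its column names are invented for illustration and stand in for whatever file was written from the scraped rows; with a real file, the path would be passed instead of the `StringIO` object.

```python
import io

import pandas as pd

# Made-up CSV content standing in for the file written from the scraped rows.
csv_text = "County,Registered\nAlpha,100\nBeta,250\n"

# read_csv accepts a path or a file-like object; the first line becomes the header.
df = pd.read_csv(io.StringIO(csv_text))

# Numeric columns are inferred, so they can be summed or filtered directly.
total_registered = int(df["Registered"].sum())

print(df)
print(total_registered)
```

pandas also has `read_html`, which may be able to parse the table from the page directly, although it needs an HTML parser backend installed and I have not checked it against this site.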

Zev
K. Aslam
  • Could you post an example of this using the website that I posted? I looked at the other example and this is not a typical website. And, I am not sure how to get the text isolated from the other characteristics of the html code. –  Jul 20 '18 at 01:49