
I used BeautifulSoup to scrape a website and save the results to a csv. When I open the csv, there is only the title header and none of the data (the links I scraped).

I already tried "lxml", so I switched to html.parser.

from bs4 import BeautifulSoup  
import requests
import csv

page = requests.get('https://www.census.gov/programs-surveys/popest.html')
raw_html = page.text    # declare the raw_html var

soup = BeautifulSoup(raw_html, 'html.parser')  # parse the html

T = [["US Census Bureau Links"]] #Title
I = page.text

for link in soup.find_all('a', href=True):
    print(link['href'])

with open("US_Census_Bureau_links.csv","w",newline="") as f:    
    cw=csv.writer(f)                          
    cw.writerows(T)                             
    cw.writerows(I)                             

f.close()                                      

The print loop gives me 8 pages full of links when I run it, but no links end up in the output csv.

Mega_maha

2 Answers


You write to your csv file with

cw.writerows(T)

But

T = [["US Census Bureau Links"]] #Title

contains only the title when you declare it, and stays that way when you write it out, because you never append any links to it. So change your for loop to: [Edited as per @vladwoguer's comment below]

for link in soup.find_all('a', href=True):
    T.append([link['href']])

and it should work. (You can also drop the cw.writerows(I) line; since I is the raw page text, writerows iterates over it one character at a time.)
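For reference, the append-then-write pattern can be checked without hitting the network. The HTML below is a made-up stand-in for page.text, and the csv is written to an in-memory buffer instead of a file:

```python
import csv
import io
from bs4 import BeautifulSoup

# Made-up stand-in for page.text so the example is self-contained
sample_html = """
<html><body>
<a href="https://www.census.gov/a.html">A</a>
<a href="https://www.census.gov/b.html">B</a>
</body></html>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

T = [["US Census Bureau Links"]]   # title row
for link in soup.find_all('a', href=True):
    # each row must itself be a list, or csv.writer splits the
    # string into one character per cell
    T.append([link['href']])

buf = io.StringIO()
csv.writer(buf).writerows(T)
print(buf.getvalue())
```

Swapping the buffer for open("US_Census_Bureau_links.csv", "w", newline="") gives the same rows on disk.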

Jack Fleeting

You can extract the links to a collection, then write it to a file:

from bs4 import BeautifulSoup  
import requests
import csv

page = requests.get('https://www.census.gov/programs-surveys/popest.html')
raw_html = page.text    # declare the raw_html var

soup = BeautifulSoup(raw_html, 'html.parser')  # parse the html

T = [["US Census Bureau Links"]] #Title

links = map(lambda link: link['href'], soup.find_all('a', href=True)) # links

with open("US_Census_Bureau_links.csv","w",newline="") as f:    
    cw=csv.writer(f, quoting=csv.QUOTE_ALL)                          
    cw.writerows(T)                             
    cw.writerow(links)                         


If you want one link on each line:

with open("US_Census_Bureau_links.csv","w",newline="") as f:    
    cw=csv.writer(f, quoting=csv.QUOTE_ALL)                          
    cw.writerows(T)
    for link in links:                      
      cw.writerow([link])                 
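One caveat with the map version: in Python 3, map returns a lazy, single-use iterator, so if you run both with blocks in the same script, the second one sees links already exhausted and writes nothing. A small sketch of the pitfall, with made-up data:

```python
# map gives a lazy, single-use iterator in Python 3
links = map(str.upper, ["a.html", "b.html"])

first_pass = list(links)   # consumes the iterator
second_pass = list(links)  # nothing left to yield

print(first_pass)   # ['A.HTML', 'B.HTML']
print(second_pass)  # []

# Materializing once up front makes the collection reusable
links = [name.upper() for name in ["a.html", "b.html"]]
print(list(links) == list(links))  # True
```

Wrapping the map call in list(...), or using a list comprehension as above, avoids the surprise.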
vladwoguer