
I used BeautifulSoup to scrape a website and save the results to a csv. When I open the csv, there is only the title header and none of the data (the links I scraped).

I already tried "lxml", so I switched to html.parser.

from bs4 import BeautifulSoup  
import requests
import csv

page = requests.get('https://www.census.gov/programs-surveys/popest.html')
raw_html = page.text    # declare the raw_html var

soup = BeautifulSoup(raw_html, 'html.parser')  # parse the html

T = [["US Census Bureau Links"]] #Title
I = page.text

for link in soup.find_all('a', href=True):
    print(link['href'])

with open("US_Census_Bureau_links.csv","w",newline="") as f:    
    cw=csv.writer(f)                          
    cw.writerows(T)                             
    cw.writerows(I)                             

f.close()                                      

The print loop gives me 8 pages full of links when I run it, but no links end up in the output csv.

Mega_maha

2 Answers


You write to your csv file with

cw.writerows(T)

But

T = [["US Census Bureau Links"]] #Title

contains only the title when you declare it, and stays that way when you write it out, because you never append any links to it. So change your for loop to: [Edited as per @vladwoguer's comment below]

for link in soup.find_all('a', href=True):
    T.append([link['href']])

and it should work. (You can also drop the cw.writerows(I) line; since I is the raw page text, writerows iterates over it one character at a time.)
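For reference, the append-then-write pattern can be checked without hitting the network. The HTML below is a made-up stand-in for page.text, and the csv is written to an in-memory buffer instead of a file:

```python
import csv
import io
from bs4 import BeautifulSoup

# Made-up stand-in for page.text so the example is self-contained
sample_html = """
<html><body>
<a href="https://www.census.gov/a.html">A</a>
<a href="https://www.census.gov/b.html">B</a>
</body></html>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

T = [["US Census Bureau Links"]]   # title row
for link in soup.find_all('a', href=True):
    # each row must itself be a list, or csv.writer splits the
    # string into one character per cell
    T.append([link['href']])

buf = io.StringIO()
csv.writer(buf).writerows(T)
print(buf.getvalue())
```

Swapping the buffer for open("US_Census_Bureau_links.csv", "w", newline="") gives the same rows on disk.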

Jack Fleeting

You can extract the links to a collection, then write it to a file:

from bs4 import BeautifulSoup  
import requests
import csv

page = requests.get('https://www.census.gov/programs-surveys/popest.html')
raw_html = page.text    # declare the raw_html var

soup = BeautifulSoup(raw_html, 'html.parser')  # parse the html

T = [["US Census Bureau Links"]] #Title

links = map(lambda link: link['href'], soup.find_all('a', href=True)) # links

with open("US_Census_Bureau_links.csv","w",newline="") as f:    
    cw=csv.writer(f, quoting=csv.QUOTE_ALL)                          
    cw.writerows(T)                             
    cw.writerow(links)                         


If you want one link on each line:

with open("US_Census_Bureau_links.csv","w",newline="") as f:    
    cw=csv.writer(f, quoting=csv.QUOTE_ALL)                          
    cw.writerows(T)
    for link in links:                      
      cw.writerow([link])                 
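One caveat with the map version: in Python 3, map returns a lazy, single-use iterator, so if you run both with blocks in the same script, the second one sees links already exhausted and writes nothing. A small sketch of the pitfall, with made-up data:

```python
# map gives a lazy, single-use iterator in Python 3
links = map(str.upper, ["a.html", "b.html"])

first_pass = list(links)   # consumes the iterator
second_pass = list(links)  # nothing left to yield

print(first_pass)   # ['A.HTML', 'B.HTML']
print(second_pass)  # []

# Materializing once up front makes the collection reusable
links = [name.upper() for name in ["a.html", "b.html"]]
print(list(links) == list(links))  # True
```

Wrapping the map call in list(...), or using a list comprehension as above, avoids the surprise.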
vladwoguer