Background & Problem

I am trying to scrape links to articles from a news webpage. Using a nested find_all, I've managed to get the 'a href' elements, but the result also includes information I don't need, such as the article name.

What I need help with

I've looked at several questions on SO, such as this one, but none seem to work for my specific case. Does anyone know how I can create a list of just the news article links?

My code so far

import requests
from bs4 import BeautifulSoup
import pandas as pd

# *******************************
# CREATE CSV FILE
# *******************************
filename = "NEWS.csv"
f = open(filename, "w", encoding='utf-8')
# Named csv_header so it is not confused with the request headers dict below
csv_header = "Statement,Link,Date,Source,Label\n"
f.write(csv_header)

# *******************************
# CONNECT TO URL
# *******************************
# headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'}
headers = {'User-Agent': 'Mozilla/5.0'}
url = 'https://reneweconomy.com.au/archives/?pg1=1'
print(url)

r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')

# *******************************
# FIND ARTICLES
# *******************************
frame = []
links = [articles.find_all('a') for articles in soup.find_all("div", attrs={'class': 'archive-info'})]
print(links)

Current Output

[[<a href="https://reneweconomy.com.au/japan-to-use-ammonia-at-coal-plant-in-boost-for-australias-biggest-wind-and-solar-project/"><h3>Japan to use ammonia at coal plant in boost for Australia’s biggest wind and solar project</h3></a>, <a class="comment-link" href="https://reneweconomy.com.au/japan-to-use-ammonia-at-coal-plant-in-boost-for-australias-biggest-wind-and-solar-project/#disqus_thread">0</a>], [<a href="https://reneweconomy.com.au/we-need-grid-ready-for-100-pct-renewables-now-not-in-a-few-decades-aemo/"><h3>We need grid ready for 100 pct renewables now, not in a few decades: AEMO</h3></a>, <a class="comment-link" href="https://reneweconomy.com.au/we-need-grid-ready-for-100-pct-renewables-now-not-in-a-few-decades-aemo/#disqus_thread">0</a>], [<a href="https://reneweconomy.com.au/reneweconomy-unveils-big-battery-storage-map-of-australia/"><h3>RenewEconomy unveils Big Battery Storage Map of Australia</h3></a>, <a class="comment-link" href="https://reneweconomy.com.au/reneweconomy-unveils-big-battery-storage-map-of-australia/#disqus_thread">0</a>], [<a href="https://reneweconomy.com.au/labor-tries-to-block-new-regulations-pushing-arena-into-fossil-fuels/"><h3>Labor tries to block new regulations pushing ARENA into fossil fuels</h3></a>, <a class="comment-link" href="https://reneweconomy.com.au/labor-tries-to-block-new-regulations-pushing-arena-into-fossil-fuels/#disqus_thread">0</a>], [<a href="https://reneweconomy.com.au/huge-10gw-of-offshore-wind-capacity-near-iceland-to-help-power-uk/"><h3>Huge 10GW of offshore wind capacity near Iceland to help power UK</h3></a>, <a class="comment-link" href="https://reneweconomy.com.au/huge-10gw-of-offshore-wind-capacity-near-iceland-to-help-power-uk/#disqus_thread">0</a>], [<a href="https://reneweconomy.com.au/gas-industry-wants-dedicated-renewable-gas-target-to-support-hydrogen/"><h3>Gas industry wants dedicated “renewable gas target” to support hydrogen</h3></a>, <a class="comment-link" 
href="https://reneweconomy.com.au/gas-industry-wants-dedicated-renewable-gas-target-to-support-hydrogen/#disqus_thread">0</a>], [<a href="https://reneweconomy.com.au/us-battery-surge-unlocks-record-growth-in-wind-and-solar-pipeline/"><h3>US battery surge unlocks record growth in wind and solar pipeline</h3></a>, <a class="comment-link" href="https://reneweconomy.com.au/us-battery-surge-unlocks-record-growth-in-wind-and-solar-pipeline/#disqus_thread">0</a>], [<a href="https://reneweconomy.com.au/the-curious-case-of-tomago-fake-blackouts-feeding-a-fossil-fuelled-future/"><h3>The curious case of Tomago: fake blackouts feeding a fossil fuelled future</h3></a>, <a class="comment-link" href="https://reneweconomy.com.au/the-curious-case-of-tomago-fake-blackouts-feeding-a-fossil-fuelled-future/#disqus_thread">0</a>], [<a href="https://reneweconomy.com.au/shell-and-edify-in-landmark-big-battery-storage-deal-in-nsw/"><h3>Shell and Edify in landmark big battery storage deal in NSW</h3></a>, <a class="comment-link" href="https://reneweconomy.com.au/shell-and-edify-in-landmark-big-battery-storage-deal-in-nsw/#disqus_thread">0</a>], [<a href="https://reneweconomy.com.au/nsw-smart-meter-program-to-soak-up-solar-with-everyday-batteries/"><h3>NSW smart meter program to soak up solar with “everyday batteries”</h3></a>, <a class="comment-link" href="https://reneweconomy.com.au/nsw-smart-meter-program-to-soak-up-solar-with-everyday-batteries/#disqus_thread">0</a>], [<a href="https://reneweconomy.com.au/energy-retailer-asks-accc-to-probe-possible-market-gaming-in-nsw/"><h3>Energy retailer asks ACCC to probe possible “market gaming” in NSW</h3></a>, <a class="comment-link" href="https://reneweconomy.com.au/energy-retailer-asks-accc-to-probe-possible-market-gaming-in-nsw/#disqus_thread">0</a>], [<a href="https://reneweconomy.com.au/it-is-crazy-greens-push-new-climate-laws-after-coalitions-fossil-fuel-subsidy-spree/"><h3>“It is crazy:” Greens push new climate laws after Coalition’s fossil 
fuel subsidy spree</h3></a>, <a class="comment-link" href="https://reneweconomy.com.au/it-is-crazy-greens-push-new-climate-laws-after-coalitions-fossil-fuel-subsidy-spree/#disqus_thread">0</a>]]
Bobby Heyer

1 Answer

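Try this. It takes the first 'a' tag inside each 'archive-info' div and reads its 'href' attribute, so you get the URL string instead of the whole tag: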

links = [articles.find_all('a')[0]['href'] for articles in soup.find_all("div", attrs={'class': 'archive-info'})]
print(links)
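
One caveat with the one-liner: find_all('a')[0] raises an IndexError if any 'archive-info' div happens to contain no links. Below is a slightly more defensive sketch of the same idea. The function name extract_article_links and the sample HTML are made up for illustration; the markup only imitates the structure visible in the question's output, so the real page may differ.

```python
from bs4 import BeautifulSoup

# Sample markup (an assumption) imitating the structure shown in the question's output.
html = """
<div class="archive-info">
  <a href="https://reneweconomy.com.au/reneweconomy-unveils-big-battery-storage-map-of-australia/"><h3>RenewEconomy unveils Big Battery Storage Map of Australia</h3></a>
  <a class="comment-link" href="https://reneweconomy.com.au/reneweconomy-unveils-big-battery-storage-map-of-australia/#disqus_thread">0</a>
</div>
<div class="archive-info"><p>No link in this one</p></div>
"""


def extract_article_links(soup):
    """Return the href of the first <a> in each div.archive-info (hypothetical helper)."""
    links = []
    for info in soup.find_all("div", class_="archive-info"):
        # find() returns None instead of raising when nothing matches;
        # href=True only matches <a> tags that actually carry an href attribute.
        anchor = info.find("a", href=True)
        if anchor is not None:
            links.append(anchor["href"])
    return links


soup = BeautifulSoup(html, "html.parser")
links = extract_article_links(soup)
print(links)
```

In your script you would pass the soup built from r.content instead of the sample string. The second anchor in each div (class 'comment-link') is skipped simply because only the first match is taken.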
Nanthakumar J J