I am trying to scrape a website and save the links to a text file. In the text file, I would like to delete any line that does not start with "/". How can I do that? This is everything I have so far:

import requests
from bs4 import BeautifulSoup
page = requests.get("https://wiki.stardewvalley.net/Stardew_Valley_Wiki")
soup = BeautifulSoup(page.content, 'html.parser')

wikilinks = []
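# Collect the href of every link with visible text inside each menu wrapper.
for con in soup.find_all('div', class_="mainmenuwrapper"):
    # Search within `con`, not the whole page, so links are not repeated per wrapper.
    for link in con.find_all('a', href=True):
        if link.text:
            wikilinks.append(link['href'])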

# print(wikilinks)


with open('./scrapeNews/output.txt', 'w') as f:
    for item in wikilinks:
        f.write("%s\n" % item)
MendelG

1 Answer

You can use the built-in str.startswith() method to check whether a link starts with "/". However, since the list contains other information besides links, you can instead write only the entries that start with "http", rather than filtering for "/".

...
with open("./scrapeNews/output.txt", "w") as f:
    for item in wikilinks:
        # Write only entries that start with "http"; skip everything else.
        if str(item).startswith("http"):
            f.write("%s\n" % item)
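
If you do want to keep only the lines that start with "/", as the question asks, the same check works with a different prefix. Below is a minimal sketch, assuming the hrefs are plain strings; `sample_links` and `keep_by_prefix` are hypothetical names used only for illustration, and the sample values are made up rather than taken from the wiki.

```python
# Hypothetical sample hrefs standing in for the scraped `wikilinks` list.
sample_links = [
    "/Fishing",
    "#mw-head",
    "https://example.com/page",
    "/Crops",
    "javascript:void(0)",
]


def keep_by_prefix(links, prefixes):
    """Return only the links that start with one of the given prefixes."""
    # str.startswith accepts a tuple, so several prefixes can be checked at once.
    return [link for link in links if link.startswith(tuple(prefixes))]


# Keep only relative links (those starting with "/").
relative_only = keep_by_prefix(sample_links, ["/"])

# Keep relative links and absolute "http" links.
relative_or_absolute = keep_by_prefix(sample_links, ["/", "http"])

print(relative_only)
print(relative_or_absolute)
```

Filtering the list before writing avoids having to reopen the text file and delete lines afterwards.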
MendelG