-5

I would like to write a web scraper with a Python library (Beautiful Soup, for example) to collect the YouTube links on this page:

https://www.last.fm/tag/rock/tracks

Basically, I want to collect the title of each song, the name of the artist and the link to YouTube. Can anyone help me with some code?

  • Scraping the Last.fm website is against the [Last.fm API ToS](https://www.last.fm/api/tos), which you agreed to if you've ever created a Last.fm API key – Thom May 26 '21 at 18:42

2 Answers

1

Here's how you can do it:

from bs4 import BeautifulSoup
import requests

url = 'https://www.last.fm/tag/rock/tracks'

headers = {
    "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 5_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B179 Safari/7534.48.3"
}

links = []

# Fetch the page once and parse the HTML
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

# Each track title sits in an element with the class "chartlist-name"
urls = soup.find_all(class_='chartlist-name')

for item in urls:
    # The href is a relative path (e.g. "/music/..."), so the base has no trailing slash
    relative_link = item.find('a')['href']
    link = 'https://www.last.fm' + relative_link
    links.append(link)
print(links)

The function soup.find_all returns every tag with the class "chartlist-name".

The for loop then reads the href of the <a> element inside each of those tags, builds the full URL, and appends it to the "links" list.
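Building absolute links from relative href values by string concatenation is easy to get subtly wrong (a doubled or a missing slash). Here is a minimal sketch of an alternative using urljoin from the standard library. The sample href values are made up for illustration and are not taken from the page.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.last.fm/'

# Hypothetical href values, shaped like the relative links found in <a> tags.
sample_hrefs = [
    '/music/Some+Artist/_/Some+Track',    # with a leading slash
    'music/Other+Artist/_/Other+Track',   # without a leading slash
]

# urljoin produces exactly one slash between the base and the path in both cases.
links = [urljoin(BASE_URL, href) for href in sample_hrefs]
print(links)
```

In the loop above, this would mean replacing the concatenation with urljoin(BASE_URL, relative_link).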

Fabix
0

In the future, provide some code to show what you have attempted.

I have expanded on Fabix's answer. The following code gets the YouTube link, song name, and artist for all 20 pages on the source website.

from bs4 import BeautifulSoup
import requests

master_url = 'https://www.last.fm/tag/rock/tracks?page={}'

headers = {
"User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 5_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B179 Safari/7534.48.3"
}

# Pages are numbered 1 to 20, so the range has to end at 21
for i in range(1, 21):
    response = requests.get(master_url.format(i), headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')

    # Each track is one row of the chart
    chart_items = soup.find_all(class_='chartlist-row')

    for chart_item in chart_items:
        # The first <a> in the row is taken as the YouTube link
        youtube_link = chart_item.find('a')['href']
        artist = chart_item.find('td', {'class': 'chartlist-artist'}).find('a').text
        song_name = chart_item.find('td', {'class': 'chartlist-name'}).find('a').text
        print('{}, {}, {}'.format(song_name, artist, youtube_link))
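Printing the three values separated by commas becomes ambiguous when a song title itself contains a comma. If you want to save the results, here is a minimal sketch that uses the csv module from the standard library instead. The rows and the write_tracks helper are hypothetical placeholders, not real data from the page.

```python
import csv
import io

# Hypothetical rows standing in for the (song_name, artist, youtube_link)
# values gathered in the loop; they are not real data from the page.
rows = [
    ('Song, With Comma', 'Some Artist', 'https://www.youtube.com/watch?v=XXXXXXXXXXX'),
    ('Another Song', 'Other Artist', 'https://www.youtube.com/watch?v=YYYYYYYYYYY'),
]

def write_tracks(rows, file_obj):
    """Write a header line and one CSV line per track to an open text file."""
    writer = csv.writer(file_obj)
    writer.writerow(['song_name', 'artist', 'youtube_link'])
    writer.writerows(rows)

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_tracks(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead, you could pass a file opened with open('tracks.csv', 'w', newline='', encoding='utf-8'). The csv writer quotes any field that contains a comma.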
Rusty Robot