I am working on collecting a data set that cross-references each track's audio features with the Billboard chart data set available on Kaggle. I am trying to get each song's URI in order to then get its audio features, and I defined the following function:

def get_track_uri(track_title, sp):
    result = sp.search(track_title, type="track", limit=1)
    if result['tracks']['total'] > 0:
        track_uri = result['tracks']['items'][0]['uri']
        return track_uri
    else:
        return None

I then apply it to the 'song' column of the Billboard data to create a new column with the URIs:

cleandf['uri'] = cleandf['song'].apply(lambda x: get_track_uri(x, sp))

I left it running for about 40 minutes and noticed that it had got stuck in a sleep call inside Spotipy, which I gather happened because I was making too many requests to the Spotify API. How can I get around this if I need to go through 50,000 rows? I could make it wait between search queries, but that would easily take something like 15 hours. There is probably also a way to get the audio features directly without fetching the URIs first, but it would still need to go through all of the rows.
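One thing I am considering, as a rough sketch rather than a tested fix: chart data tends to list the same song in many rows, so searching once per distinct title and mapping the results back should cut the number of requests, and the URIs could then be sent for audio features in batches. The helper names `build_uri_lookup` and `chunked`, the `pause_s` parameter, and the batch size of 100 are my own assumptions, not part of Spotipy.

```python
import time


def build_uri_lookup(titles, search_fn, pause_s=0.0):
    """Return {title: uri}, calling search_fn only once per distinct title.

    search_fn is any callable taking a title and returning a URI or None,
    e.g. lambda t: get_track_uri(t, sp). pause_s is an optional delay
    between calls (hypothetical knob for staying under the rate limit).
    """
    lookup = {}
    for title in titles:
        if title in lookup:
            continue  # already searched this title, skip the API call
        lookup[title] = search_fn(title)
        if pause_s:
            time.sleep(pause_s)
    return lookup


def chunked(items, size=100):
    """Yield consecutive slices of items with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Intended use with the DataFrame from the question (not executed here):
#   lookup = build_uri_lookup(cleandf['song'], lambda t: get_track_uri(t, sp))
#   cleandf['uri'] = cleandf['song'].map(lookup)
#   uris = [u for u in lookup.values() if u is not None]
#   for batch in chunked(uris, 100):
#       features = sp.audio_features(batch)  # assumes up to 100 tracks per call
```

This would not remove the rate limit, only reduce how often I hit it; the remaining searches would still need some pause or retry handling.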

Mario A
A rate limit exists, so you have to modify your code to deal with it. I thought I had found a solution, but it stopped working last week (or I hadn't hit a rate limit previously). – Ximzend Feb 14 '23 at 17:09

0 Answers