I am working on building a data set that cross-references each track's audio features with the Billboard chart data set available on Kaggle. To get the audio features I first need each song's URI, so I defined the following function:

    def get_track_uri(track_title, sp):
        # Search Spotify for the title and keep only the top match
        result = sp.search(track_title, type="track", limit=1)
        if result['tracks']['total'] > 0:
            track_uri = result['tracks']['items'][0]['uri']
            return track_uri
        else:
            return None

I then apply it to the 'song' column of the Billboard data to create a new column with the URIs:

    cleandf['uri'] = cleandf['song'].apply(lambda x: get_track_uri(x, sp))

I left it running for about 40 minutes and noticed that it got stuck in a sleep call inside Spotipy, which I gather happened because I was making too many requests to the Spotify API. How can I get around this when I need to go through 50,000 rows? I could make it wait between search queries, but that would easily take around 15 hours. There is probably also a way to get the audio features directly without first fetching the URIs, but it would still need to go through all of the rows.
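
For reference, this is roughly the "wait between queries" approach I have in mind, as a minimal sketch rather than something I have run against the API. The names `lookup_uris_throttled`, `estimated_hours`, `search_fn`, `delay_s` and `sleep_fn` are my own placeholders, and the one-second delay is only a guess, not a documented limit. It also assumes the same title appears on many rows, so it looks up each distinct title only once.

```python
import time


def lookup_uris_throttled(titles, search_fn, delay_s=1.0, sleep_fn=time.sleep):
    """Look up each distinct title once, pausing delay_s between requests.

    search_fn is a placeholder for any callable mapping a title to a URI
    (or None), e.g. lambda t: get_track_uri(t, sp).
    """
    cache = {}
    for title in titles:
        if title in cache:
            continue  # repeated title: reuse the earlier result, no new request
        if cache:
            sleep_fn(delay_s)  # pause before every request except the first
        cache[title] = search_fn(title)
    # Return one result per input row, in the original order
    return [cache[title] for title in titles]


def estimated_hours(n_requests, delay_s):
    """Time spent sleeping only; network latency comes on top of this."""
    return n_requests * delay_s / 3600
```

I would call it as `cleandf['uri'] = lookup_uris_throttled(cleandf['song'].tolist(), lambda t: get_track_uri(t, sp))`. With 50,000 distinct titles and a one-second delay, `estimated_hours(50_000, 1.0)` is just under 14 hours before latency, which is where my estimate of about 15 hours comes from.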