I am trying to extract all my scrobbles from the LastFM API ('method': 'user.getrecenttracks'), using Python.

I have been able to extract the raw data but am struggling to process it in DataFrames. Most fields come back wrapped in extra ID tags, which I need to strip. Example:

BEFORE: More Than Ever People - Late Night Mix by {'mbid': '', '#text': 'Levitation'} from album: {'mbid': '3240770e-8cbd-49c3-a070-dc92b4ffb8fe', '#text': 'Essential Levitation - 20 years of Ibiza Chillout Music'} {'uts': '1590297990', '#text': '30 May 2020, 10:10'}

AFTER: More Than Ever People - Late Night Mix by Levitation from album: Essential Levitation - 20 years of Ibiza Chillout Music

The data is structured as follows:

Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   artist  501 non-null    object
 1   album   501 non-null    object
 2   name    501 non-null    object
 3   date    500 non-null    object

Stripping works for every field except 'date'. I index each cell as row['index1']['index2'], which works for all fields but the date. All of these fields are structured more or less the same way, as per the LastFM API.

So addressing row['album']['#text'] works fine, whereas row['date'] = row['date']['#text'] errors out with "TypeError: string indices must be integers".

See the code below (the commented-out lines are the bit I am struggling with):

for index, row in df_track_list.iterrows():
    print ("pre:", row['name'], "by", row['artist'], "from album:", row['album'], row['date'])
    row['artist'] = row['artist']['#text']
    row['album'] = row['album']['#text']
    #row['date'] = row['date']['#text']
    #print(row['date']['#text'])
    print ("post:", row['name'], "by", row['artist'], "from album:", row['album'])

What is happening here? Any ideas? Or anybody with working examples?

Real_4W

1 Answer

You likely need to drop all "now playing" entries. Most LastFM profiles will display the track the user is currently listening to under "recent tracks" unless disabled in the profile settings (under "privacy"). This information is also included on every page of the user.getRecentTracks API response. Of course, the "now playing" entries have no date information.

This would explain why you have 501 entries in all columns but only 500 for date. There is an additional @attr key indicating "now playing" in the API response (it is not present when the user is not currently scrobbling).
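To make the 501 versus 500 mismatch concrete, here is a minimal sketch using two hand-written records instead of a live API call. The records are hypothetical and only mirror the field shapes quoted in the question. The "now playing" record has an '@attr' key and no 'date' key, so pandas fills that cell with NaN, and a NaN cell cannot be subscripted with '#text'.

```python
import pandas as pd

# Hypothetical sample records (not real API output); the shapes mirror
# the fields quoted in the question.
tracks = [
    {   # "now playing" entry: carries '@attr' and has no 'date'
        'artist': {'mbid': '', '#text': 'Levitation'},
        'album': {'mbid': '', '#text': 'Some Album'},
        'name': 'Track A',
        '@attr': {'nowplaying': 'true'},
    },
    {   # regular scrobble: has 'date' and no '@attr'
        'artist': {'mbid': '', '#text': 'Levitation'},
        'album': {'mbid': '', '#text': 'Some Album'},
        'name': 'Track B',
        'date': {'uts': '1590297990', '#text': '30 May 2020, 10:10'},
    },
]

df = pd.DataFrame(tracks)

# Keys missing from a record become NaN, so 'date' has one non-null value fewer
print(df['date'].notna().sum(), "of", len(df), "rows have a date")
print(df['date'].isna().tolist())
```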

# python_version >= 3.6
import pandas, requests

user = 'your_username'
apiKey = 'your_api_key'
params = {'method': 'user.getRecentTracks', 'user': user, 'api_key': apiKey, 'format': 'json'}
response = requests.get('http://ws.audioscrobbler.com/2.0/', params=params).json()
# One row per track; nested fields such as artist, album and date stay as dicts
df_track_list = pandas.DataFrame(response.get('recenttracks').get('track'))
# The '@attr' column only exists when a "now playing" entry was returned
if '@attr' in df_track_list.columns:
    # Keep only rows that are not flagged as "now playing", then renumber the index
    boolFilter = (df_track_list['@attr'] != {'nowplaying': 'true'})
    df_track_list = df_track_list[boolFilter].reset_index(drop=True)
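Once the "now playing" rows are gone, the '#text' values can be pulled out column by column rather than row by row. The following is only a sketch under assumptions: `text_of` is a hypothetical helper name, and the sample frame is hand-written so the snippet runs without an API key. The helper passes non-dict cells (NaN, or strings that were already stripped) through unchanged, so it also tolerates a missing date and can be run more than once.

```python
import pandas as pd

def text_of(value):
    """Return value['#text'] for dict cells; pass anything else through unchanged."""
    return value.get('#text') if isinstance(value, dict) else value

# Hypothetical sample frame standing in for df_track_list
df_track_list = pd.DataFrame([
    {   # row with no date, as for a "now playing" entry
        'artist': {'mbid': '', '#text': 'Levitation'},
        'album': {'mbid': '', '#text': 'Some Album'},
        'name': 'Track A',
    },
    {
        'artist': {'mbid': '', '#text': 'Levitation'},
        'album': {'mbid': '', '#text': 'Some Album'},
        'name': 'Track B',
        'date': {'uts': '1590297990', '#text': '30 May 2020, 10:10'},
    },
])

# Replace each nested dict with its '#text' value, one column at a time
for col in ('artist', 'album', 'date'):
    df_track_list[col] = df_track_list[col].map(text_of)

print(df_track_list)
```

Assigning back to the column changes the DataFrame itself, whereas assigning to `row[...]` inside `iterrows()` generally changes only a copy of the row.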