
I'm trying to retrieve discographies for various artists. For the artists I've checked, Wikipedia and the manual web interface for MusicBrainz.org seem to agree on which albums make up a discography. My first thought was to screen-scrape one of these resources, but doing that properly looks like hard work.

Direct queries of the MusicBrainz data seemed to offer a quicker way to get clean data. Ideally I would construct a request like this:

data = get_release_groups(artist=mbid,
                          primary_type='Album',
                          status='Official',
                          includes=['first_release_date',
                                    'title',
                                    'secondary_type_list'])

I chose to use the Python wrapper musicbrainzngs, as I am fairly experienced with Python. It gave me a choice of three kinds of method: get_, search_ and browse_. get_ will not return sufficient records. browse_ appeared to be what I wanted, so I tried that first, especially as the Python examples for search_ were all about looking for text, rather than for the MBID, which I already had.

When I called browse_release_groups(artist=artist_id, ...), I got a list of release groups, each containing the data I wanted: album title, type and year. However, I also got a large number of other release groups that don't appear in the results on the MusicBrainz website, for example for The Rolling Stones: https://musicbrainz.org/artist/b071f9fa-14b0-4217-8e97-eb41da73f598

There didn't appear to be any way to filter in the query for status='official', or to include the status as part of the results so I could manually filter.

In response to this question, Wieland has suggested I use the search_ query. I have tested search_release_groups(arid=mbid, status='official', primarytype='Album', strict=True, limit=...) and it returns far fewer release groups. As far as studio albums are concerned, it matches the website 1:1. There are still a few minor discrepancies in the compilations, which I can live with. However, this query did not return the first-release-date, and so far I have not found a way to include it. I notice in the linked server search code that every query starts off manipulating rgm.first_release_date_year etc., but it's not clear how or when this gets returned from a query.

It's just occurred to me that I can use both a browse_ and a search_, as together they give me all the information. So I have a workaround, but it feels rather agricultural.
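For reference, here is a minimal sketch of that two-query workaround. It assumes that both calls return their results under the "release-group-list" key, and that each release group carries "id", "title", "primary-type" and (for browse_ only) "first-release-date". The helper names fetch_all, merge_release_groups and fetch_discography are made up for illustration, and the user-agent values are placeholders.

```python
def fetch_all(call, key, page_size=100, **kwargs):
    """Page through a browse_/search_ style call until a short page comes back."""
    items, offset = [], 0
    while True:
        page = call(limit=page_size, offset=offset, **kwargs).get(key, [])
        items.extend(page)
        if len(page) < page_size:  # a short page means there is nothing left
            return items
        offset += page_size


def merge_release_groups(browsed, searched):
    """Keep only the groups the search returned, adding dates from browse."""
    dates = {rg["id"]: rg.get("first-release-date", "") for rg in browsed}
    merged = [
        {
            "id": rg["id"],
            "title": rg.get("title", ""),
            "type": rg.get("primary-type") or rg.get("type", ""),
            "first-release-date": dates.get(rg["id"], ""),
        }
        for rg in searched
    ]
    # Dates are ISO-style strings ("1964", "1964-04-16"), so they sort as text.
    return sorted(merged, key=lambda rg: rg["first-release-date"])


def fetch_discography(artist_id):
    """Hypothetical wrapper: one browse pass plus one search pass."""
    import musicbrainzngs  # imported here so the helpers above work without it

    musicbrainzngs.set_useragent("discography-sketch", "0.1")  # placeholders
    browsed = fetch_all(musicbrainzngs.browse_release_groups,
                        "release-group-list", artist=artist_id)
    searched = fetch_all(musicbrainzngs.search_release_groups,
                         "release-group-list", arid=artist_id,
                         status="official", primarytype="Album", strict=True)
    return merge_release_groups(browsed, searched)
```

Calling fetch_discography("b071f9fa-14b0-4217-8e97-eb41da73f598") would then give the status-filtered release groups with their dates attached, at the cost of paging through both result sets.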

TL;DR I want release groups (titles, dates, types, status) by artist ID. If I browse, I get dates, but can't include or filter by status. If I search, I can filter by status, but don't get dates. How can I get both in one query?

Neil_UK

1 Answer

I'm not entirely sure what your question is, but the find_by_artist method of release groups (source here) is what's doing the filtering of release groups for the artist pages, in particular:

    # Show only RGs with official releases by default, plus all-status-less ones so people fix the status
    unless ($show_all) {
        push @$conditions, "(EXISTS (SELECT 1 FROM release where release.release_group = rg.id AND release.status = '1') OR
                            NOT EXISTS (SELECT 1 FROM release where release.release_group = rg.id AND release.status IS NOT NULL))";
    }

Unfortunately, I don't think it's possible to express that condition in a normal web service call. You can, however, use the search web service to filter for release groups by The Rolling Stones that contain at least one "official" release: http://musicbrainz.org/ws/2/release-group/?query=arid:b071f9fa-14b0-4217-8e97-eb41da73f598%20AND%20status:official&offset=0. In python-musicbrainzngs, the call for this is

search_release_groups(arid="b071f9fa-14b0-4217-8e97-eb41da73f598", status="official", strict=True)

Unfortunately, the search results don't include the first-release-date field. There's an open ticket about it, but it's not going to be fixed in the near future.
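If you would rather bypass the wrapper and work with the raw web service response, the search URL above can be assembled from field/value pairs. This is only a sketch: build_release_group_search_url is a made-up helper name, it assumes the values need no Lucene quoting or escaping (true for MBIDs and single words like "official"), and it asks for JSON via the web service's fmt parameter instead of the default XML.

```python
from urllib.parse import urlencode

# Base URL of the release-group search endpoint used in the link above.
BASE_URL = "https://musicbrainz.org/ws/2/release-group/"


def build_release_group_search_url(**fields):
    """Join field:value pairs with AND and URL-encode them as the query."""
    # Assumes values contain no spaces or Lucene special characters.
    query = " AND ".join(f"{name}:{value}" for name, value in fields.items())
    return BASE_URL + "?" + urlencode({"query": query, "fmt": "json"})


if __name__ == "__main__":
    print(build_release_group_search_url(
        arid="b071f9fa-14b0-4217-8e97-eb41da73f598",
        status="official",
    ))
```

The response can then be fetched with any HTTP client. Keep in mind that MusicBrainz expects a meaningful User-Agent header and rate-limits requests, which python-musicbrainzngs otherwise handles for you.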

Wieland
  • thanks for the search suggestion. I've tried it, and updated my OP accordingly. The python doc is appalling, I wonder if it would be better to just hack the xml? – Neil_UK Feb 02 '17 at 14:10
  • I've updated my answer. If you absolutely need the first release date, using browse requests and filtering on your own is probably the way to go. – Wieland Feb 02 '17 at 20:57
  • Thanks for your help. I'm glad there's an open ticket on it and that other people have this issue. Taking my ideal query, I guess what I'm asking for is a general DB query to be built and executed from the parameters of the get_ request. I don't think this would be unsafe if restricted to read-only, and of course only operating within the universe of the data tables. Allowing all includes=[,,] in requests would get most of the way there. But the workaround you helped me find is only two queries, and still much easier than screen scraping. Thanks again. – Neil_UK Feb 03 '17 at 06:05