
I am trying to retrieve the complete list of fields of study from Microsoft Academic Graph under the level-0 field of study "Computer Science". So far, I have the following curl command, which retrieves fields of study in general:

```sh
curl -X POST \
  https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -H 'Ocp-Apim-Subscription-Key: my_subscription_key' \
  -d 'expr=Ty%3D'\''6'\''&attributes=Id%2CFL%2CFN%2CFC.FN%2CFP.FN%2CFC.FId%2CFP.FId'
```

This runs without errors, but I need to extend it so that it:

  1. Retrieves all sub-fields of study (children, grandchildren, etc.) of "Computer Science".
  2. Is not limited to the first 1000 fields of study (the maximum result count of a POST to evaluate).

Although I am using curl, I am also open to a Python approach if that would work better.

spaceplane
user553417

1 Answer


If your goal is to enumerate all descendant fields of study under Computer Science, you'll need to make recursive calls, because only the immediate levels are indexed for each field of study (its parents and children, not its grandparents or grandchildren).

Fortunately, this is straightforward with the query expression "Composite(FP.FId=parent_fos_id)", which returns the fields of study that list parent_fos_id among their parents.
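For reference, here is a rough sketch of sending that expression from Python with only the standard library. It assumes the endpoint, `model`, `count`/`offset`, and `attributes` values used in the C# sample in this answer; `children_query_url` and `fetch_children_page` are illustrative names, and `subscription_key` is your own key. Each call fetches a single page of at most `count` children.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the question and the C# sample.
EVALUATE_URL = "https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate"


def children_query_url(parent_id: int, count: int = 1000, offset: int = 0) -> str:
    """Build an evaluate GET URL for the direct children of parent_id."""
    params = {
        "expr": f"Composite(FP.FId={parent_id})",  # fields whose parents include parent_id
        "model": "latest",
        "count": count,
        "offset": offset,
        "attributes": "Id,DFN",
    }
    # urlencode percent-encodes the parentheses, '=' and ',' in the values.
    return EVALUATE_URL + "?" + urllib.parse.urlencode(params)


def fetch_children_page(parent_id: int, subscription_key: str,
                        count: int = 1000, offset: int = 0) -> list:
    """Perform one evaluate request and return its "entities" list (one page)."""
    request = urllib.request.Request(
        children_query_url(parent_id, count, offset),
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["entities"]
```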

Here is some sample C# code that collects all descendant fields of study (apologies, I'm not Python-savvy, but the approach should be easy to follow):

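```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using Newtonsoft.Json.Linq;

// Class members: one HttpClient is shared by every (recursive) call
// instead of creating a new client per request.
static readonly HttpClient client = CreateClient();

static HttpClient CreateClient()
{
    var httpClient = new HttpClient();
    httpClient.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "_subscription_key_");
    return httpClient;
}

static void GetAllDescendantFieldsOfStudy(long fieldOfStudyId, int level, ref SortedSet<long> descendants)
{
    const int pageSize = 1000; // maximum result count per evaluate request

    // Page through the children with offset, since one request returns at most 1000.
    for (var offset = 0; ; offset += pageSize)
    {
        var jsonString =
            client
            .GetStringAsync(
                new Uri($"https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate?expr=Composite(FP.FId={fieldOfStudyId})&model=latest&count={pageSize}&offset={offset}&attributes=Id,DFN"))
            .Result;

        var children = (JArray)JObject.Parse(jsonString)["entities"];

        foreach (var child in children)
        {
            var childId = child.Value<long>("Id");

            // Add returns false if this field was already expanded via another parent.
            if (descendants.Add(childId))
            {
                Console.WriteLine($"{new String('\t', level)}Expanding {child.Value<string>("DFN")}");

                GetAllDescendantFieldsOfStudy(childId, level + 1, ref descendants);
            }
        }

        // A page shorter than pageSize means there are no more children.
        if (children.Count < pageSize)
        {
            break;
        }
    }
}
```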

To use it, call it with the ID of Computer Science (41008148):

```csharp
var descendants = new SortedSet<long>();

GetAllDescendantFieldsOfStudy(41008148, 0, ref descendants);
```
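Since the question was open to Python, here is a rough Python port of the same traversal. It is a sketch under assumptions: `fetch_children` is a hypothetical callable you supply that returns all child entities (dicts with `Id` and `DFN`) of a given field of study, for example by wrapping the evaluate request with `Composite(FP.FId=...)` and paging with offset. Unlike the C# version it uses an explicit stack instead of recursion, a swapped-in choice that avoids Python's default recursion limit on deep hierarchies.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical signature: parent field-of-study ID -> its child entities,
# each a dict with at least "Id" (and ideally "DFN") keys.
FetchChildren = Callable[[int], Iterable[dict]]


def get_all_descendant_fields_of_study(root_id: int,
                                       fetch_children: FetchChildren) -> Dict[int, str]:
    """Return {field_id: display_name} for every descendant of root_id.

    Depth-first walk with an explicit stack; the dict doubles as the visited
    set, so a field reachable through several parents is expanded only once.
    """
    descendants: Dict[int, str] = {}
    stack: List[Tuple[int, int]] = [(root_id, 0)]  # (field ID, indentation level)

    while stack:
        parent_id, level = stack.pop()
        for child in fetch_children(parent_id):
            child_id = child["Id"]
            if child_id in descendants:
                continue  # already expanded via another parent
            descendants[child_id] = child.get("DFN", "")
            print("\t" * level + f"Expanding {descendants[child_id]}")
            stack.append((child_id, level + 1))

    return descendants
```

For Computer Science you would call `get_all_descendant_fields_of_study(41008148, fetch_children)` with your own `fetch_children`.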

Unfortunately, there is no way around the maximum result count of 1000 per request. You'll need to split your requests with offset (0, 1000, 2000, and so on) until a request returns fewer than 1000 entities.
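A minimal sketch of that offset loop in Python, assuming a hypothetical `fetch_page(offset, count)` callable that performs one evaluate request and returns its `entities` list. It stops at the first page shorter than `count`, which assumes the service fills every page except the last.

```python
from typing import Callable, Iterator, List

MAX_COUNT = 1000  # maximum number of results per evaluate request


def iter_all_entities(fetch_page: Callable[[int, int], List[dict]],
                      count: int = MAX_COUNT) -> Iterator[dict]:
    """Yield every entity by requesting offset = 0, count, 2*count, ... in turn."""
    offset = 0
    while True:
        page = fetch_page(offset, count)
        yield from page
        if len(page) < count:  # short or empty page: nothing left to fetch
            break
        offset += count
```

When the total is an exact multiple of `count`, this costs one extra request that comes back empty, which is the price of not knowing the total up front.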

Hope this helps!

Darrin Eide