
I'm using OpenRefine to pull in information on publisher policies using the Sherpa Romeo API (Sherpa Romeo is a site that aggregates publisher policies). That part is working.

Now I need to parse the returned JSON so that only the results containing certain pieces of information remain. The results I'm interested in need to include at least one of the following:

'any_website',

'any_repository',

'institutional_repository',

'non_commercial_institutional_repository',

'non_commercial_repository'

These pieces of information all fall under an array called "permitted_oa". For some reason, I can't even work out how to pull out just that array. I've tried writing GREL expressions such as

value.parseJson().items.permitted_oa

but it never returns anything.

I wish I could share the JSON but it's too big.

  • Welcome to StackOverflow (and OpenRefine). You haven't provided enough information for people to help you easily. Two things which could help are: 1) include the first entry or couple of entries returned by the API or 2) post the entire thing, or a good chunk of it, on a public pasteboard site e.g. Github gists or pastebin – Tom Morris Mar 16 '21 at 06:42
  • Hi, thank you for the advice! Here is a Pastebin link: https://pastebin.com/zCThyKtC -- I will try to edit the main text to give more information – fionaglasgow Mar 16 '21 at 08:30

1 Answer


I can see a couple of issues here.

Firstly, "items" in the Sherpa API response is an array (i.e. a list of things). When you have an array in the JSON, you either have to select a particular item from the array, or you have to explicitly work through the list of things in the array (aka iterate across the array) in your GREL. If you've previously worked with arrays in GREL you'll be familiar with this, but if you haven't, items are selected by their zero-based position:

  • value.parseJson().items[0] -> first item in the array
  • value.parseJson().items[1] -> second item in the array
  • value.parseJson().items[2] -> third item in the array etc. etc.

If you know there is only ever going to be a single item in the array, then you can safely use value.parseJson().items[0].
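If you are not sure the array always has at least one entry, it can help to guard against the empty case. Here is a minimal sketch of that idea in Python rather than GREL. The helper name first_item, the "name" field and the sample strings are made up for illustration and are not part of the Sherpa API:

```python
import json

def first_item(raw):
    """Return the first entry of "items", or None if the array is empty or missing."""
    items = json.loads(raw).get("items", [])
    return items[0] if items else None

# Hypothetical, heavily trimmed responses; real ones carry many more fields.
one_result = '{"items": [{"name": "Journal A"}]}'
no_results = '{"items": []}'

print(first_item(one_result))  # the single entry
print(first_item(no_results))  # None, rather than an index error
```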

However, if you don't know how many items will be in the array and you are interested in them all, you will have to iterate over the array using a GREL control such as "forEach":

forEach(value.parseJson().items, v, v)

is a way of iterating over the array: for each item in the array, GREL assigns it to a variable "v", and you can then do a further operation on it using "v" as you would usually use "value" (see https://docs.openrefine.org/manual/grel#foreache1-v-e2 for an example of using forEach on an array). As written, the expression above simply returns each item unchanged; replace the final "v" with whatever you want to extract from each item.
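If it helps to see the same idea outside GREL, a list comprehension in Python is a rough analogue of forEach. This is only a sketch: the "name" field and the sample data are placeholders, not real Sherpa fields:

```python
import json

# Hypothetical, trimmed stand-in for an API response.
raw = '{"items": [{"name": "Journal A"}, {"name": "Journal B"}]}'

items = json.loads(raw)["items"]

# Roughly what forEach(value.parseJson().items, v, v.name) would express:
# visit each item in turn, bind it to v, and evaluate an expression on v.
names = [v["name"] for v in items]
print(names)
```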

Another possibility is to use join on the array. This will join all the things in an array into a string.

value.parseJson().items.join("|")

It looks like the Sherpa JSON uses arrays liberally, so you may find more arrays you have to deal with to get to the values you want.

Secondly, in the JSON you pasted, "permitted_oa" isn't directly in each entry of "items" but in another array called "publisher_policy", so you'll need to navigate that as well. So:

value.parseJson().items[0].publisher_policy[0].permitted_oa[0]

would get you the first permitted_oa object in the first publisher_policy in the first item in the items array. If you wanted to (for example) get a list of locations from the JSON you have pasted you could use:

value.parseJson().items[0].publisher_policy[0].permitted_oa[0].location.location.join("|")

This will give you a pipe ("|") separated list of locations, based on the assumption that there is only a single item, a single publisher_policy and a single permitted_oa. That is true of the JSON you've supplied here, but might not always be true.
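When there may be more than one entry at any of those levels, you need to iterate over every array instead of taking [0] each time. Here is a minimal sketch of that logic in Python rather than GREL, assuming the nesting described above (items, then publisher_policy, then permitted_oa, then location.location). The sample JSON is invented and trimmed, and all_locations and WANTED are illustrative names:

```python
import json

# The location values the question is interested in.
WANTED = {
    "any_website",
    "any_repository",
    "institutional_repository",
    "non_commercial_institutional_repository",
    "non_commercial_repository",
}

def all_locations(raw):
    """Collect every location under every item, policy and permitted_oa entry."""
    found = []
    for item in json.loads(raw).get("items", []):
        for policy in item.get("publisher_policy", []):
            for permitted in policy.get("permitted_oa", []):
                found.extend(permitted.get("location", {}).get("location", []))
    return found

# Invented sample with two permitted_oa entries under one policy.
sample = """{"items": [{"publisher_policy": [{"permitted_oa": [
  {"location": {"location": ["any_website", "named_repository"]}},
  {"location": {"location": ["institutional_repository"]}}
]}]}]}"""

locations = all_locations(sample)
print("|".join(locations))                     # pipe-separated, as with join("|")
print(sorted(WANTED.intersection(locations)))  # only the locations of interest
```

A row could then be kept whenever the intersection with WANTED is non-empty.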

Owen Stephens
  • Thank you - that is incredible! I will try it soon and report back. I'm still quite new to both APIs and coding, so this is super helpful. – fionaglasgow Mar 18 '21 at 09:34