5

I have managed to retrieve some data from Statistics Sweden using the web site api. The answers to this question solved most of my problems.

How do I POST a JSON formatted request to GET JSON data from a URL in R into the data.frame in a less verbose manner?

But I still have two problems.

If I have characters with umlauts in my json-question (like “Å”, “Ä”, “Ö”) I get a “404” response from the server.

I’m trying to download data from this table:

Population 16+ years (RAMS) by region, employment, age and sex. Year 2004 - 2015

(You can get the query to the api on the web site if you click “Continue” and then “api for this table”, but you have to change the response format from "px" to "json".)

This code works:

library(jsonlite)
library(httr)

bodytxt <- '{
  "query": [
{
  "code": "Region",
  "selection": {
  "filter": "vs:RegionKommun07",
  "values": [
  "0114",
  "1280"
  ]
  }
},
  {
  "code": "Alder",
  "selection": {
  "filter": "item",
  "values": [
  "16-19"
  ]
  }
  },
  {
  "code": "Tid",
  "selection": {
  "filter": "item",
  "values": [
  "2015"
  ]
  }
  }
  ],
  "response": {
  "format": "json"
  }
  }'


req <- POST("http://api.scb.se/OV0104/v1/doris/en/ssd/START/AM/AM0207/AM0207H/BefSyssAldKonK",
            body = bodytxt, encode = "json")

stop_for_status(req)
json <- content(req, "text")


# JSON starts with an invalid character:
validate(json)
json <- substring(json, 2)
validate(json)

# Now we can parse
object <- fromJSON(json)
print(object)

But if I change the query so it includes a “Ö”, it doesn’t work. Example:

bodytxt <- '{
  "query": [
    {
      "code": "Region",
      "selection": {
        "filter": "vs:RegionKommun07",
        "values": [
          "0114",
          "1280"
          ]
      }
    },
    {
      "code": "Sysselsattning",
      "selection": {
        "filter": "item",
        "values": [
          "FÖRV"
          ]
      }
    },
    {
      "code": "Alder",
      "selection": {
        "filter": "item",
        "values": [
          "16-19"
          ]
      }
    },
    {
      "code": "Tid",
      "selection": {
        "filter": "item",
        "values": [
          "2015"
          ]
      }
    }
    ],
  "response": {
    "format": "json"
  }
}'

The other problem I have is that, as far as I understand, it should be possible to change the json query to a list and include the list in the call to the server, but I get a "404"-error. Example:

body_list <- fromJSON(bodytxt)
req <- POST("http://api.scb.se/OV0104/v1/doris/en/ssd/START/AM/AM0207/AM0207H/BefSyssAldKonK",
            body = body_list, encode = "json")

What am I doing wrong?

Ps! I know that it exists an excellent package on CRAN that is named pxweb that is very easy to use to download data from Statistics Sweden. But I want to learn the api and pxwed doesn’t let me skip dimensions in the query.

System: Windows 7, r script saved in utf-8 encoding.

Community
  • 1
  • 1
ChristianL
  • 140
  • 7

1 Answers1

4

Try these parameters for fromJSON():

library(httr)
library(jsonlite)

your data:

bodytxt <- '{
  "query": [
    {
      "code": "Region",
      "selection": {
        "filter": "vs:RegionKommun07",
        "values": [
          "0114",
          "1280"
          ]
      }
    },
    {
      "code": "Sysselsattning",
      "selection": {
        "filter": "item",
        "values": [
          "FÖRV"
          ]
      }
    },
    {
      "code": "Alder",
      "selection": {
        "filter": "item",
        "values": [
          "16-19"
          ]
      }
    },
    {
      "code": "Tid",
      "selection": {
        "filter": "item",
        "values": [
          "2015"
          ]
      }
    }
    ],
  "response": {
    "format": "json"
  }
}'

Modified fromJSON() conversion:

query <- jsonlite::fromJSON(bodytxt,
                            simplifyVector=FALSE,
                            simplifyDataFrame=FALSE)

Same call you made (added verbose() for my benefit but you can remove that):

URL <- "http://api.scb.se/OV0104/v1/doris/en/ssd/START/AM/AM0207/AM0207H/BefSyssAldKonK"
req <- POST(URL, body=query, encode="json", verbose())

The result:

content(req, simplifyDataFrame=TRUE)

## Warning: JSON string contains (illegal) UTF8 byte-order-mark!

## $columns
##             code                        text type
## 1         Region                      region    d
## 2 Sysselsattning           employment status    d
## 3          Alder                         age    d
## 4            Tid                        year    t
## 5       AM0207F2 Population 16+ years (RAMS)    c
## 
## $comments
## list()
## 
## $data
##                       key values
## 1 0114, FÖRV, 16-19, 2015    379
## 2 1280, FÖRV, 16-19, 2015   1443

The warning is due to the fact that the API server is sending a byte order mark back with the result (it's a Microsoft server so they're kind of brain dead when it comes to this). jsonlite (the bit inside content() that does the conversion) can handle it.

hrbrmstr
  • 77,368
  • 11
  • 139
  • 205
  • Works perfect! Thanks! So It seems like it wasn't a character encoding problem after all or does the convertion from json to list in the fromJSON() function change the encoding in any way? – ChristianL Feb 20 '17 at 14:39
  • No mangling occurs in the `fromJSON()` call. It was just the way the JSON was being converted before posting to the API. It tries to be "helpful" for data analysis work (and the defaults usually are) but it's usually not what other programs expect to consume. You are to be commended for ploughing through this. Your extra work to deal with customized API parameter calls is a great example for the many others who will no doubt be querying for how to do this in the future. – hrbrmstr Feb 20 '17 at 14:42
  • 1
    I just realised that if I choose "csv" as format in the json query in the POST body, the code above returns a nicely formatted tibble. Perhaps good to know if somebody want to use the code. – ChristianL Feb 21 '17 at 13:22
  • the nice thing abt the JSON return is you get metadata the column types vs R converting them for you. It's not much more work to apply column names and ensure those types after the JSON data return. – hrbrmstr Feb 21 '17 at 14:01