1

I'm setting up a Python function to use the Surveymonkey API to get survey responses from Surveymonkey.

The API returns responses in a JSON format with a deep recursive file structure.

I'm having issues trying to flatten this JSON so that it can go into Google Cloud Storage.

I have tried to flatten the response using the following code. Which works; however, it does not transform it to the format that I am looking for.

{
  "per_page": 2,
  "total": 1,
  "data": [
    {
      "total_time": 0,
      "collection_mode": "default",
      "href": "https://api.surveymonkey.com/v3/responses/5007154325",
      "custom_variables": {
        "custvar_1": "one",
        "custvar_2": "two"
      },
      "custom_value": "custom identifier for the response",
      "edit_url": "https://www.surveymonkey.com/r/",
      "analyze_url": "https://www.surveymonkey.com/analyze/browse/",
      "ip_address": "",
      "pages": [
        {
          "id": "73527947",
          "questions": [
            {
              "id": "273237811",
              "answers": [
                {
                  "choice_id": "1842351148"
                },
                {
                  "text": "I might be text or null",
                  "other_id": "1842351149"
                }
              ]
            },
            {
              "id": "273240822",
              "answers": [
                {
                  "choice_id": "1863145815",
                  "row_id": "1863145806"
                },
                {
                  "text": "I might be text or null",
                  "other_id": "1863145817"
                }
              ]
            },
            {
              "id": "273239576",
              "answers": [
                {
                  "choice_id": "1863156702",
                  "row_id": "1863156701"
                },
                {
                  "text": "I might be text or null",
                  "other_id": "1863156707"
                }
              ]
            },
            {
              "id": "296944423",
              "answers": [
                {
                  "text": "I might be text or null"
                }
              ]
            }
          ]
        }
      ],
      "date_modified": "1970-01-17T19:07:34+00:00",
      "response_status": "completed",
      "id": "5007154325",
      "collector_id": "50253586",
      "recipient_id": "0",
      "date_created": "1970-01-17T19:07:34+00:00",
      "survey_id": "105723396"
    }
  ],
  "page": 1,
  "links": {
    "self": "https://api.surveymonkey.com/v3/surveys/123456/responses/bulk?page=1&per_page=2"
  }
}
answers_df = json_normalize(data=response_json['data'],
                        record_path=['pages', 'questions', 'answers'],
                        meta=['id', ['pages', 'questions', 'id'], ['pages', 'id']])

Instead of returning a row for each question id, I need it to return a column for each question id, choice_id, and text field.

The columns I would like to see are total_time, collection_mode, href, custom_variables.custvar_1, custom_variables.custvar_2, custom_value, edit_url, analyze_url, ip_address, pages.id, pages.questions.0.id, pages.questions.0.answers.0.choice_id, pages.questions.0.answers.0.text, pages.questions.0.answers.0.other_id

Instead of the each Question ID, Choice_id, text and answer being on a separate row. I would like a column for each one. So that there is only 1 row per survey_id or index in data

Nick
  • 55
  • 5

0 Answers0