I'm trying to convert a pandas DataFrame to JSON.

import pandas as pd

y = pd.read_csv('testx.csv', encoding='utf-8')
y.columns = ['i', 'city', 'language', 'words']
del y['i']  # drop the unneeded first column
y = y.set_index(['city', 'language'])
z = y.to_json(orient='index')

I get incorrect JSON with [ and { inside quotes. What am I doing wrong?

{"["Moscow","Russian"]":{"words":3300000},"["Moscow","English"]":{"words":550000},"["Moscow","French"]":{"words":100000},"
["London","English"]":{"words":9100000},"["London","Russian"]":{"words":150000},"["London","Spanish"]":{"words":90000},...

Ideally, the dataframe:

city           language           words       
Moscow         Russian            3300000
Moscow         English            550000
Moscow         French             100000
London         English            9100000
London         Russian            150000
London         Spanish            90000
...

must be converted to this:

[
  {
    "city": "Moscow",
    "language": {
      "Russian": 3300000,
      "English": 550000,
      "French": 100000
    }
  },
  {
    "city": "London",
    "language": {
      "English": 9100000,
      "Russian": 150000,
      "Spanish": 90000
    }
  }
]
maxymoo
mailman_73

1 Answer

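Would you be willing to use a list comprehension (with a nested dict comprehension) to build your output explicitly? You can then use json.dumps to convert the resulting Python list of dicts to JSON. Unfortunately, your desired form is outside the standard outputs that to_json supports. Here df is the frame with city, language, and words as ordinary columns, i.e. before set_index.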

[{"city": i,
  "language": {l: n
      for l, n in zip(g['language'], g['words'])}}
   for i, g in df.groupby('city')]

[{'city': 'London',
  'language': {'English': 9100000, 'Russian': 150000, 'Spanish': 90000}},
 {'city': 'Moscow',
  'language': {'English': 550000, 'French': 100000, 'Russian': 3300000}}]
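For reference, here is a minimal end-to-end sketch of the same idea, carried through to json.dumps. The DataFrame is built inline from the sample rows in the question instead of reading testx.csv, and the helper name nest_by_city is hypothetical. Two choices here are assumptions rather than part of the answer above: sort=False keeps the cities in their order of first appearance (groupby sorts the keys by default, which is why London comes first in the output above), and int() turns each count into a plain Python int before serialization.

```python
import json
import pandas as pd

# Sample rows copied from the question; in practice these would come from read_csv.
df = pd.DataFrame({
    "city": ["Moscow", "Moscow", "Moscow", "London", "London", "London"],
    "language": ["Russian", "English", "French", "English", "Russian", "Spanish"],
    "words": [3300000, 550000, 100000, 9100000, 150000, 90000],
})

def nest_by_city(frame):
    """Return one {"city", "language"} record per city, in order of first appearance."""
    return [
        {
            "city": city,
            # int() converts NumPy integers to plain Python ints for json.dumps
            "language": {lang: int(n)
                         for lang, n in zip(group["language"], group["words"])},
        }
        for city, group in frame.groupby("city", sort=False)
    ]

records = nest_by_city(df)
json_text = json.dumps(records, indent=2)
print(json_text)
```

If alphabetical city order is acceptable, dropping sort=False gives the same grouping as the comprehension above.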
maxymoo
  • Thank you. Unfortunately, I got an error when calling json.dumps: `import json; json.dumps(dict_to_json_file)`. The error: `raise TypeError(repr(o) + " is not JSON serializable") TypeError: 3208 is not JSON serializable` – mailman_73 Mar 21 '16 at 06:43
  • Hmm, strange. Maybe check your datatypes and make sure the integer column is actually `int`? – maxymoo Mar 22 '16 at 00:44
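One possible cause of that TypeError (an assumption; the thread does not confirm it) is that the values are NumPy integers such as numpy.int64, which json.dumps does not treat as built-in ints. The sketch below reproduces the failure on a hand-made payload and works around it with the default parameter of json.dumps; the helper name to_builtin is hypothetical.

```python
import json
import numpy as np

def to_builtin(value):
    """Fallback for json.dumps: convert NumPy scalars to Python built-ins."""
    if isinstance(value, np.generic):
        return value.item()
    raise TypeError(f"{type(value).__name__} is not JSON serializable")

# Hand-made payload with a NumPy integer, standing in for values taken from a DataFrame.
payload = [{"city": "Moscow", "language": {"Russian": np.int64(3300000)}}]

try:
    json.dumps(payload)
except TypeError as exc:
    print("without default:", exc)

# With the fallback, NumPy scalars are converted as they are encountered.
json_text = json.dumps(payload, default=to_builtin)
print(json_text)
```

Converting the values up front (for example with int(n) inside the comprehension) avoids the need for a default handler at all.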