0

I want to convert multiple JSON files into one dataframe.

Below is the JSON object:

{'alerts': [{'affected_services': {'elevators': [],
                                   'services': [{'mode_name': 'Subway',
                                                 'route_id': 'Red',
                                              'route_name': 'Red''Line',
'route_type': '1',
'stop_id': '70061',
'stop_name': 'Alewife'},
{'mode_name': 'Subway',
'route_id': 'Red',
'route_name': 'Red ''Line',
'route_type': '1',
'stop_id': '70063',
'stop_name': 'Davis ''- ''Inbound'}]},
                         'alert_id': 176434,
                         'alert_lifecycle': 'Upcoming',
                         'cause': 'UNKNOWN_CAUSE',
                         'created_dt': '1491332705',
                         'description_text': 'Due to the Floating Slab '
                             'Project, buses will replaceRed ',
                         'effect': 'DETOUR',
                         'effect_name': 'Shuttle',
                         'effect_periods': [{'effect_end': '1493620200',
                         'effect_start': '1493454600'}],
                  'header_text': 'Buses will replace Red Line service',
                         'last_modified_dt': '1491332705',
                         'service_effect_text': 'Red Line shuttle',
                         'severity': 'Severe',
                    'short_header_text': 'Buses will replace Red Line ',
                    'timeframe_text': 'April 29-30',
'url': 'http://www.mbta.com/about_the_mbta/t_projects/default.asp?id=22956'}],
             'stop_id': 'place-alfcl',
             'stop_name': 'Alewife'}

Below is the code I tried:

from pandas.io.json import json_normalize
import pandas as pd
import glob

json_file_path = path_stop +'*'
lambda_file = lambda json_file_path : glob.glob(json_file_path)

for json_file in lambda_file(json_file_path):
    with open(json_file) as json_data:
        result = json_normalize(json_data,'alerts',['affected_services',['elevators','services',['mode_name','route_id','stop_id','stop_name']],'alert_lifecycle','cause','created_dt','effect','effect_name','severity','timeframe_text'],'stop_id','stop_name')

print(result)

Please someone help me. Thanks in advance!

Chiheb Nexus
  • 9,104
  • 4
  • 30
  • 43
Palak
  • 55
  • 1
  • 10

3 Answers3

0

If you have that json as a string, you can turn it into a json object by calling

temp = json.load(JSON_string)

Then you should be able to call:

df = pd.read_json(temp)

Then you have your dataframe "df"

Vaibhav
  • 484
  • 4
  • 7
0

This code should do what you want:

import pandas
import glob
json_files = glob.glob('*.json')
def merge_files(json_files):
    dfs = list()
    for json_file in json_files:
        df = pandas.read_json(json_file)
        dfs.append(df)
    df = pandas.concat(dfs)
    return df

df = merge_files(json_files)

However, I should caution you probably want to massage your input data a bit more in order to understand the shape and contents of the DataFrame that you are building.

aquil.abdullah
  • 3,059
  • 3
  • 21
  • 40
  • Thanks for the response. But, It is giving me below output: alerts stop_id stop_name 0 {'created_dt': '1491332705', 'url': 'http://ww... place-alfcl Alewife 0 {'created_dt': '1477071813', 'service_effect_t... place-alsgr Allston Street 0 {'created_dt': '1489613117', 'url': 'http://ww... place-andrw Andrew 0 {'created_dt': '1477071813', 'service_effect_t... place-armnl Arlington 1 {'alert_id': 174420, 'service_effect_text': 'G... place-armnl Arlington And I want values in column 1 to be separated in different columns. – Palak Apr 13 '17 at 20:45
  • Can you give an example of the DataFrame output that you would like to see? – aquil.abdullah Apr 14 '17 at 14:19
  • Affected_Route_ID Affected_Route_Name Affected_Stop_IDAffected_Stop_Name Elevator_Affected Cause Created_Date Severity Effect_Name Effect Stop_id Stop_name Red Red Line 70061 Alewife [] UNKNOWN_CAUSE 1491332705 Severe Shuttle DETOUR place-alfcl Alewife Red Red Line 70063 Davis - Inbound [] UNKNOWN_CAUSE 1491332705 Severe Shuttle DETOUR place-alfcl Alewife Red Red Line 70064 Davis - Outbound [] UNKNOWN_CAUSE 1491332705 Severe Shuttle DETOUR place-alfcl Alewife – Palak Apr 14 '17 at 22:59
  • I want dataframe to look like this csv output – Palak Apr 14 '17 at 23:00
0

Above answer helped to some extent. I would suggest storing requested data in CSV when you have complex list and dictionary type of structure rather storing data in JSON format. It will be easy to convert data into a data frame.

Instead of using JSON files to make data frame I used CSV file and then converted into data frame.

Palak
  • 55
  • 1
  • 10