
I have downloaded some GTFS-RT Trip Updates data in dictionary format using this code:

from google.transit import gtfs_realtime_pb2
from protobuf_to_dict import protobuf_to_dict
import requests
import pandas as pd

feed = gtfs_realtime_pb2.FeedMessage()
# Fetch the raw protobuf feed from the URL, in this case the trip updates of all buses
response = requests.get('link')
feed.ParseFromString(response.content)

# Convert the parsed protobuf feed to a plain Python dict
buses_dict = protobuf_to_dict(feed)

The output is a dictionary containing many nested dictionaries. The trip update for one bus has the following format:

id: "14010512942203036"
trip_update {
  trip {
    trip_id: "14010000550082549"
    start_date: "20210120"
    schedule_relationship: SCHEDULED
  }
  stop_time_update {
    stop_sequence: 24
    arrival {
      delay: -20
      time: 1611145420
      uncertainty: 0
    }
    departure {
      delay: 52
      time: 1611145492
      uncertainty: 0
    }
    stop_id: "9022001005006001"
  }
  stop_time_update {
    stop_sequence: 25
    arrival {
      delay: 52
      time: 1611146092
    }
    departure {
      delay: 52
      time: 1611146092
    }
    stop_id: "9022001005007002"
  }
  vehicle {
    id: "9031001004002234"
  }
  timestamp: 1611145514
}

Do you have any idea how to convert this data into a more useful format, such as a pandas DataFrame?

Thank you in advance!

Anas.S

1 Answer

I used this URL (a vehicle positions feed) for testing:

url = 'https://cdn.mbta.com/realtime/VehiclePositions.pb'

All you need to do is add this line to the end of your script to get a pandas DataFrame:

pd.json_normalize(buses_dict['entity'])

It will break the dictionary into these columns:

Index(['id', 'vehicle.trip.trip_id', 'vehicle.trip.start_time',
       'vehicle.trip.start_date', 'vehicle.trip.schedule_relationship',
       'vehicle.trip.route_id', 'vehicle.trip.direction_id',
       'vehicle.position.latitude', 'vehicle.position.longitude',
       'vehicle.position.bearing', 'vehicle.current_stop_sequence',
       'vehicle.current_status', 'vehicle.timestamp', 'vehicle.stop_id',
       'vehicle.vehicle.id', 'vehicle.vehicle.label',
       'vehicle.occupancy_status', 'vehicle.position.speed'],
      dtype='object') 
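
Note that the columns above come from a vehicle positions feed. In a trip updates feed like the one in the question, each entity holds a list of stop_time_update entries, and a plain json_normalize call leaves that list as a single column. A minimal sketch of one way to get one row per stop instead, using the record_path and meta arguments of json_normalize. The dictionary below is a hand-trimmed copy of the entity from the question, standing in for protobuf_to_dict(feed), and the name stops is just an illustrative choice:

```python
import pandas as pd

# Trimmed copy of the entity shown in the question, standing in for
# protobuf_to_dict(feed); a real feed contains many such entities.
buses_dict = {
    "entity": [
        {
            "id": "14010512942203036",
            "trip_update": {
                "trip": {"trip_id": "14010000550082549", "start_date": "20210120"},
                "stop_time_update": [
                    {
                        "stop_sequence": 24,
                        "arrival": {"delay": -20, "time": 1611145420, "uncertainty": 0},
                        "departure": {"delay": 52, "time": 1611145492, "uncertainty": 0},
                        "stop_id": "9022001005006001",
                    },
                    {
                        "stop_sequence": 25,
                        "arrival": {"delay": 52, "time": 1611146092},
                        "departure": {"delay": 52, "time": 1611146092},
                        "stop_id": "9022001005007002",
                    },
                ],
                "vehicle": {"id": "9031001004002234"},
                "timestamp": 1611145514,
            },
        }
    ]
}

# One row per stop_time_update; the meta fields are repeated on each row.
stops = pd.json_normalize(
    buses_dict["entity"],
    record_path=["trip_update", "stop_time_update"],
    meta=[
        "id",
        ["trip_update", "trip", "trip_id"],
        ["trip_update", "vehicle", "id"],
        ["trip_update", "timestamp"],
    ],
    errors="ignore",  # fill missing meta fields with NaN instead of raising
)

# Convert epoch seconds to timezone-aware UTC timestamps.
stops["arrival.time"] = pd.to_datetime(stops["arrival.time"], unit="s", utc=True)

print(stops.columns.tolist())
```

This assumes every entity actually has a trip_update with a stop_time_update list. Entities without one would raise a KeyError on record_path, so a mixed feed would need filtering first, for example with [e for e in buses_dict['entity'] if 'trip_update' in e].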
Jonathan Leon