I'm trying to convert a protobuf feed to a pandas DataFrame for one of my hobby projects. I tried several different techniques to accomplish this, but nothing seems to really solve my issue.
I use the following code to retrieve the GTFS-RT TripUpdates feed:
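    import requests
    from google.transit import gtfs_realtime_pb2
    from protobuf_to_dict import protobuf_to_dict

    feed = gtfs_realtime_pb2.FeedMessage()
    headers = {
        'Accept': 'application/octet-stream',
        'Accept-encoding': 'br, gzip, deflate'
    }
    response = requests.get('<PROVIDER:APIKEY>', headers=headers, stream=True)
    # Parse the raw protobuf bytes, then convert the message to a plain dict
    feed.ParseFromString(response.content)
    test_dict = protobuf_to_dict(feed)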
The result of using protobuf_to_dict is a dict with one single line:
{'header': {'gtfs_realtime_version': '2.0', 'incrementality': 0, 'timestamp': 1641582104}, 'entity': [{'id': '14050001276385923' [...]
I've tried several things to get around this issue.
Reading the feed message as JSON: this did not work because the JSON object must be str, bytes or bytearray, not dict.
Iterating through the dict:
    for entity in test_dict.entity:
        if entity.HasField('vehicle'):
            [logic for building dataframe]
It didn't work either, because 'dict' object has no attribute 'entity'.
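The error comes from the fact that a plain dict is indexed by key (test_dict['entity']) and has no protobuf methods such as HasField. Below is a minimal sketch of the same loop written against a dict. It assumes the dict follows the GTFS-RT TripUpdate layout (entity, trip_update, stop_time_update); the sample data and the helper name rows_from_feed_dict are made up for illustration and are not taken from the real feed.

```python
# Hypothetical sample shaped like protobuf_to_dict output for a TripUpdates feed.
sample = {
    'header': {'gtfs_realtime_version': '2.0', 'incrementality': 0, 'timestamp': 1641582104},
    'entity': [
        {
            'id': 'e1',
            'trip_update': {
                'trip': {'trip_id': 'T1'},
                'stop_time_update': [
                    {'stop_sequence': 1, 'stop_id': 'S1',
                     'arrival': {'time': 1641582200}, 'departure': {'time': 1641582230}},
                    {'stop_sequence': 2, 'stop_id': 'S2',
                     'arrival': {'time': 1641582500}},
                ],
            },
        },
        {'id': 'e2', 'vehicle': {'trip': {'trip_id': 'T2'}}},  # not a trip update
    ],
}


def rows_from_feed_dict(feed_dict):
    """Return one flat dict per stop_time_update (hypothetical helper)."""
    rows = []
    # A dict is indexed by key: feed_dict['entity'], not feed_dict.entity.
    for entity in feed_dict.get('entity', []):
        trip_update = entity.get('trip_update')
        if trip_update is None:  # dict counterpart of HasField('trip_update')
            continue
        trip_id = trip_update.get('trip', {}).get('trip_id')
        for stu in trip_update.get('stop_time_update', []):
            rows.append({
                'entity_id': entity.get('id'),
                'trip_id': trip_id,
                'stop_sequence': stu.get('stop_sequence'),
                'stop_id': stu.get('stop_id'),
                # Missing arrival/departure blocks become None instead of raising.
                'arrival_time': stu.get('arrival', {}).get('time'),
                'departure_time': stu.get('departure', {}).get('time'),
            })
    return rows


rows = rows_from_feed_dict(sample)
```

A list of flat dicts like this can be passed to pandas.DataFrame(rows), which should give one row per stop time update.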
Ok! After several hours of reading I tried to flatten and normalize the feed message as described here and in some other threads. Unfortunately, neither json_normalize nor flatten_json solved the issue.
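For reference, a minimal sketch of how json_normalize could be pointed at the nested stop_time_update list through its record_path and meta parameters. The sample dict is invented for illustration, and the sketch assumes every entity passed in has a trip_update key, so other entities are filtered out first.

```python
import pandas as pd

# Hypothetical sample shaped like protobuf_to_dict output (not the real feed).
feed_dict = {
    'header': {'gtfs_realtime_version': '2.0', 'timestamp': 1641582104},
    'entity': [
        {
            'id': 'e1',
            'trip_update': {
                'trip': {'trip_id': 'T1'},
                'stop_time_update': [
                    {'stop_sequence': 1, 'stop_id': 'S1',
                     'arrival': {'time': 1641582200}, 'departure': {'time': 1641582230}},
                    {'stop_sequence': 2, 'stop_id': 'S2',
                     'arrival': {'time': 1641582500}},
                ],
            },
        },
        {'id': 'e2', 'vehicle': {'trip': {'trip_id': 'T2'}}},
    ],
}

# Keep only entities that carry a trip_update; record_path raises a KeyError
# for entities where that key is missing.
entities = [e for e in feed_dict['entity'] if 'trip_update' in e]

df = pd.json_normalize(
    entities,
    # Each element of this nested list becomes one row.
    record_path=['trip_update', 'stop_time_update'],
    # Fields from the enclosing levels that are repeated on every row.
    meta=['id', ['trip_update', 'trip', 'trip_id']],
)
```

Nested fields inside each record are flattened into dotted column names such as arrival.time, and the meta fields appear as id and trip_update.trip.trip_id.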
At this point I feel like I'm going in circles and missing something very obvious that might help me. The end goal is to create a DataFrame containing the TripUpdates data, which will later be merged with another DataFrame to update arrival and departure times.