
I am trying to parse GTFS Realtime trip_update data that is served as plain text (protobuf text format) rather than as a binary pb (protobuf) file.

Here is the feed URL:

https://extranet.trainose.gr/epivatikos/transit/trip_updates

However, the only examples I can find deal with binary pb files, like this:

import requests
from google.transit import gtfs_realtime_pb2
# ....
response = requests.get(url, allow_redirects=True)
feed.ParseFromString(response.content)
for entity in feed.entity:
    ...

So how can I parse a feed that is not in pb format? Thanks.

PKey

2 Answers


It turns out that there is a way to process a plain-text feed, using the protobuf `text_format` module, with something like this:

    from google.protobuf import text_format

    response = requests.get(url, allow_redirects=True)
    ...
    try:
        text_format.Parse(response.content.decode('UTF-8'), feed, allow_unknown_extension=True)
        print("Parsed with text format successfully.")
        printResults(feed)
    except text_format.ParseError as e:
        raise IOError("Cannot parse text %s." % (str(e)))
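If you don't know in advance which encoding a feed uses, one option is to inspect the bytes before choosing between `feed.ParseFromString` and `text_format.Parse`. The sketch below is a stdlib-only heuristic, not part of protobuf or any GTFS library, and the helper name `looks_like_text_format` is hypothetical. It assumes that a binary FeedMessage typically contains control bytes (tags and length prefixes, e.g. the `0x03` length of the `"2.0"` version string) or invalid UTF-8, while a text-format message starts with a field name such as `header {`.

```python
import re

# Hypothetical helper (not a protobuf API): a text-format message should start
# with something like "header {" or "gtfs_realtime_version:".
_FIRST_FIELD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\s*[{:<]")


def looks_like_text_format(data: bytes) -> bool:
    """Heuristic only: True if `data` looks like protobuf text format."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # binary payloads are often not valid UTF-8
    # Binary wire format usually contains control bytes (lengths, tags);
    # text format only uses ordinary whitespace.
    if any(ord(ch) < 32 and ch not in "\t\n\r" for ch in text):
        return False
    # Skip blank lines and '#' comments, then check the first real line.
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        return bool(_FIRST_FIELD.match(stripped))
    return False


text_sample = b'header {\n  gtfs_realtime_version: "2.0"\n}\n'
binary_sample = b'\x0a\x05\x0a\x032.0'  # FeedMessage{header{version "2.0"}} in wire format
print(looks_like_text_format(text_sample), looks_like_text_format(binary_sample))
```

With a helper like this, `get_feed` could call `text_format.Parse` when it returns `True` and `feed.ParseFromString` otherwise, instead of relying on a failed binary parse. It is only a guess based on the first bytes, so keeping the `ParseError`/`DecodeError` handling is still a good idea.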

Actually, here is my whole script:

from datetime import datetime

import requests
from google.protobuf import text_format
from google.protobuf.message import DecodeError
from google.transit import gtfs_realtime_pb2


def main():
    feed = gtfs_realtime_pb2.FeedMessage()
    url = 'https://feed.utl.com/feed'
    get_feed(feed, url)


def printResults(feed):
    # header.timestamp is POSIX time (seconds since the epoch)
    ts = int(feed.header.timestamp)
    print("Last update: " + datetime.fromtimestamp(ts).strftime('%d-%m-%Y %H:%M:%S'))
    # Collect the trip IDs once instead of rewriting output.txt for every entity
    trip_ids = [entity.trip_update.trip.trip_id
                for entity in feed.entity
                if entity.HasField('trip_update')]
    for trip_id in trip_ids:
        print(trip_id + ';')
    with open('output.txt', mode='w') as f:
        f.write(''.join(trip_id + ';' for trip_id in trip_ids))


def get_feed(feed, url):
    # Local proxy from my setup; drop the proxies argument if you don't need one
    proxies = {'http': '127.0.0.1:5555', 'https': '127.0.0.1:5555'}
    response = requests.get(url, allow_redirects=True, proxies=proxies)
    try:
        # First try the standard binary protobuf encoding
        feed.ParseFromString(response.content)
        printResults(feed)
    except DecodeError:
        # response.text is a str; concatenating response.content (bytes) would raise TypeError
        print("Oops!  That was no valid data. Try again...\n\n" + response.text)
        try:
            # Fall back to the protobuf text format
            feed.Clear()  # discard anything left over from the failed binary parse
            text_format.Parse(response.content.decode('UTF-8'), feed, allow_unknown_extension=True)
            print("Parsed with text format successfully.")
            printResults(feed)
        except text_format.ParseError as e:
            raise IOError("Cannot parse file %s." % (str(e))) from e


if __name__ == "__main__":
    main()
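`printResults` converts `feed.header.timestamp` with `datetime.fromtimestamp(ts)`, which uses the local time zone of the machine running the script. The GTFS Realtime specification defines this timestamp as POSIX time (seconds since 1970-01-01 00:00:00 UTC), so if you want output that does not depend on where the script runs, a small variant like the one below formats it in UTC instead. The function name `format_feed_timestamp` is just illustrative.

```python
from datetime import datetime, timezone


def format_feed_timestamp(ts: int) -> str:
    """Format a POSIX timestamp (e.g. feed.header.timestamp) as UTC 'dd-mm-YYYY HH:MM:SS'."""
    # Passing tz=timezone.utc avoids depending on the machine's local time zone
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime('%d-%m-%Y %H:%M:%S')


print("Last update: " + format_feed_timestamp(0))  # prints "Last update: 01-01-1970 00:00:00"
```

In `printResults`, the `print("Last update: " ...)` line could then call `format_feed_timestamp(ts)`; whether local time or UTC is more useful depends on who reads the output.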
PKey
    Fascinating -- I made my claim based on [this documentation](https://developers.google.com/protocol-buffers/docs/reference/python/google.protobuf.text_format-module), which makes no mention of a `Parse` method, but comparing with the actual protobuf Python code on github, I see the documentation seems to be somewhat out of date. – abeboparebop Jun 18 '19 at 06:08

Human-readable text is not the standard format for sending and receiving protobuf messages. (If you only want plain-text, you should be using a standard text format like JSON.) In principle it is only for debugging purposes. Thus there are no methods in the Python Protobuf library for parsing plain-text messages. The Right Solution here is to find an actual protobuf endpoint, perhaps by getting in touch with the domain owner. (EDIT: apparently there actually is a Parse method for text-formatted messages in the Python library -- see the source code here.)

That said, the C++ Protobuf library seems to contain methods for parsing the text format directly, so if you have no way of getting access to a real protobuf, this might be a backup option: https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.text_format.

There are no strict guarantees about consistency of the text format across versions, as far as I know, but the fact that it's exposed in the library suggests it's probably pretty stable in practice. This discussion gives the same impression (since there are Google-internal tools that parse the text format): https://github.com/protocolbuffers/protobuf/issues/1297#issuecomment-390825524.

abeboparebop
    Thanks for the reply! You actually gave me the idea of how to proceed with finding a solution for text-formatted GTFS-rt in Python. So I'll upvote, and later I will post my own Python solution. – PKey Jun 18 '19 at 05:45