
I am looking to develop a transit app using GTFS static data. One of the constraints I've set for myself is that the app should use as little mobile data as possible. Therefore, I would like to embed all the data in the app.

My issue is that GTFS data sets are usually quite large (85 MB uncompressed for the city of Sydney, for example). I've done a bit of reverse engineering on other apps and found that some of them manage to compress all that data into a much smaller file (a few MB at most).

Using 7zip, I've managed to compress my 85 MB data set down to 5 MB, which is acceptable for me. The next step is to use that 7z file in my app, and that's where I'm stuck. I don't want to decompress it and load it into an SQL database, as that would use too much space on the phone. What other options do I have?

Thanks

chopchop
  • What have you tried? At the top level, have you tried taking a data file containing all this feed information and seeing how well it compresses with gzip, bzip2, and xz? – Multimedia Mike Mar 25 '13 at 01:00
  • 7zip manages to compress it down to 5.5MB, which is acceptable for me, so thanks for the tip. I tried rar and zip but didn't know 7zip could make such a difference. However, how could I use a 7zip file in a mobile app? I can't just decompress it and write it to an SQL db, as that would take too much space on the phone. – chopchop Mar 25 '13 at 01:33

2 Answers

0

First, for embedding, I recommend using the Embedded XZ library (similar to 7zip). I have embedded this in a project and had good luck with it. Just be sure to compress data using 'xz --check=crc32' so it's compatible with Embedded XZ, and remember to initialize the CRC table.
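As a rough sketch of the compression step, assuming you prepare the data offline with Python rather than the `xz` command line: the standard `lzma` module can write an .xz stream with a CRC32 check, which is the same integrity check that `--check=crc32` selects. The 1 MiB dictionary size and the sample rows are my own illustrative choices, not requirements of the library.

```python
import lzma

# Assumption: a small dictionary keeps decoder memory low on the device.
# 1 MiB is an illustrative value, not a requirement.
FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": 1 << 20}]


def compress_with_crc32(raw: bytes) -> bytes:
    """Compress to an .xz stream that uses CRC32 as its integrity check."""
    return lzma.compress(
        raw,
        format=lzma.FORMAT_XZ,
        check=lzma.CHECK_CRC32,
        filters=FILTERS,
    )


# Hypothetical sample data standing in for a GTFS text file.
sample = b"stop_id,stop_name\n" + b"2000336,Central Station\n" * 1000
packed = compress_with_crc32(sample)

# Round-trip check on the build machine before shipping the file.
assert lzma.decompress(packed) == sample
```

The decoder has to hold the dictionary in memory, so a smaller dictionary trades some compression ratio for a smaller footprint on the phone.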

As for a decompression strategy, you may need to segment the data in such a way that you can decompress different parts of it on demand (i.e., a tree of databases). I'm not familiar with your data's characteristics. Will a user need it all loaded at the same time? Or can it easily be compartmentalized?
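To make the segmenting idea concrete, here is a minimal sketch under my own assumptions: each segment (say, one route's timetable) is compressed as an independent stream, the streams are concatenated into one blob, and a small index records where each one starts. The segment keys, `build_segmented_blob`, and `read_segment` are hypothetical names for illustration. It is written in Python for the offline build step; the reader side would need porting to whatever the app uses.

```python
import lzma
from typing import Dict, Tuple

# key -> (offset into blob, compressed length)
Index = Dict[str, Tuple[int, int]]


def build_segmented_blob(segments: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Compress every segment on its own and record its position in the blob."""
    blob = bytearray()
    index: Index = {}
    for key, raw in segments.items():
        packed = lzma.compress(raw, format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC32)
        index[key] = (len(blob), len(packed))
        blob += packed
    return bytes(blob), index


def read_segment(blob: bytes, index: Index, key: str) -> bytes:
    """Decompress only the requested segment; the rest of the blob is untouched."""
    offset, length = index[key]
    return lzma.decompress(blob[offset:offset + length])


# Hypothetical segments: one chunk of stop times per route.
segments = {
    "route_333": b"trip_1,08:00:00,stop_a\ntrip_1,08:05:00,stop_b\n" * 200,
    "route_380": b"trip_9,09:00:00,stop_c\ntrip_9,09:07:00,stop_d\n" * 200,
}
blob, index = build_segmented_blob(segments)
assert read_segment(blob, index, "route_380") == segments["route_380"]
```

The trade-off is that every segment starts with an empty dictionary, so many small segments compress worse than one large file. The segment size is something to tune against your data.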

Also, XZ can be a bit slow, even to decode. Have you evaluated how well regular gzip performs? That tends to be A) very fast; and B) available as a standard part of all embedded and mobile frameworks.
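A quick way to evaluate this on your own data is to compress the same bytes with both codecs and compare output size and decode time. This is only a sketch using Python's standard `gzip` and `lzma` modules with their default settings. `compare_codecs` is a hypothetical helper, and timings on a desktop will not match a phone, so treat the numbers as a first filter only.

```python
import gzip
import lzma
import time
from typing import Dict


def compare_codecs(raw: bytes) -> Dict[str, Dict[str, float]]:
    """Return compressed size and decode time for gzip and xz on the same input."""
    results: Dict[str, Dict[str, float]] = {}
    codecs = (
        ("gzip", gzip.compress, gzip.decompress),
        ("xz", lzma.compress, lzma.decompress),
    )
    for name, compress, decompress in codecs:
        packed = compress(raw)
        start = time.perf_counter()
        restored = decompress(packed)
        elapsed = time.perf_counter() - start
        if restored != raw:
            raise ValueError(f"{name} round trip failed")
        results[name] = {"bytes": len(packed), "decode_seconds": elapsed}
    return results


# Hypothetical sample; in practice, read one of your real GTFS files instead.
sample = b"trip_id,arrival_time,stop_id\n" + b"12345.T0.1-333,08:00:00,2000336\n" * 5000
for name, stats in compare_codecs(sample).items():
    print(name, stats["bytes"], "bytes,", f"{stats['decode_seconds']:.6f}", "s to decode")
```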

Multimedia Mike
  • gzip is not good enough at the moment, but it might be if I dig into the data and clean it up (a lot of ids are long strings and I want to convert them to numbers, there is also some redundancy in the data). Your solution seems viable. Do you think the decompression is fast enough for this: open activity, decompress a few kB of data, display it to user. Or is this going to make the UI lag? – chopchop Mar 26 '13 at 03:14
  • @chopchop: The only way to know for sure is to test it on some target Android devices and see if the speed is acceptable. This seems like the kind of thing that would be safest to put in its own thread, though, away from the main UI thread. – Multimedia Mike Mar 26 '13 at 03:40
  • Ok thanks, it seems to me like the only way to do it at the moment anyway. I'll let you know how it goes if you're interested – chopchop Mar 27 '13 at 00:47
  • @chopchop: Yep, I would love to know if you have success. – Multimedia Mike Mar 27 '13 at 01:00
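The cleanup mentioned in the comments above (replacing long string ids with numbers) could look roughly like the sketch below. It assumes the feed files are plain CSV, which GTFS text files are. `remap_ids` and the sample rows are hypothetical. The mapping is passed in so that the same table can be reused for every file that refers to the same id (for example `trip_id` in both trips.txt and stop_times.txt).

```python
import csv
import io
from typing import Dict


def remap_ids(csv_text: str, column: str, mapping: Dict[str, int]) -> str:
    """Replace the string ids in `column` with small integers, extending `mapping`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # The first unseen id gets 0, the next 1, and so on.
        row[column] = mapping.setdefault(row[column], len(mapping))
        writer.writerow(row)
    return out.getvalue()


# Hypothetical rows with long string ids.
stop_times = (
    "trip_id,stop_id\n"
    "1234567.T0.1-333-sj2-1.1.H,2000336\n"
    "1234567.T0.1-333-sj2-1.1.H,2000337\n"
    "7654321.T0.1-380-sj2-1.2.R,2000336\n"
)
trip_ids: Dict[str, int] = {}
print(remap_ids(stop_times, "trip_id", trip_ids))
```

Shorter, more regular ids leave less for the compressor to encode, which may be enough to make gzip viable. Only a test on the real feed will tell.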
-1

Use the protocol buffer binary format (pbf), formerly Google-internal and now open source. It is compact and very fast to search, so there is no need to decompress it on the device and load it into a database there, because the pbf file itself acts as the database. Just include the pbf library in your code to query it. Of course, you still have to compress the data once before distributing it online.