I need to parse NMEA data from the built-in GPS receiver in an Android device. I receive this data a few times per second as a string. Is it possible to do this without allocations that the garbage collector has to clean up, or is string parsing one of those moments where I can call GC.Collect() with a clear conscience?

Specifically, I need to call string.Split() and some other methods like Substring(), and convert the results with double.Parse().

I tried converting to char[] instead, but that way the GC allocations were even bigger.

NMEA data contains many sentence types, and I need to parse 2-3 of them every second. Below is example code that parses one of these sentences, $GPRMC.

Example sentences:

$GPRMC,081836,A,3751.65,S,14507.36,E,000.0,360.0,130998,011.3,E*62
$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47
$GPGSA,A,3,32,27,03,193,29,23,19,16,21,31,14,,1.18,0.51,1.07*35

        // Divide the sentence into words
        string[] Words = sSentence.Split(',');
        // Do we have enough values to describe our location?
        if (Words[3] != "" & Words[4] != "" &
            Words[5] != "" & Words[6] != "")
        {
            // example 5230.5900,N
            // 52°30.5900\N

            // Yes. Extract latitude and longitude


            //Latitude decimal

            double DegreesLat = double.Parse(Words[3].Substring(0, 2), NmeaCultureInfo);
            string[] tempLat = Words[3].Substring(2).ToString ().Split ('.');
            double MinutesLat = double.Parse (tempLat[0], NmeaCultureInfo);
            string SecLat = "0";
            if (tempLat.Length >= 2) {
                SecLat = "0."+tempLat[1];
            }
            double SecondsLat = double.Parse (SecLat, NmeaCultureInfo)*60;

            double Latitude = (DegreesLat + (MinutesLat / 60) + (SecondsLat/3600));


            //Longitude decimal

            double DegreesLon = double.Parse(Words[5].Substring(0, 3), NmeaCultureInfo);
            string[] tempLon = Words[5].Substring(3).ToString ().Split ('.');
            double MinutesLon = double.Parse (tempLon[0], NmeaCultureInfo);
            string SecLon = "0";
            if (tempLon.Length >= 2) {
                SecLon = "0."+tempLon[1];
            }
            double SecondsLon = double.Parse (SecLon, NmeaCultureInfo)*60;

            double Longitude = (DegreesLon + (MinutesLon / 60) + (SecondsLon/3600));

            // Notify the calling application of the change
            if (PositionReceived != null)
                PositionReceived(Latitude, Longitude);
        }
seek
  • And where do you plan to store your strings? – Steve May 02 '17 at 20:46
  • This is a small bit of an [XY Problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Show us what you are doing that requires you to parse a large number of strings per second, and we may be able to give you an alternative solution that requires no (or at least less frequent) string parsing. – Scott Chamberlain May 02 '17 at 20:48
  • Whenever you think you need to call `GC.Collect()`, you're very likely to optimize the wrong end of your problem. – Filburt May 02 '17 at 20:50
  • @Filburt he is working with Unity 3D. Calling `GC.Collect()` on the first frame of a new level load is a good opportunity to clear up space: the player is already waiting for the level to load, so you can move the hiccup of a collection to the place you want it to happen instead of having it hit when you run out of room in the middle of gameplay. – Scott Chamberlain May 02 '17 at 20:52
  • @ScottChamberlain I need to parse NMEA data received from the built-in GPS receiver of an Android device. – seek May 02 '17 at 20:53
  • @ScottChamberlain Maybe I've inherited too much code littered with `GC.Collect()` in what only can be considered a [Cargo Cult](https://en.wikipedia.org/wiki/Cargo_cult_programming) manner. We'll need to see code. – Filburt May 02 '17 at 20:58
  • @ScottChamberlain Of course I'm checking it at the beginning. And after this checking I call method for $GPGGA or $GPRMC etc. – seek May 02 '17 at 21:17
  • Started digging around and found [Calculate distance, bearing and more between Latitude/Longitude points](http://www.movable-type.co.uk/scripts/latlong.html), which shows JavaScript functions to convert your easting/northing to lat/lon (look all the way to the bottom, at the code of the Dms library). This could spare you a lot of your string parsing. – Filburt May 02 '17 at 21:58
  • @Filburt Unfortunately the method that costs the most GC here is sSentence.split(','); and split is also used in the JavaScript from your link. I'm mostly focused on finding an alternative to this string.split(); – seek May 02 '17 at 22:02
  • I could not think of a less consuming method - you could only try if `sSentence.Split(',', 8)` does any good but I'd rather think you can optimize more if you get rid of `Words[3].Substring(2).ToString().Split ('.');` and the rest of the string mucking. – Filburt May 02 '17 at 22:35

2 Answers

Update 02.06.2020: starting from netstandard2.1 you can replace string with ReadOnlySpan<char> and perform the task without allocations. See https://learn.microsoft.com/en-us/dotnet/api/system.memoryextensions?view=netcore-3.1
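
As a rough sketch of what that can look like, assuming a runtime that provides `ReadOnlySpan<char>` and the span overload of `double.TryParse` (.NET Core 2.1+ / netstandard2.1). The class and method names are made up for illustration and are not part of the original code:

```csharp
using System;
using System.Globalization;

public static class SpanNmea
{
    // Returns the zero-based comma-separated field of `sentence` as a slice.
    // Slicing a span only moves a window over the same characters; nothing is copied.
    public static ReadOnlySpan<char> GetField(ReadOnlySpan<char> sentence, int fieldIndex)
    {
        for (int i = 0; i < fieldIndex; i++)
        {
            int comma = sentence.IndexOf(',');
            if (comma < 0)
                return ReadOnlySpan<char>.Empty; // fewer fields than requested
            sentence = sentence.Slice(comma + 1);
        }
        int end = sentence.IndexOf(',');
        return end < 0 ? sentence : sentence.Slice(0, end);
    }

    // Converts an NMEA ddmm.mmmm (or dddmm.mmmm) value to decimal degrees.
    public static bool TryParseCoordinate(ReadOnlySpan<char> field, out double degrees)
    {
        degrees = 0;
        if (!double.TryParse(field, NumberStyles.Float, CultureInfo.InvariantCulture, out double raw))
            return false;
        int wholeDegrees = (int)(raw / 100);       // leading dd or ddd
        double minutes = raw - wholeDegrees * 100; // mm.mmmm
        degrees = wholeDegrees + minutes / 60.0;
        return true;
    }
}
```

With the question's sample sentence, `SpanNmea.GetField(sentence.AsSpan(), 3)` yields the latitude field and `GetField(..., 5)` the longitude field; the hemisphere letters in fields 4 and 6 are not applied here.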


You are asking how you can work with strings without allocating space. Here is an answer: you can always use stackalloc to allocate a char buffer on the stack without GC pressure, and then create the final string (if you need it) using the char* constructor. But be careful, because it's unsafe, and it is very unlikely that you can't simply allocate an ordinary char[] or StringBuilder, because a gen0 collection costs almost nothing.

You have lots of code like Words[3].Substring(2).ToString ().Split ('.'), which is very memory-heavy. Just fix that and you're golden. If that doesn't help, you have to give up Substring and the other methods that allocate memory, and write your own parser.
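
If it does come to writing your own parser, a minimal sketch might look like the following. It reads a number directly out of a range of the original string, so no substring is created. The `NmeaNumber` name and the restrictions (ASCII digits, at most one '.', no sign, no exponent, no culture handling) are assumptions made for this example:

```csharp
using System;

public static class NmeaNumber
{
    // Parses an unsigned decimal such as "3751.65" from s[start..end)
    // without creating any intermediate strings.
    public static bool TryParse(string s, int start, int end, out double value)
    {
        value = 0;
        if (s == null || start < 0 || end > s.Length || start >= end)
            return false;

        double digits = 0;   // all digits read so far, as if there were no '.'
        double scale = 1;    // 10^(number of digits after the '.')
        bool seenDot = false;
        bool seenDigit = false;

        for (int i = start; i < end; i++)
        {
            char c = s[i];
            if (c == '.' && !seenDot)
            {
                seenDot = true;
            }
            else if (c >= '0' && c <= '9')
            {
                seenDigit = true;
                digits = digits * 10 + (c - '0');
                if (seenDot)
                    scale *= 10;
            }
            else
            {
                return false; // unexpected character (or a second '.')
            }
        }

        if (!seenDigit)
            return false;
        value = digits / scale;
        return true;
    }
}
```

The start and end positions would come from `IndexOf(',')` calls on the sentence, so the whole path from raw sentence to double stays free of `Split` and `Substring`.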


Let's start optimizing. First, we can fix all the other allocations. You said you had already done that, but here is my variant:

private static (double Latitude, double Longitude)? GetCoordinates(string input)
{
    // Divide the sentence into words
    string[] words = input.Split(',');
    // Do we have enough values to describe our location?
    if (words[3] == "" || words[4] == "" || words[5] == "" || words[6] == "")
        return null;

    var latitude = ParseCoordinate(words[3]);
    var longitude = ParseCoordinate(words[5]);

    return (latitude, longitude);
}

private static double ParseCoordinate(string coordinateString)
{
    double wholeValue = double.Parse(coordinateString, NmeaCultureInfo);

    int integerPart = (int) wholeValue;
    int degrees = integerPart / 100;
    int minutes = integerPart % 100;
    double seconds = (wholeValue - integerPart) * 60;

    return degrees + minutes / 60.0 + seconds / 3600.0;
}

OK, let's assume it's still slow and we want to optimize it further. First, we should replace this condition:

if (words[3] == "" || words[4] == "" || words[5] == "" || words[6] == "")
        return null;

What are we doing here? We just want to know whether each field contains a value. We can check that without splitting the string, and with further optimization we won't parse the string at all if something is wrong. It may look like this:

private static (string LatitudeString, string LongitudeString)? ParseCoordinatesStrings(string input)
{
    int latitudeIndex = -1;
    for (int i = 0; i < 3; i++)
    {

        latitudeIndex = input.IndexOf(',', latitudeIndex + 1);
        if (latitudeIndex < 0)
            return null;
    }
    int latitudeEndIndex = input.IndexOf(',', latitudeIndex + 1);
    if (latitudeEndIndex < 0 || latitudeEndIndex - latitudeIndex <= 1)
        return null; // has no latitude
    int longitudeIndex = input.IndexOf(',', latitudeEndIndex + 1);
    if (longitudeIndex < 0)
        return null;
    int longitudeEndIndex = input.IndexOf(',', longitudeIndex + 1);
    if (longitudeEndIndex < 0 || longitudeEndIndex - longitudeIndex <= 1)
        return null; // has no longitude
    string latitudeString = input.Substring(latitudeIndex + 1, latitudeEndIndex - latitudeIndex - 1);
    string longitudeString = input.Substring(longitudeIndex + 1, longitudeEndIndex - longitudeIndex - 1);
    return (latitudeString, longitudeString);
}

And now, combining them all together:

using System;
using System.Globalization;

namespace SO43746933
{
    class Program
    {
        private static readonly CultureInfo NmeaCultureInfo = CultureInfo.InvariantCulture;

        static void Main(string[] args)
        {
            string input =
                "$GPRMC,081836,A,3751.65,S,14507.36,E,000.0,360.0,130998,011.3,E*62 $GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47 $GPGSA,A,3,32,27,03,193,29,23,19,16,21,31,14,,1.18,0.51,1.07*35";
            var newCoordinates = GetCoordinatesNew(input);
            var oldCoordinates = GetCoordinatesOld(input);
            if (newCoordinates == null || oldCoordinates == null)
            {
                throw new InvalidOperationException("should never throw");
            }
            Console.WriteLine("Latitude: {0}\t\tLongitude:{1}", newCoordinates.Value.Latitude, newCoordinates.Value.Longitude);
            Console.WriteLine("Latitude: {0}\t\tLongitude:{1}", oldCoordinates.Value.Latitude, oldCoordinates.Value.Longitude);
        }

        private static (double Latitude, double Longitude)? GetCoordinatesNew(string input)
        {
            // Divide the sentence into words
            var coordinateStrings = ParseCoordinatesStrings(input);
            // Do we have enough values to describe our location?
            if (coordinateStrings == null)
                return null;

            var latitude = ParseCoordinate(coordinateStrings.Value.LatitudeString);
            var longitude = ParseCoordinate(coordinateStrings.Value.LongitudeString);

            return (latitude, longitude);
        }

        private static (string LatitudeString, string LongitudeString)? ParseCoordinatesStrings(string input)
        {
            int latitudeIndex = -1;
            for (int i = 0; i < 3; i++)
            {

                latitudeIndex = input.IndexOf(',', latitudeIndex + 1);
                if (latitudeIndex < 0)
                    return null;
            }
            int latitudeEndIndex = input.IndexOf(',', latitudeIndex + 1);
            if (latitudeEndIndex < 0 || latitudeEndIndex - latitudeIndex <= 1)
                return null; // has no latitude
            int longitudeIndex = input.IndexOf(',', latitudeEndIndex + 1);
            if (longitudeIndex < 0)
                return null;
            int longitudeEndIndex = input.IndexOf(',', longitudeIndex + 1);
            if (longitudeEndIndex < 0 || longitudeEndIndex - longitudeIndex <= 1)
                return null; // has no longitude
            string latitudeString = input.Substring(latitudeIndex + 1, latitudeEndIndex - latitudeIndex - 1);
            string longitudeString = input.Substring(longitudeIndex + 1, longitudeEndIndex - longitudeIndex - 1);
            return (latitudeString, longitudeString);
        }

        private static double ParseCoordinate(string coordinateString)
        {
            double wholeValue = double.Parse(coordinateString, NmeaCultureInfo);

            int integerPart = (int) wholeValue;
            int degrees = integerPart / 100;
            int minutes = integerPart % 100;
            double seconds = (wholeValue - integerPart) * 60;

            return degrees + minutes / 60.0 + seconds / 3600.0;
        }

        private static (double Latitude, double Longitude)? GetCoordinatesOld(string input)
        {
            // Divide the sentence into words
            string[] Words = input.Split(',');
            // Do we have enough values to describe our location?
            if (!(Words[3] != "" && Words[4] != "" &&
                  Words[5] != "" && Words[6] != ""))
                return null;
            // example 5230.5900,N
            // 52°30.5900\N

            // Yes. Extract latitude and longitude


            //Latitude decimal

            var wholeLat = double.Parse(Words[3], NmeaCultureInfo);

            int integerPart = (int)wholeLat;
            int DegreesLat = integerPart / 100;
            string[] tempLat = Words[3].Substring(2).Split('.');
            int MinutesLat = integerPart % 100;
            string SecLat = "0";
            if (tempLat.Length >= 2)
            {
                SecLat = "0." + tempLat[1];
            }
            double SecondsLat = double.Parse(SecLat, NmeaCultureInfo) * 60;

            double Latitude = (DegreesLat + (MinutesLat / 60.0) + (SecondsLat / 3600.0));


            //Longitude decimal

            double DegreesLon = double.Parse(Words[5].Substring(0, 3), NmeaCultureInfo);
            string[] tempLon = Words[5].Substring(3).ToString().Split('.');
            double MinutesLon = double.Parse(tempLon[0], NmeaCultureInfo);
            string SecLon = "0";
            if (tempLon.Length >= 2)
            {
                SecLon = "0." + tempLon[1];
            }
            double SecondsLon = double.Parse(SecLon, NmeaCultureInfo) * 60;

            double Longitude = (DegreesLon + (MinutesLon / 60) + (SecondsLon / 3600));
            return (Latitude, Longitude);
        }
    }
}

It allocates 2 temporary strings, but that shouldn't be a problem for the GC. You may want ParseCoordinatesStrings to return (double, double) instead of (string, string), minimizing the lifetime of latitudeString and longitudeString by making them local variables that are never returned from the method. In that case, just move double.Parse there.
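
That last change might look roughly like this. It is a sketch, not a drop-in replacement: the `RmcCoordinates` class name and the use of `out` parameters instead of a nullable tuple are choices made for this example, and the comma walk is the same one ParseCoordinatesStrings uses above.

```csharp
using System;
using System.Globalization;

public static class RmcCoordinates
{
    // The two temporary strings never leave this class; callers only see doubles.
    public static bool TryGet(string input, out double latitude, out double longitude)
    {
        latitude = longitude = 0;

        // Skip to the third comma, which precedes the latitude field.
        int latStart = -1;
        for (int i = 0; i < 3; i++)
        {
            latStart = input.IndexOf(',', latStart + 1);
            if (latStart < 0) return false;
        }
        int latEnd = input.IndexOf(',', latStart + 1);
        if (latEnd < 0 || latEnd - latStart <= 1) return false; // has no latitude
        int lonStart = input.IndexOf(',', latEnd + 1);          // skips the N/S field
        if (lonStart < 0) return false;
        int lonEnd = input.IndexOf(',', lonStart + 1);
        if (lonEnd < 0 || lonEnd - lonStart <= 1) return false; // has no longitude

        latitude = ToDegrees(input.Substring(latStart + 1, latEnd - latStart - 1));
        longitude = ToDegrees(input.Substring(lonStart + 1, lonEnd - lonStart - 1));
        return true;
    }

    // ddmm.mmmm -> decimal degrees, same arithmetic as ParseCoordinate.
    private static double ToDegrees(string field)
    {
        double raw = double.Parse(field, CultureInfo.InvariantCulture);
        int whole = (int)raw;
        return whole / 100 + (whole % 100) / 60.0 + (raw - whole) / 60.0;
    }
}
```

As in the original, the hemisphere letters are ignored, so southern latitudes and western longitudes still come back positive.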

Alex Zhukovskiy
  • Could you explain how it is possible to use StringBuilder to do the split? I think we would need to call ToString() on the StringBuilder, and that would be even worse for the GC. String.Split() is the last thing left in my code that causes GC allocations. – seek May 03 '17 at 05:42
  • @seek StringBuilder may be used in your own parser. If you have problems with `string.Split`, just don't use it. You need only 4 params, yet you are parsing the whole string and creating a lot of parts you never use. For example, you are using `string` operations to split `32.758` into `32` and `758` and then pass them to `MinutesLat` and `SecondsLat`. Why not just parse the whole number and then operate on numbers? – Alex Zhukovskiy May 03 '17 at 09:01
  • @AlexZhukovskiy I said in a previous comment that I have already changed all the other GC-allocating methods, and the only one still causing problems is string[] Words = sSentence.split(','); - I'm using all the params from this split when parsing other NMEA sentences. Saying "just don't use it" without offering any other solution doesn't look like an answer to me. – seek May 03 '17 at 09:04
  • @AlexZhukovskiy Looks very promising, but as far as I can see we can't avoid GC allocation entirely. With your method it's 163 bytes instead of 757 bytes. I'm curious whether I can live with such an allocation about 5-7 times per second... – seek May 03 '17 at 12:21
  • The only thing you can do is replace `double.Parse` with your own parsing method, which doesn't require creating substrings, but you won't be able to use `CultureInfo` (otherwise you would have to complicate your parsing code a lot) and you may take a performance hit (because the parsing code would be a managed implementation). And yes, 1 KB of allocations per second is very low. Why do you need to optimize it further? – Alex Zhukovskiy May 04 '17 at 16:10
When it comes to GC and parsing in Unity, there are two ways of handling it:

The traditional way.

The Unity way.

Both work really well, but one will sound stupidly simple and, in truth, it really is.

The traditional way consists of using one of the many C# and C++ tricks commonly used in other software. It has already been covered in the other answers, so, as cheap as it might sound, I won't cover it here.

The Unity way is the official way explained by developers from Unity Technologies. (It's usually explained during their yearly show at GDC. The approach I'll describe was explained at the Unity GDC 2016 session and, even today, is still the most optimized way to do it in Unity.)

Before explaining the Unity way, I have to explain a bit about how the Unity GC works because, even today, it's still unclear to many. The GC is like a system of blocks that builds up from the start and is only emptied when the app is closed. (On PC/Mac it behaves slightly differently than on mobile, but applying this approach on PC/Mac makes a difference nevertheless.) Each time you use ANY kind of function that generates any kind of data, it creates a new block in the GC. A block can be overwritten as long as the new data is smaller than the previous data, but it CAN'T be removed as long as the app is running. In other words, this system requires you to avoid nesting too much data, but also requires you to nest data as much as possible.

This might sound like a contradiction, but it's not. It simply means that you have to know what you're nesting so that you nest as little at once as necessary. Nesting is the key to avoiding filling the GC.

The simplest solution to the problem asked here is to create, at the start of the app, a universal nesting script (which you keep around with DontDestroyOnLoad(); ). I usually do it during the initial splash screen. This is why I don't use Unity's pre-made logo splash screen, but instead build my own in its own scene, so that I can initialize all the tweaks and prerequisite static attributes I'll need in the whole app. I usually fill those static attributes with an initial chunk of fake data so that their blocks are big enough to hold anything I throw in there. For example, if you need an array, keep an array of 512 integers, floats or strings and fill it with a fake example big enough (especially for strings) to hold your actual data.

In this "universal" nesting script, you add the fields that should hold the raw GPS data (string) and its divided parts (be it arrays of strings or converted data like floats). Whenever you read the GPS data (raw string), you always store it inside the universal nesting script and overwrite the previous one. (If you want to keep the previous ones, I suggest you keep only the converted data and not the raw GPS data. Why redo the conversion anyway, right?)

Ideally, you hold all the conversion calls and data in the universal nesting script. You just have to remember to work linearly (meaning you avoid having multiple scripts change the nested values during a single frame), usually by having a mastermind function that handles all requests (and stops or ignores duplicate requests).

Why do this? This way, you fill the GC with the bare minimum AND reuse the same memory blocks again and again. Those memory blocks don't need to be cleared by the GC, as they are kept in use. There's almost no waste of blocks, and the sizes of the blocks are exactly what you need them to be, with no randomness (meaning no need to create new, bigger blocks for bigger data).
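
As a plain C# illustration of reusing preallocated storage (a sketch only: `GpsCache` and its members are hypothetical names, and the Unity-specific parts such as a MonoBehaviour kept alive with DontDestroyOnLoad() are left out so the example stands alone). Note that this works for arrays and value-type fields; .NET strings are immutable, so each new raw sentence still arrives as a new string object and only its reference can be overwritten.

```csharp
using System;

public static class GpsCache
{
    // Allocated once and overwritten on every update. Assumption: a sentence
    // never has more than 32 commas (NMEA sentences are short).
    private static readonly int[] commaPositions = new int[32];
    private static int commaCount;

    // Converted values, meant to be overwritten in place by the caller.
    public static double Latitude;
    public static double Longitude;

    // Records the comma positions of `sentence` in the reused array
    // and returns how many were found.
    public static int IndexSentence(string sentence)
    {
        commaCount = 0;
        for (int i = 0; i < sentence.Length && commaCount < commaPositions.Length; i++)
        {
            if (sentence[i] == ',')
                commaPositions[commaCount++] = i;
        }
        return commaCount;
    }

    // Start (inclusive) and end (exclusive) of a field of the sentence last
    // passed to IndexSentence; field 0 is the sentence identifier.
    public static bool TryGetFieldRange(string sentence, int field, out int start, out int end)
    {
        start = end = 0;
        if (field < 0 || field > commaCount)
            return false;
        start = field == 0 ? 0 : commaPositions[field - 1] + 1;
        end = field < commaCount ? commaPositions[field] : sentence.Length;
        return true;
    }
}
```

Calling `IndexSentence` once per incoming sentence and then asking for field ranges replaces the `string[]` that `Split` would create with an `int[]` that is allocated only once for the lifetime of the app.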

Here's the link to the Unity optimization showcase from the Europe GDC 2016 (with the timestamp for the explanation of memory management and GC): https://youtu.be/j4YAY36xjwE?t=1432

If you're wondering: yes, I even keep a bunch of integers in the universal nesting script, which I use whenever I write for() loops to replace foreach() (as foreach() generates a small block that can't be reused and is ALWAYS thrown to the GC after each use).

user3345048