
I'm implementing the Floyd-Warshall algorithm, as can be found here.
I'm not just interested in the shortest distance between nodes, but also in the path corresponding to that distance.

In order to do this, I have modified the algorithm as follows:

double[,] dist = new double[V, V];        // existing line
string[,] connections = new string[V, V]; // new line, needed for remembering the path

...
for (i = 0; i < V; i++)
{
  for (j = 0; j < V; j++)
  {
    dist[i, j] = graph[i, j];
    connections[i, j] = $"({i},{j})"; // added: initialisation of "connections"
  }
}

...
if (dist[i, k] + dist[k, j] < dist[i, j])
{
  dist[i, j] = dist[i, k] + dist[k, j];
  connections[i, j] = connections[i, k] + "-" + connections[k, j]; // Added for remembering shortest path
}

I'm running this algorithm on a snake-like list of one million locations, all of them simply added one after the other.
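
To give an idea of the input (a simplified sketch, not the actual production code), the graph is essentially a chain in which every location is only connected to its direct neighbour:

int V = 1000; // the real list is far larger
double[,] graph = new double[V, V];
for (int i = 0; i < V; i++)
    for (int j = 0; j < V; j++)
        graph[i, j] = (i == j) ? 0 : double.PositiveInfinity; // no direct edge yet
for (int i = 0; i < V - 1; i++)
{
    graph[i, i + 1] = 1; // assuming symmetric, unit-weight edges between neighbours
    graph[i + 1, i] = 1;
}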

As a result, my connections array looks as follows:

    [0, 0]  "(0,0)"
    [0, 1]  "(0,1)"
    [0, 2]  "(0,1)-(1,2)"
    [0, 3]  "(0,1)-(1,2)-(2,3)"
    [0, 4]  "(0,1)-(1,2)-(2,3)-(3,4)"
    [0, 5]  "(0,1)-(1,2)-(2,3)-(3,4)-(4,5)"
    ...
    [0, 787]  "(0,1)-(1,2)-...(786,787)" // length of +7000 characters
    ...

... at the moment of my OutOfMemoryException (what a surprise) :-)

I would like to avoid that OutOfMemoryException, and start thinking of different techniques:

  • Forcing garbage collection once a variable is not needed anymore (in case this is not yet done)
  • "Swapping" very large objects between memory and the hard disk, in order to free up memory.

I believe the second option to be the most realistic one (don't kill me if I'm wrong) :-) Is there a technique in C# that makes this possible?

Oh, if you react like "You're an idiot! There's a far better way to keep the shortest paths in Floyd-Warshal!", don't refrain from telling me how :-)

Edit: taking into account the multiple comments, for which I'm very grateful:

In the meantime, I've replaced my strings with nested Lists of Points, and this seems to be working fine:

Instead of:

string[,] l_connections;

I have:

List<List<List<Point>>> l_connections;

The speed has doubled, and when working with huge collections of dictionaries (+1000 entries of ...), I get a System.OutOfMemoryException only at ±800 entries instead of ±650.

That's already a huge improvement, but does anybody know how to do even better?

Edit: information about the garbage collector and its settings:

Here is the GC and GCSettings information:

System.GC.MaxGeneration:[2]
IsServerGC:[False]
LargeObjectHeapCompactionMode:[Default]
LatencyMode:[Interactive]

I have altered the LargeObjectHeapCompactionMode to CompactOnce, but this brought the performance back down to results similar to when I was working with large strings instead of Lists.

Edit: how to work with the List collections:
Here is the code when working with the List collections:

public void floydWarshall(Dictionary<(int x, int y), double> dictionary, out double[,] dist, out List<List<List<Point>>> connections)
{
    int dictionary_size = (int)Math.Ceiling(Math.Sqrt(dictionary.Count));
    dist = new double[dictionary_size, dictionary_size];
...

    for (k = 0; k < dictionary_size; k++)
    {
        // Pick all vertices as source one by one
        for (i = 0; i < dictionary_size; i++)
        {
            // Pick all vertices as destination for the above picked source
            for (j = 0; j < dictionary_size; j++)
            {
                // If vertex k is on the shortest path from i to j, then update dist[i][j]
                if (dist[i, k] + dist[k, j] < dist[i, j])
                {
                    dist[i, j] = dist[i, k] + dist[k, j];
                    connections[i][j] =  new List<Point>();
                    connections[i][j].AddRange(connections[i][k]);
                    connections[i][j].AddRange(connections[k][j]);
                 }
            }
        }
    }
}

Thanks in advance

Dominique
  • I don't know Floyd-Warshal but looks fun. Does it have to be strings? – MPelletier Jan 31 '23 at 15:45
  • All the string concatenation is probably killing performance. Maybe I'm missing something, but it looks like the next element is a concatenation of the element before it and its value (aside from the first element), so why are you storing the whole previous array in the next element? – Ron Beyer Jan 31 '23 at 15:46
  • Store the data as pairs of binary numbers or bit arrays instead of strings? – D Stanley Jan 31 '23 at 15:49
  • As you can see from the results, the "index" `[0, 787]` contains the string `(0,1)-(1,2)-...` which is the way to get from point 0 to point 787. This is exactly what I want. It does not need to be in a string, but I don't think this is the real burden: even if I find a way to reduce the size, it will still become too large when handling larger maps. So, I'm looking for a way to write some memory information on disk (hence the "**swap**" idea), but I have no idea how. – Dominique Jan 31 '23 at 15:50
  • From [wikipedia](https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm#Path_reconstruction), "While one may be inclined to store the actual path from each vertex to each other vertex, this is not necessary, and in fact, is very costly in terms of memory. Instead, the shortest-path tree can be calculated for each node in Θ(|E|) time using Θ(|V|) memory to store each tree which allows us to efficiently reconstruct a path from any two connected vertices" – JonasH Jan 31 '23 at 15:51
  • @JonasH: indeed it is memory costly, this is what I'm experiencing now. So I'm looking for a way to get memory information on disk in order to relieve the memory a bit. – Dominique Jan 31 '23 at 15:53
  • Use `StringBuilder` instead of string during the concat process. Then only do `.ToString()` once. – thewallrus Jan 31 '23 at 15:56
  • Somebody has proposed to close my question, mentioning the question not to be clear and/or needing more information. Can you please tell me what you don't understand? – Dominique Jan 31 '23 at 16:02
  • @Dominique Why? What graph do you have that you think it is cheaper loading a path from disk than computing it on demand? What are your actual performance/memory goals? If I'm not mistaken your memory would scale with `O(N^3)`, so at 10k nodes you would have issues even with disk space. – JonasH Jan 31 '23 at 16:03
  • _"I would like to avoid that OutOfMemoryException, and start thinking of different techniques: Forcing garbage collection once a variable is not needed anymore (in case this is not yet done)"_ - I highly doubt that forcing the GC will fix OOM. – Guru Stron Jan 31 '23 at 16:05
  • @GuruStron: well, it just might: I create an input graph, calculate all shortest paths and print those. I do this several times, using different types of graphs which I don't need anymore afterwards. In C++, I would just free the memory of those objects but here I'm working in C# and I can't do that. Therefore I hope that forcing the garbage collector might free some memory which I might need. – Dominique Jan 31 '23 at 16:10
  • @Dominique I suspect that CLR will try to free memory before throwing OOM by itself. Also have you tried to calculate if you have enough memory in the first place? Another issue which can be relevant - [LOH](https://learn.microsoft.com/en-us/dotnet/standard/garbage-collection/large-object-heap) (objects with size > 85kb will end up there) which can be fragmented, which in theory can lead to OOM. – Guru Stron Jan 31 '23 at 16:14
  • @Dominique also it is super easy to check - use try-catch, perform full gc and retry the failed iteration. – Guru Stron Jan 31 '23 at 16:17
  • @GuruStron: Thanks a lot, I never heard about the "Large Object Heap". It's evening here and I'll call it a day but tomorrow it will be my first thing to investigate. :-) – Dominique Jan 31 '23 at 16:18
  • @GuruStron: by "full gc", do you mean `System.GC.Collect();`? – Dominique Jan 31 '23 at 16:19
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/251523/discussion-between-guru-stron-and-dominique). – Guru Stron Jan 31 '23 at 16:20
  • @GuruStron: ok, but I'm now leaving (it's half past five here in my country), will you be back tomorrow? – Dominique Jan 31 '23 at 16:25
  • @Dominique yes. Also check out what I wrote in the chat. – Guru Stron Jan 31 '23 at 16:25
  • @GuruStron: Ok, thanks, I'll do that. – Dominique Jan 31 '23 at 16:26
  • I am not entirely sure about the algo but when it comes to C# I can help out: you could try enabling gcAllowVeryLargeObjects (https://learn.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/runtime/gcallowverylargeobjects-element). And make sure you are building a 64-bit version of the app so you won't hit the 4-gig memory limit. When it comes to GC.Collect(), the C# runtime is smart enough to free up space; in 99% of the cases we don't need to call this explicitly. – Lokanath Jan 31 '23 at 16:30
  • As a point of comparison, my fairly unoptimized A* does 10k nodes in about 16ms, and 1M nodes in 2s; the latter would require on the order of 1 exabyte using your approach, if I'm not mistaken. – JonasH Jan 31 '23 at 16:32
  • I think code that uses strings to store lists of int pairs (and uses string concatenation to build those strings) can't really be seriously considered for a "performance optimizations" question. As a result the question is unclear to me (I don't believe one would be asking "should I use strings to store int arrays/arrays of int pairs" nor "should I build long strings with a lot of string concatenation"... but then why is that code shown at all?) – Alexei Levenkov Jan 31 '23 at 17:06
  • @GuruStron: I've replaced the strings by `List` objects, the performance has increased but trying to work with the `GCSettings` seemed not a good idea, as I described in my last edit (or am I doing something wrong?). Do you have any proposals? – Dominique Feb 23 '23 at 12:59
  • @Dominique can you please post the full code somewhere? Also I agree with the point that storing paths can be suboptimal. – Guru Stron Feb 23 '23 at 16:20
  • @GuruStron: I've edited my question (Edit: how to work with List collection). In the meantime I already had a look at several collection types (`Array`, `ArrayList`, `Queue`, `LinkedList`) and at least those ones don't improve the performance (either memory wise or speed wise). – Dominique Feb 24 '23 at 08:44
  • Have you tried increasing your swap file size? Do you even have a large enough disk for 1 million of these "extremely large strings"? Do they need to be strings? Strings are unicode (at least 16bits per char), a Span would cut space requirements in half. – jwdonahue Feb 26 '23 at 08:20
  • @jwdonahue: I prefer a pure software solution, as I'll need to implement that solution for multiple customers, hence I prefer not touching the swapfile. On top of that, I have already replaced the large strings with `List` of `List` objects and I've replaced the strings by `Point` objects (the strings represent numbers), reducing memory already, but I wonder if more is possible. – Dominique Feb 27 '23 at 07:28
  • `struct Connection { short node1; short node2; }` then keeping the path as an `ArrayList` will save considerable memory - 4 bytes per connection, instead of 12-36 bytes. (Characters are 2 bytes in C#.) But use int instead of short if you need more than 30,000 or so. Also, concatenation into long strings is very slow as the string must be copied. – Ben Feb 27 '23 at 13:26
  • @jessehouwing: That doesn't really help, because (in a real non-snake graph) there can be many different paths with the same prefix and different next vertex... – Ben Voigt Feb 28 '23 at 17:22

5 Answers


Oh, if you react like "You're an idiot! There's a far better way to keep the shortest paths in Floyd-Warshal!", don't refrain from telling me how :-)

There is indeed.

Every shortest path from m to n is either a direct step m-n or a path through k, m-...-k-...-n, where m-...-k is the shortest path from m to k, already stored at array index [m,k] and k-...-n is the shortest path from k to n, already stored at array index [k,n].

So all you need to store at [m,n] is the value of k, which is any single interior vertex along the shortest path. (The k variable in the question code is a valid choice)

Storage requirement: O(V^2), down from O(V^3).

A suitable C# data structure would be int[,], where

  • path[m,n] == m means a direct link
  • path[m,n] == n means "I don't know yet" (storing null in int?[,] is also a reasonable approach, but wastes some memory)
  • path[m,n] == k && k != m && k != n means the shortest path passes through k and the path can be reconstructed by recursing both [m,k] and [k,n]
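
A sketch of how that table could be filled during the algorithm itself (adapting the loops from the question; the initialisation assumes missing edges are stored as positive infinity in `graph`):

int[,] path = new int[V, V];
for (int m = 0; m < V; m++)
    for (int n = 0; n < V; n++)
        // direct link if an edge exists, otherwise "I don't know yet"
        path[m, n] = double.IsPositiveInfinity(graph[m, n]) ? n : m;

for (int k = 0; k < V; k++)
    for (int m = 0; m < V; m++)
        for (int n = 0; n < V; n++)
            if (dist[m, k] + dist[k, n] < dist[m, n])
            {
                dist[m, n] = dist[m, k] + dist[k, n];
                path[m, n] = k; // remember one interior vertex instead of the whole path
            }

Path reconstruction then just recurses on [m,k] and [k,n]: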

List<int> GetPathList(int n, int m, List<int> buffer = null)
{
    if (buffer == null) {
        buffer = new List<int>();
        buffer.Add(n);
    }
    if (n == m) return buffer;
    int k = path[n,m];
    if (k == n) {
        buffer.Add(m);
        return buffer;
    }
    if (k == m) return null;
    if (null == GetPathList(n, k, buffer)) return null;
    return GetPathList(k, m, buffer);
}
string GetPathString(int m, int n) => string.Join('-', GetPathList(n, m));
Ben Voigt
  • While indeed it looks like the right approach, isn't the whole point of the code shown by the OP to generate all the full paths from that table? From what I understand they already built that table and for some unknown reason need to create paths as full lists of items... – Alexei Levenkov Feb 28 '23 at 17:15
  • @AlexeiLevenkov: OP's code has two tables: `dist` and `connections`. Neither is this. He did search for `k` in his algorithm, but didn't store a table of the `k` values. – Ben Voigt Feb 28 '23 at 17:17

For shortest distances and paths you can use a modified Dijkstra algorithm. It works fast, can be offloaded, and is pretty much a general dynamic-programming solution. It also supports infinite graphs (which are evaluated at runtime) and can work like the A* search algorithm (shortest path over visited vertices; this is how they work in games like Minecraft).

I implemented it some time ago and it looks like this visually; you can play around with it in my unit test section, and it will visualize your graph as a Graphviz-format URL for a better debugging experience.

Pros of my implementation:

  • can specify weight/vertex types
  • can specify weight per connection (delegate)
  • can specify weight per vertex (delegate)
  • can specify connections from vertex to other vertices (delegate)
  • can specify infinite graphs with stop conditions (not relevant to you)
  • Complexity: O(E + V*log(V))
  • Memory: O(V)
  • can get path from A to B
  • can get weight from A to B
  • can get full shortest path tree (your case)

Works like this:

    [Test]
    [TestCase(new[] {1, 1, 1, 1}, 3)]
    [TestCase(new[] {2, 1, 1, 1}, 2)]
    [TestCase(new[] {1, 2, 3, 4}, 2)]
    [TestCase(new[] {4, 3, 2, 1}, 1)]
    [TestCase(new[] {2, 2, 1, 3, 3, 2, 1}, 3)]
    [TestCase(new[] {2, 0, 1, 1}, 1)]
    public void PathToNearCity(int[] cities, int expectedMinSteps)
    {
        var graph = ParsePathToRome(cities);
        var source = 0;
        var targets = cities
            .Select((x, i) => new {x, i})
            .Where(x => x.x == 0)
            .Select(x => x.i)
            .Concat(new[] {cities.Length - 1})
            .ToList();
        var result = new InfiniteDijkstraAlgorithm<int, int>(
            x => graph.OutEdges(x).Select(y => y.Target),
            x => 0,
            (x, y) => x.Weight + 1);
        result.Search(source);
        var target = targets.OrderBy(x => result.GetWeight(x)).First();
        var pathToRome = result.GetPath(source, target).ToList();
        Print(graph, pathToRome);
        Assert.AreEqual(expectedMinSteps, result.GetWeight(target));
    }

I could also calculate Levenshtein distance/paths:

    [Test]
    [TestCase("kitten", "sitting", 3)]
    [TestCase("kitten", "kitting", 2)]
    [TestCase("hello", "kelm", 3)]
    [TestCase("asetbaeaefasdfsa", "asdfaew", 12)]
    [TestCase("aaaa", "a", 3)]
    [TestCase("a", "a", 0)]
    [TestCase("a", "b", 1)]
    public void LevenstainDistance(string sourceStr, string targetStr, int expectedMinDistance)
    {
        var source = Tuple.Create(0, 0);
        var target = Tuple.Create(sourceStr.Length, targetStr.Length);
        var result = new InfiniteDijkstraAlgorithm<Tuple<int, int>, int>(
            x =>
            {
                var list = new List<Tuple<int, int>>();
                if (x.Item2 < targetStr.Length)
                    list.Add(Tuple.Create(x.Item1, x.Item2 + 1));
                if (x.Item1 < sourceStr.Length)
                    list.Add(Tuple.Create(x.Item1 + 1, x.Item2));
                if (x.Item1 < sourceStr.Length && x.Item2 < targetStr.Length)
                    list.Add(Tuple.Create(x.Item1 + 1, x.Item2 + 1));
                return list;
            },
            x => 0,
            (xw, y) =>
            {
                var x = xw.Vertex;
                if (x.Item1 < sourceStr.Length
                    && x.Item2 < targetStr.Length
                    && sourceStr[x.Item1] == targetStr[x.Item2]
                    && (y.Item1 - 1) == x.Item1
                    && (y.Item2 - 1) == x.Item2)
                {
                    return xw.Weight;
                }

                return xw.Weight + 1;
            },
            buildShortestPathTree: true);
        result.Search(source);

        var minDistance = result.GetWeight(target);
        Assert.AreEqual(expectedMinDistance, minDistance);
    }
eocron
  • It is very surprising how you managed to do "Memory: O(V)" when OP is looking for storing all the shortest paths which *should* be in order of O(V^3) for memory usage. Could you please clarify where in your code all paths are stored? – Alexei Levenkov Feb 27 '23 at 21:32
  • To put it simply, O(V) means that you make a shortest path, which is really just all vertices in the worst-case scenario. For the *entire tree* of paths it will be O(V*log(V)), because you essentially create a tree out of your graph, removing heavy connections until only one exists and connects to the root. The root is your starting point, the leaves are the target points. What is stored inside a vertex is to your liking; it can be a string or some GUID in a database. So it essentially becomes a prefix tree or a Patricia trie rather than a giant list. – eocron Feb 27 '23 at 21:59
  • I guess we just disagree on what is written in the question. I'll wait and see what OP has to say. – Alexei Levenkov Feb 27 '23 at 22:21
  • Oh, I get it now. Yeah, totally against his approach of storing everything in serialised form IN memory – eocron Feb 28 '23 at 04:55
  • @eocron: I'm sorry, but Alexei is right indeed: it's not about finding the shortest path from one single node, but about keeping in memory all shortest paths and dealing with the corresponding memory issues. Nevertheless, your answer gives very interesting insight into the Dijkstra algorithm, hence +1. – Dominique Feb 28 '23 at 07:26
  • @Dominique What is the problem in running it for each node and getting O(V^2) on all trees in memory instead of O(E^3) which is considerably larger? – eocron Feb 28 '23 at 17:15
  • @eocron: Because the actual search time is much worse for searching separately on each node vs solving the entire graph at once. – Ben Voigt Mar 02 '23 at 14:54

I doubt you are actually using all of the memory on the system. I suspect the actual problem is related to exhaustion of the virtual address table for the process. This is a long-standing issue with the .Net GC (here's an example from 2011), where actual memory is reclaimed, but the address table is often never compacted for items above a certain size, and therefore can cause OutOfMemory exceptions even when memory use is relatively sparse.

I've heard this is actually significantly improved in recent versions of .Net Core, but I haven't done a deep dive on it yet myself, and I still suspect this is the problem here. If so, the issue can be solved by changing the GCSettings.LargeObjectHeapCompactionMode property.

I base this suspicion on the info in the question:

LargeObjectHeapCompactionMode:[Default]

When we check the documentation on what that means we find this:

Default (1): The large object heap (LOH) is not compacted.

Note: the only other option looks like this:

CompactOnce (2): The large object heap (LOH) will be compacted during the next blocking generation 2 garbage collection.

See the "once" in that description? Also see the "next blocking generation 2"? Those don't happen often. This means you'll want to be explicit about forcing a full (blocking) Gen 2 collection after changing the property. You'll also want to be strategic about how you use this, as full Gen 2 collections are not speedy operations.

Joel Coehoorn
  • It's 2023 and chances that OP uses a 32-bit process are... low. I don't think it is realistic to exhaust the 64-bit memory space allocation in reasonable time even if no compaction happens for LOH objects... (And it is very easy to use up all memory with code that *should* be O(node_count^3) for memory usage) – Alexei Levenkov Feb 27 '23 at 21:36
  • @AlexeiLevenkov: are you saying that this approach is only valid for 32-bit processes? As you state correctly, I'm indeed working on a 64-bit application. – Dominique Feb 28 '23 at 07:28
  • @Dominique what I'm trying to point out is that you completely ignore all the comments that say that you generate multiple GBs of those paths and you have to figure out why you are actually trying to do that... (lack of compaction of the LOH may be a problem at some point - I don't know when it impacts 64-bit processes, but that is very unlikely *the problem* your code is facing). – Alexei Levenkov Feb 28 '23 at 08:26

One idea which allows you to reduce the memory footprint, but is kinda inconvenient:

Don't store strings: every character is 2 bytes, but you're only using the characters 0-9 plus 4 separator characters "(", ")", "-", ",".

That's 14 characters in total, so 4 bits are enough to encode a character.

Write your own string type that has a byte[] and encodes 2 characters in each byte (4 bits still leave room for a termination character "\0").

So you'd have a struct like:

    public struct SmallString{
    
        private static char[] Charset = new []{'\0', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '(', ')', '-', ','};
    
        private byte[] data;
    
        public static SmallString FromString(string input){
    
            var data = new byte[(input.Length+1) / 2];
    
            int pos = 0;
            byte current = 0;
            bool second = false;
            foreach(var ch in input){
    
                var idx = Array.IndexOf(Charset, ch);
                if(idx == -1) throw new Exception("Illegal Character");
                current |= (byte)(idx << (second ? 0 : 4));
    
                if(second){
                    data[pos] = current;
                    current = 0;
                    second = false;
                    pos++;
                }else{
                    second = true;
                }
            }
            if(second){
                data[pos] = current;   
            }
            
            return new SmallString(){data = data};
        }
    
        public override string ToString(){
            // determine length:
            var length = data.Length * 2;
            // if the last half byte is empty/0 reduce length by 1
            //if((data[^1] & 0b1111) == 0) length--;
    
            return string.Create(length, data, (span, state) =>
            {
                int i = 0;
                foreach (byte b in state)
                {
                    span[i++] = Charset[(b >> 4) & 0x0F];
                    span[i++] = Charset[b & 0x0F];
                }
            });
        }
    
    }

this would be a reduction from 4 bytes to 1 byte (4x).

// note that ")" is always followed by a "-", so that could be reduced too, but it makes the implementation more complicated. And for the 10 digits you still need at least 4 bits, so you would not save anything there.

// you could store the number pairs using LEB128 for the numbers; that could be even smaller than this approach.

Since all your strings are quite similar, it would also be possible to create a lookup table (e.g. your minimum string size is 5 characters, i.e. 10 bytes, and those prefixes reappear in all partial paths of a given path).

So you could take, say, the first 65535 strings that are 10 characters or longer and store them in a static list.

Your custom string implementation would then be an array of shorts (containing the indices into the prefix-string list) plus an actual short string for the suffix.

So instead of storing 10 characters (20 bytes) it would store 2 bytes (a short): a reduction of 10x.
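
A very rough sketch of that idea (the names and the 65535 limit here are placeholders, not a finished implementation):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public struct PrefixedPath
    {
        // hypothetical shared table of common prefix strings; capped at 65535
        // entries so that an index fits into a ushort (2 bytes)
        private static readonly List<string> PrefixTable = new();
        private static readonly Dictionary<string, ushort> PrefixIndex = new();

        private ushort[] prefixIds; // indices into PrefixTable
        private string suffix;      // short remainder, stored literally

        public static ushort Intern(string prefix)
        {
            if (!PrefixIndex.TryGetValue(prefix, out var id))
            {
                if (PrefixTable.Count >= ushort.MaxValue)
                    throw new InvalidOperationException("prefix table is full");
                id = (ushort)PrefixTable.Count;
                PrefixTable.Add(prefix);
                PrefixIndex[prefix] = id;
            }
            return id;
        }

        public static PrefixedPath Create(IEnumerable<string> prefixes, string suffix) =>
            new PrefixedPath { prefixIds = prefixes.Select(Intern).ToArray(), suffix = suffix };

        public override string ToString() =>
            string.Concat(prefixIds.Select(id => PrefixTable[id])) + suffix;
    }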


Edit:

To follow up on my notes above: storing the numbers as LEB128-encoded tuples, using the following LEB128 struct, yields an even higher gain compared to the SmallString struct.

storing "(0,1)-(1,2)-(2,3)-(3,4)-(4,5)"...

  • ...as string: 58 bytes
  • ...as SmallString: 15 bytes
  • ...as LEB128 tuple: 10 bytes
  • (note that ACTUAL object size may vary a few bytes)

for the LEB128 usage:

// store as:
List<(LEB128, LEB128)> bs = new (){ (0,1),(1,2),(2,3),(3,4),(4,5) };

// when printout is necessary:
Console.WriteLine(string.Join('-', bs).Replace(" ", ""));
public struct LEB128
{
    private byte[] data;
    public int Length => data.Length;
    
    private static List<byte> bytes = new(); // probably should be threadlocal
    public static LEB128 Encode(int number)
    {
        const int bitsPerByte = 7;
        const int continuationBit = 0x80;
        const int valueMask = continuationBit - 1;

        bytes.Clear();
        do
        {
            byte byteValue = (byte)(number & valueMask);
            number >>= bitsPerByte;
            if (number != 0)
            {
                byteValue |= (byte)continuationBit;
            }
            bytes.Add(byteValue);
        } while (number != 0);

        return new (){data = bytes.ToArray()};
    }
    
    public static implicit operator LEB128(int n) => LEB128.Encode(n);
    
    public int Decode() => LEB128.Decode(data);
    
    public static int Decode(byte[] bytes)
    {
        const int bitsPerByte = 7;
        const int valueMask = (1 << bitsPerByte) - 1;

        int result = 0;
        int shift = 0;
        int index = 0;

        while (index < bytes.Length)
        {
            int byteValue = bytes[index] & valueMask;
            result |= byteValue << shift;
            shift += bitsPerByte;

            if ((bytes[index] & 0x80) == 0)
            {
                break;
            }

            index++;
        }

        if (index == bytes.Length && (bytes[index - 1] & 0x80) != 0)
        {
            throw new ArgumentException("Invalid LEB128 encoding: the last byte has the continuation bit set.");
        }

        return result;
   }

    public override string ToString()
    {
        return this.Decode().ToString();
    }
}
Soraphis

I had more or less the same problem when I wanted to calculate all paths between the nodes of a graph. The problem was that some of our graphs were mesh graphs, and finding all paths between nodes in a mesh graph is theoretically an NP-complete problem. So no matter what algorithm you use, it will be hugely time-consuming and memory-intensive. Luckily, in my case we were just interested in some specific paths between specific nodes, so I added some logic to ignore some paths and just find the ones we were interested in. It had a huge impact on performance. So if you think about your problem again, maybe you can ignore some nodes and paths and reduce the algorithm's complexity.

Alborz