I'm developing a graph where I need to keep the memory usage per node as low as possible. Each node implements IEnumerator / IEnumerable.

IEnumerator / IEnumerable make use of "position" in the canonical examples, which is a persistent cursor value required for iteration (e.g. as used by foreach).

What I need to avoid is storing this "position" value internally to the node itself, as this adds overhead where every byte counts.

How can I construct the node class such that a temporary object stores this value -- preferably on the stack -- only while an iteration is taking place, and not as part of the node itself? Is this possible?

Engineer
  • It would be helpful if you'd post the general structure of your code. I've answered as well as I can without much context, but I could help much more if you'd give more details of what you're doing. – Jon Skeet May 11 '13 at 08:35

2 Answers

Typically an IEnumerable<T> doesn't store a position - only an IEnumerator<T> does. (Iterator block implementations are odd in terms of implementing both, but they're definitely an anomaly.)

I suggest that you take the same approach that List<T> does: implement IEnumerable<T> with explicit interface implementation, but also have a public method returning a custom mutable struct (horrible I know, but it does solve your problem) which contains a reference to your node and the position within it. When you iterate using foreach, that struct value will only be stored on the stack (normally - assuming you're not doing this within an iterator block, for example).
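
The difference between the struct enumerator and its boxed form can be observed with List<T> itself. What follows is only a minimal sketch and not part of the original answer: the class name EnumeratorCopyDemo and its two methods are made up for illustration, while List<T>.Enumerator is the real framework struct. It shows that the struct cursor is an ordinary value (copying it copies the position), whereas the interface-typed enumerator is a single boxed object on the heap.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorCopyDemo
{
    // Typed as List<int>.Enumerator (a struct): assignment copies the cursor.
    public static string StructCursorDemo()
    {
        var list = new List<int> { 1, 2, 3 };
        List<int>.Enumerator s1 = list.GetEnumerator();
        s1.MoveNext();                 // s1 is now on element 1
        List<int>.Enumerator s2 = s1;  // independent copy of the cursor
        s1.MoveNext();                 // only s1 advances to element 2
        return s1.Current + " " + s2.Current;
    }

    // Typed as IEnumerator<int>: the struct is boxed once, and both
    // variables refer to the same heap object.
    public static string BoxedCursorDemo()
    {
        var list = new List<int> { 1, 2, 3 };
        IEnumerator<int> b1 = ((IEnumerable<int>)list).GetEnumerator();
        b1.MoveNext();
        IEnumerator<int> b2 = b1;      // same object, not a copy
        b1.MoveNext();
        return b1.Current + " " + b2.Current;
    }

    public static void Main()
    {
        Console.WriteLine(StructCursorDemo()); // prints "2 1"
        Console.WriteLine(BoxedCursorDemo());  // prints "2 2"
    }
}
```

The copy-on-assignment behaviour is also why mutable structs are usually discouraged: passing the enumerator to another method, or storing it in a readonly field, silently works on a copy.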

The normal implementation would be to create a mutable reference type just for the IEnumerator<T>. Normally this is okay as even if you have a lot of IEnumerable<T> values live at the same time, you have few IEnumerator<T> values. Are you concerned about the concurrent number of live objects, or garbage collection?
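
As a rough sketch of that normal implementation, assuming a hypothetical GraphNode type whose neighbours are held in an array (the question does not show its node class, so every name here is illustrative): the node carries no cursor at all, and each call to GetEnumerator() allocates a small, short-lived object that holds the position.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical node: stores an id and its neighbours, but no iteration cursor.
public sealed class GraphNode : IEnumerable<GraphNode>
{
    private readonly int id;
    private readonly GraphNode[] neighbours;

    public GraphNode(int id, params GraphNode[] neighbours)
    {
        this.id = id;
        this.neighbours = neighbours;
    }

    public int Id { get { return id; } }

    public IEnumerator<GraphNode> GetEnumerator() { return new NeighbourEnumerator(this); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    // The cursor lives here, in an object that exists only while iterating.
    private sealed class NeighbourEnumerator : IEnumerator<GraphNode>
    {
        private readonly GraphNode owner;
        private int position = -1;

        public NeighbourEnumerator(GraphNode owner) { this.owner = owner; }

        public bool MoveNext()
        {
            // Advance at most one step past the last element.
            if (position < owner.neighbours.Length) position++;
            return position < owner.neighbours.Length;
        }

        public GraphNode Current { get { return owner.neighbours[position]; } }
        object IEnumerator.Current { get { return Current; } }
        public void Reset() { position = -1; }
        public void Dispose() { }
    }
}

public static class Program
{
    public static void Main()
    {
        var hub = new GraphNode(0, new GraphNode(1), new GraphNode(2));

        // Two enumerators over the same node run independently,
        // which would not work if the node itself stored the position.
        foreach (var outer in hub)
            foreach (var inner in hub)
                Console.WriteLine(outer.Id + " -> " + inner.Id);
    }
}
```

With this layout the per-node cost is just the fields of GraphNode; the enumerator costs memory only for the iterations that are in flight at a given moment.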

Jon Skeet
  • GC is not an issue, as every node needs to remain persistent in system memory through the application lifetime. The concurrent number of live objects is the issue. Early estimates suggest the overall data structure will take up around a gigabyte, and I'd like to keep it less than that if possible. – Engineer May 11 '13 at 08:41
  • @NickWiggill: But do you have a lot of current `IEnumerator` values? That's the point - you wouldn't have the position within the `IEnumerable`, because it's not logically part of that state. – Jon Skeet May 11 '13 at 08:59
  • +1 Jon, you are quite right of course, I do not have many _current_ iterators. So I could just as well be using a reference type for this. I'll leave the accepted answer on Marc's because I like knowing the stack-based way to do this if I ever need to, but would mark both as accepted if I could. Thanks. – Engineer May 11 '13 at 10:28

If "node" is the underlying data, then storing the position in the node is incorrect: callers are meant to be able to run several independent enumerators over the same data at once. It is unclear how you are currently implementing this API, but if you use an "iterator block" with the position as a local variable, it will be done correctly, albeit on the heap.

You can also implement the iterator manually, on the stack, by creating a struct iterator. It is important that the public GetEnumerator() returns the struct type, so you will need to use explicit interface implementation for IEnumerable, IEnumerable<T> etc. Note that foreach over the concrete type directly will use the stack, but anything typed as IEnumerable or IEnumerable<T> will box the struct and use the heap.

For example (using a basic linked-list):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        var list = new MyList<int>();
        list.Add(1);
        list.Add(2);
        list.Add(3);
        foreach (var i in list)
        { // this IS NOT using IEnumerable/IEnumerable<T>
            Console.WriteLine(i);
        }
    }
}

public class MyList<T> : IEnumerable<T>
{
    // A node holds only its value and the link to the next node; no cursor.
    internal sealed class Node
    {
        private readonly T value;
        public Node Next { get; set; }
        public T Value { get { return value; } }
        public Node(T value) { this.value = value; }
    }

    Node root;

    // Appends to the end of the list by walking from the root.
    public void Add(T value)
    {
        var newNode = new Node(value);
        var node = root;
        if (node == null) root = newNode;
        else
        {
            while (node.Next != null) node = node.Next;
            node.Next = newNode;
        }
    }

    // foreach binds to this public method, so it gets the struct directly.
    public Enumerator GetEnumerator() { return new Enumerator(this); }

    // The interface versions are explicit; calling them boxes the struct.
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
    IEnumerator<T> IEnumerable<T>.GetEnumerator() { return GetEnumerator(); }

    public struct Enumerator : IEnumerator, IEnumerator<T>
    {
        private MyList<T> list;
        private Node node; // the cursor: null means "before the first element"

        internal Enumerator(MyList<T> list) { this.list = list; node = null; }

        public T Current { get { return node == null ? default(T) : node.Value; } }
        object IEnumerator.Current { get { return Current; } }

        public bool MoveNext()
        {
            if (node == null)
            {
                // First call: step onto the root, if there is one.
                node = list.root;
                return node != null;
            }
            else
            {
                // Stay on the last node once the end has been reached.
                if (node.Next == null) return false;
                node = node.Next;
                return true;
            }
        }

        void IEnumerator.Reset() { node = null; }
        void IDisposable.Dispose() { node = null; list = null; }
    }
}
```
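
For comparison, the iterator-block variant mentioned at the start of this answer might look like the following. This is only a sketch: YieldList<T> and its tail field are hypothetical names introduced here, not part of the code above. The position is the local variable node, which the compiler hoists into a generated enumerator object on the heap, so the list and its nodes still store no cursor.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical list type used only to illustrate an iterator block.
public class YieldList<T> : IEnumerable<T>
{
    private sealed class Node
    {
        public readonly T Value;
        public Node Next;
        public Node(T value) { Value = value; }
    }

    private Node root;
    private Node tail; // kept so that Add does not have to walk the list

    public void Add(T value)
    {
        var newNode = new Node(value);
        if (root == null) root = newNode;
        else tail.Next = newNode;
        tail = newNode;
    }

    // Iterator block: "node" is the cursor. It lives in a compiler-generated
    // object that is allocated each time GetEnumerator() is called.
    public IEnumerator<T> GetEnumerator()
    {
        for (var node = root; node != null; node = node.Next)
            yield return node.Value;
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class YieldListDemo
{
    public static void Main()
    {
        var list = new YieldList<int>();
        list.Add(1);
        list.Add(2);
        list.Add(3);
        Console.WriteLine(string.Join(",", list)); // prints "1,2,3"
    }
}
```

This is much less code than the struct version, at the cost of one small allocation per iteration.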
Marc Gravell
  • Marc, I like this answer, but can you explain or point me to an example of utilising such a struct-based iterator? – Engineer May 11 '13 at 08:43
  • @NickWiggill sure - see edit. The important point here is that if I `foreach` over the list *itself* then we are **not** using the `IEnumerable[<T>]` API - it is using the typed iterator directly. You can also see this for inbuilt types like `List<T>.Enumerator`. However, if you type it as `IEnumerable<T>`, then it uses the boxed implementation. – Marc Gravell May 11 '13 at 08:59