
I am currently writing some code in C# and have decided, at one point, that for my specific case a composite pattern will be useful.

However, there is clearly a performance problem in my code once I start making a composite of multiple components.

I am struggling a bit to explain myself, so here is some simple code that will help.

interface IComponent
{
    double operation(double[] vector);
}

class Sum: IComponent
{
    public double operation(double[] vector) 
    {
        double sum = 0;
        foreach(double dub in vector) sum += dub;
        return sum;
    }
}

class SquaredSum: IComponent
{
    public double operation(double[] vector) 
    {
        double sum = 0;
        foreach(double dub in vector) sum += dub * dub;
        return sum;
    }
}

class Composite: IComponent
{
    IComponent[] components;
    public Composite(params IComponent[] components) { this.components = components; }
    
    public double operation(double[] vector)
    {
         double sum = 0;
         foreach(var component in components) sum += component.operation(vector);
         return sum;
    }
}

Now, if I make a new composite object of Sum and SquaredSum, the code will have to loop through the double array (vector) twice, once for the Sum instance and once for the SquaredSum instance. This is a serious problem once I'm looking at a million rows of data.

Ideally, I'd like my code to compile/act roughly as if both the sum and the squared sum were being calculated in the same loop, so I only have to loop through the vector once.
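
For concreteness, the usage I have in mind looks roughly like this (illustration only; vector is the input double[]):

// With the composite above, vector is traversed once per child:
// Sum.operation loops over it, then SquaredSum.operation loops over it again.
var composite = new Composite(new Sum(), new SquaredSum());
double result = composite.operation(vector);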

My Solution

The solution I can think of is to change IComponent to an abstract class as below.

abstract class Component
{
    public abstract double TransformData(double dub);

    public virtual double operation(double[] vector)
    {
        double sum = 0;
        foreach (double dub in vector) sum += TransformData(dub);
        return sum;
    }
}

class Composite : Component
{
    Component[] components;
    public Composite(params Component[] components) { this.components = components; }

    // Required because TransformData is abstract: a composite's per-element
    // transform is the sum of its children's transforms.
    public override double TransformData(double dub)
    {
        double sum = 0;
        foreach (var component in components) sum += component.TransformData(dub);
        return sum;
    }

    public override double operation(double[] vector)
    {
        double sum = 0;
        foreach (double dub in vector)
        {
            foreach (var component in components) sum += component.TransformData(dub);
        }
        return sum;
    }
}

This will allow the code to only iterate through the input array once.
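
To make this concrete, Sum and SquaredSum would be rewritten as Components along these lines (a rough sketch):

class Sum : Component
{
    // Per-element transform is the identity, so operation returns the plain sum.
    public override double TransformData(double dub) => dub;
}

class SquaredSum : Component
{
    // Square each element before it is summed.
    public override double TransformData(double dub) => dub * dub;
}

// var composite = new Composite(new Sum(), new SquaredSum());
// double result = composite.operation(vector);   // one pass over vector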

My concern with my solution is that implementations of each Component will be fairly abstract. Also, each component is limited to operating only on the current double/object that is passed into it (perhaps I want an operation involving the next and previous elements).

For example, implementing a component that calculates the mean would be impossible with my solution, which is not what I want.

Question

My question is, is there another magic OOP bullet to solve this problem efficiently?

I'm partially aware that LINQ may offer a nice solution, but I don't know how.

Thank you.

1 Answer


Your solution does not make much sense to me. The main problem is that your IComponent objects cannot easily be combined. Take some real-world problem, like computing the standard deviation: how would I do that using only IComponents?

I grant that it will absolutely be possible to construct such a solution, but it will likely be quite complicated.

Also, your performance concerns seem misplaced. Computers are fast; for the vast majority of code, performance is simply not important. The main case where performance matters is when processing very large amounts of data, like images, video etc., where you have 10^6+ items to process. For these cases the typical approach is to program at a fairly low level, i.e. plain loops doing simple arithmetic operations on simple data structures, like arrays, and to avoid branches and non-inlineable method calls (i.e. any kind of polymorphism) as much as possible. The more optimized some code is, the less abstract and readable it typically becomes. Highly optimized code tends to be just about unreadable, or made up of far more comments than code.

So, worry about readability, not efficiency, until you have actually identified a real performance problem.

For the task above it seems much simpler and more readable to just do

vector.Sum(v => v * v + v);
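
If several statistics are needed from the same data, the same single-pass idea can also be written as a plain loop; a rough sketch (the method name is just for illustration):

static (double Sum, double SquaredSum) SumAndSquaredSum(double[] vector)
{
    // One pass over the data, accumulating both values at once.
    double sum = 0, squaredSum = 0;
    foreach (double v in vector)
    {
        sum += v;
        squaredSum += v * v;
    }
    return (sum, squaredSum);
}
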
JonasH
  • Hey Jonas, thanks for your response. I agree, things like the mean/stdev are hard in my solution. The code I provided was a simplified version of what I'm trying to achieve: I'm calculating loss functions of double arrays, for instance a loss ratio (two arrays) or the SSE of another. Then I'm creating a new loss function, which might be LossRatio(double[][]) + sse(double[][]). If I coded a loop by hand, I'd do it in one pass of the data. What I want is to be able to use a pattern without having to re-loop the data; I don't want to write a new function for each combination of loss functions I may use/make. – Giacomo Nassif Jack Nov 19 '21 at 09:02