0

This method gets:

IEnumerable<object[]> - in which every array is in fixed size (it represent relational data structure).

DataEnumerable.Column[] - some metadata columns,mostly they will have the same value for all rows.

Expected outcome:

each "row" should get value for each of these columns (so the data structure remains relational).

    private IEnumerable<object[]> BindExtraColumns(IEnumerable<object[]> baseData, int dataSize, DataEnumerable.Column[] columnsToAdd)
    {
        int extraColumnsLength = columnsToAdd.Length;
        object[] row = new object[dataSize + extraColumnsLength];

        string columnName;
        int rowNumberColumnIndex = -1;

        for (int i = 0; i < extraColumnsLength; i++)
        {
            //Assign values that doesn't change between lines..
            // Assign rowNumberColumnIndex if row number column exists
        }

        //Assign values that change here, since we currently support only row number
        // i'ts not generic enough        
        if (rowNumberColumnIndex != -1)
        {
            int rowNumber = 1;

            foreach (var baseRow in baseData)
            {
                row[rowNumberColumnIndex] = rowNumber;

                Array.Copy(baseRow, 0, row, extraColumnsLength, dataSize);

                yield return row;

                rowNumber++;
            }
        }
        else
        {
            foreach (var baseRow in baseData)
            {
                Array.Copy(baseRow, 0, row, extraColumnsLength, dataSize);

                yield return row;
            }
        }
    }

this method can be called from hundreds of threads with relatively big data sets so performance here is critical, and i tried to create as minimum new objects as possible.

Please note - this is a private method, which used ONLY BY DataReader, which read each line, and passes it to another array immediately prior to reading the next line.

So - does copying arrays here be optimized here somehow, and should i use (carefully) memory to boost things here?

Thanks

Yosi Dahari
  • 6,794
  • 5
  • 24
  • 44

1 Answers1

5

Your code is fundamentally broken. You're just returning a reference to the same array every time, which means that unless the caller uses the data within each item immediately, it effectively gets lost. For example, suppose I use:

List<object[]> rows = BindExtraColumns(data, size, toAdd).ToList();

Then when I iterate over the rows, I find the same data in every row. That's really not a good experience.

I think it would make much more sense to create a new array for each iteration. Yes, that's a lot of extra memory being used - but it doesn't surprise callers nearly as much.

If you really don't want to do that, I suggest you change the approach so that the caller has to pass in an Action<object[]> to be executed on each row, with the documented proviso that if the caller stashes a reference to the array, they may well be surprised by the results.

You're obviously very concerned about performance, but if your data is coming from a database I'd expect the array creation/copying performance to be insignificant. You should write the simplest (and most reliable) code that works first, and then benchmark it to see whether it performs well enough. Unless you've got evidence that you need to make this surprising design choice, it feels like you're optimizing way too early.

EDIT: Now we know that it's a private method only used in one specific place, I would still avoid this reuse. It's simply fragile. I really would change to passing in an Action<object[]> or simply copying the data to a new array every time. I certainly wouldn't keep the current approach without strong evidence that it's a bottleneck: as I said before, I'd expect the database communication to be much more important. Leaving timebombs in your code like this very rarely works out well.

If you really, really want to keep doing this, you should document it very strongly, giving severe warnings that the result is non-idiomatic.

In terms of whether there's more optimization you could do - well... one alternative would be to avoid having to work with a single array in the first place. You could create a class which held references to both arrays (the current base row and the fixed data) and exposed an indexer which returned the value from one array or the other based on which index was being requested. We don't know what you're doing with the data ("passes it to another array" doesn't really mean anything) so we don't know whether that's feasible, but it would be efficient and could be implemented without the odd behaviour.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • I had that concern, and i wrote it like you said to begin with. But how can i know/handle the memory limitation? And BTW - it currently works fine! – Yosi Dahari Sep 04 '13 at 08:20
  • @Yosi: Presumably it currently works fine because of how you *happen* to use it. That doesn't mean it's not overly fragile. As for "the memory limitation" - *what* memory limitation? You haven't explained exactly what that limitation is, or how you've diagnosed it. – Jon Skeet Sep 04 '13 at 08:56
  • first, thanks for your help! I think i got your point. It seems frightening to me to create so many arrays for each invocation, but i guess it might not be a problem, and for sure regardless of that this code looks bad after what you described.. – Yosi Dahari Sep 04 '13 at 09:02
  • @Yosi: Any reason for unaccepting this? Does it not work somehow? If so, please give more details. – Jon Skeet Sep 08 '13 at 06:56
  • @Yosi: See my edit. It's still not at all clear that you've actually got any evidence that the performance is critical here - you certainly haven't *posted* any evidence to that effect (backed up by measuring the impact of copying the array data vs the impact of reading the data to start with). – Jon Skeet Sep 08 '13 at 07:09
  • Side note - this wan't my first option for implementing what I needed. A more favorite solution by me was to [add columns to datareader](http://stackoverflow.com/questions/18504061/how-to-add-columns-to-datareader) (the `IEnumerable` in this question was created by this DataReader) – Yosi Dahari Nov 30 '13 at 10:56